How to Get Rich with Electronic Data Discovery
by
Gregory L. Fordham, CPA, CIA
Since January 2002 there have been two decisions involving the discovery of e-mail data that also involved incredible cost estimates for the retrieval, production and review of that data. One decision was Rowe Entertainment v. The William Morris Agency, 205 F.R.D. 421 (S.D.N.Y. 2002) and the other decision was Murphy Oil USA, Inc. v. Fluor Daniel, Inc., 2002 WL 246439 (E.D. La 2002).
Most litigators are already aware of the eight criteria formulated in Rowe for allocating the costs of discovering electronic data. Similarly, litigators may also already be aware that the decision in Murphy Oil followed those eight criteria when allocating the costs of retrieval, production and review to Murphy. What litigators may not have gleaned or been able to realize from those decisions is that the inefficiency of the techniques employed is what propelled the cost estimates to more than $10 million for Rowe and $6.2 million for Murphy. In the sections that follow I will explain how you, too, can get rich with electronic data discovery.
Setting the Mark
Rowe Entertainment, Inc. was a black concert promoter that claimed it had been prevented from promoting events with white bands by the discriminatory and anti-competitive practices of more than 30 defendants. Rowe formulated a sweeping demand for documents, including e-mails. All of the defendants moved for a protective order relieving them of their obligation to produce the e-mail data.
One of Rowe’s defendants claimed to have more than 200,000 e-mails. As for the others, the decision only reveals that there were more than 1,000 back-up tapes that potentially contained e-mail data. So, while the actual number of e-mails and their attachments is not clear from the decision it is clear the number is potentially very large. Also, it is clear that the tapes contained a large number of duplicate e-mails and attachments, since the back-ups were often performed on a daily basis.
In support of their arguments, Rowe’s defendants obtained experts whose estimates for producing the requested data totaled more than $1.6 million if sampling methods were utilized and more than $10 million if the entire population of responsive documents were produced. In rebuttal Rowe retained its own expert who claimed that the cost of responding to its discovery request should be between $136,000 and $236,000 under the sampling method.
In the case of Murphy Oil, Murphy contracted with Fluor for the performance of a “turnaround” at its refinery in Meraux, Louisiana. Murphy claimed that Fluor breached its contract and sought e-mail data as part of its discovery request.
Like the defendants in Rowe, Fluor sought relief on various grounds. Unlike Rowe, the decision in Murphy clearly identifies an e-mail population of about 2.3 million communications that contained about 19.7 million document pages when attachments were considered. Using a methodology similar to that of the defendants in Rowe, Fluor estimated the cost of retrieving, producing and reviewing its e-mail files at $6.2 million.
Lifting the Wallet
The methodologies employed by the various defendants in each of these cases were somewhat similar. Although many proposed databasing the e-mail messages and searching them electronically, they planned to convert the attachments to TIFF [Tagged Image File Format]. In some cases, as with Fluor, the defendants planned to convert even the e-mail messages themselves into TIFF.
Since TIFF is a common format when imaging paper documents, one can only deduce that the defendants were using the traditional document management techniques. Furthermore, they were not using electronic conversion processes but rather planned to first print the attachments, and in some cases the e-mail messages themselves, onto a paper medium. The paper medium would then be imaged and reconverted to another electronic medium, TIFF. The traditional document management application would then be used to review and redact the document images and produce the responsive copies.
The enormous expense and inefficiency of going from one digital format to paper and back to another digital format should be obvious to anyone. Nonetheless, the defendants proved ignorant of computer forensics and electronic file management techniques and, instead, formulated their methodologies around the more traditional document management technologies. The defendants in Rowe, William Morris Agency in particular, considered this process essential in order to Bates stamp and redact the documents. Apparently, they were unaware that this could be accomplished electronically. In any event, this was but one of many inefficiencies introduced by the TIFF conversion process.
Another inefficiency is that the TIFF format is not electronically searchable. By contrast, there are numerous software packages for less than $500 that allow users to electronically search any number of common electronic file formats using complex word search criteria along with advanced capabilities like stemming, phonic and fuzzy logic. Furthermore, these packages can be used against entire disk drives so that large collections of data can be electronically searched for all files meeting the search criteria. Those meeting the criteria could be reviewed while those failing the criteria could be ignored. Using such software packages against the e-mail attachments would have been far more efficient and effective than the defendants’ planned TIFF conversion approach. If one needs to look beyond the file contents and into the file header areas or the unseen areas of a file or disk, there are still other software programs that offer similar search capability for less than $500.
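To make the idea of searching a whole collection concrete, here is a minimal sketch using only the Python standard library. It is not any of the commercial packages mentioned above. The function names `matches` and `search_tree` are hypothetical, a simple prefix test stands in for true stemming, `difflib` similarity stands in for fuzzy logic, and phonic matching is left out. It also assumes the files are plain text; word processor or spreadsheet formats would need a format-aware text extractor first.

```python
import difflib
import os
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def matches(word, term, fuzzy_cutoff=0.8):
    """True if word equals term, starts with it (crude stemming), or is a near miss."""
    word, term = word.lower(), term.lower()
    if word == term or word.startswith(term):
        return True
    # Similarity ratio in [0, 1]; catches misspellings such as "refinary".
    return difflib.SequenceMatcher(None, word, term).ratio() >= fuzzy_cutoff


def search_tree(root, terms, fuzzy_cutoff=0.8):
    """Walk every file under root and return {path: [terms that hit]}."""
    hits = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                # Assumption: content is readable as text; undecodable bytes are skipped.
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    words = set(WORD_RE.findall(fh.read()))
            except OSError:
                continue  # unreadable file: leave it for manual follow-up
            found = {t for t in terms
                     if any(matches(w, t, fuzzy_cutoff) for w in words)}
            if found:
                hits[path] = sorted(found)
    return hits
```

A call such as `search_tree("/evidence/attachments", ["promot", "refinery"])` (the path is illustrative) returns only the files worth a reviewer's time; everything else can be set aside without anyone printing or imaging it.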
By themselves these various electronic search programs would have been far more efficient than the defendants’ TIFF conversion process, although probably a little cumbersome. Such problems could have been easily overcome, however, with some help from an experienced forensic data manager and software engineer. The defendants would likely argue that the TIFF images can be optically recognized and their textual representations electronically searched, but that is like cutting off your legs so that you can run a race in cheap prosthetics. Anyone familiar with that process knows that while the accuracy rate is very high it is still less than 100 percent.
As a result, manual processes are often required to compensate for this lack of accuracy. With the electronic document, however, the only errors are those that already exist in the document. Consequently, the original data file format offers the best opportunity for leveraging the computer’s power on jobs like those in Rowe and Murphy Oil.
Extracting the e-mails and their attachments so that they can be electronically searched is also readily accomplished. Component Object Model (COM), Messaging Application Programming Interface (MAPI), and Collaboration Data Objects (CDO) are only a few of the many methods for electronically manipulating e-mail databases and their contents into the formats accepted by the more sophisticated analysis programs described previously.
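As an illustration of how little is involved, the sketch below separates one message into its headers, body text and attachments. It deliberately swaps in Python's standard `email` package and assumes each message has already been exported as a raw Internet-format (RFC 822) message; it is not COM, MAPI or CDO code, and the function name `split_message` is hypothetical.

```python
from email import message_from_bytes, policy


def split_message(raw_bytes):
    """Return (headers, body_text, attachments) for one raw e-mail message."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    headers = {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "date": msg["Date"],
    }
    # Prefer the plain-text body; a message may have none at all.
    body = msg.get_body(preferencelist=("plain",))
    body_text = body.get_content() if body is not None else ""
    # Each attachment comes back as (filename, original bytes), ready to be
    # written to disk in its native format for searching or hashing.
    attachments = [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return headers, body_text, attachments
```

Because the attachments come out in their native formats, they can go straight into an electronic search or a duplicate check without ever touching paper.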
Another factor contributing to the defendants’ higher cost estimates is that the conversion to the TIFF format prevents the electronic recognition of duplicate e-mail messages or duplicate attachments. Any collection of electronic data has a fingerprint that can be represented by a hash value, such as the 128-bit value produced by the MD5 algorithm. When electronic files have the same MD5 hash value they can, for practical purposes, be treated as identical. The MD5 hashing process can be applied electronically to the e-mail attachment files as well as to each of the e-mail messages themselves.
The significance of the MD5 hashing process for the litigator is that when the e-mails and attachments are retained in their original electronic formats, computerized processes can be used to sift through hundreds of thousands of them and eliminate from the review process any duplicates based on their MD5 hash values. The TIFF conversion prevents this process because the scans of the paper documents and their associated TIFF representations will never be exactly the same at the bit level. In multiple scans of the same document differences can occur simply because of differences in document orientation caused by slightly different placements of the document on the scanner bed, either manually or through the document feeder. Furthermore, it is highly unlikely that the scanner would produce the same, bit-for-bit, TIFF representation of the document even if the document was never moved from the scanner during repeated scans.
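The duplicate elimination described above can be sketched in a few lines of Python using the standard `hashlib` module. The function names `md5_of_file`, `group_by_hash` and `review_set` are hypothetical, and the sketch assumes the messages and attachments have already been extracted to a folder as individual files.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hash of a file as a hex string, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(root):
    """Map each distinct hash to the list of files under root that share it."""
    groups = {}
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            groups.setdefault(md5_of_file(path), []).append(path)
    return groups


def review_set(root):
    """Keep one representative file per hash; the rest are duplicates."""
    return sorted(paths[0] for paths in group_by_hash(root).values())
```

A single changed bit produces a different hash, which is exactly why two scans of the same page never match while two copies of the same original file always do. Where deliberate tampering is a concern, `hashlib.sha256` can be substituted for `hashlib.md5` without changing the rest of the sketch.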
Interestingly, each of the TIFF inefficiencies described above was recognized by Rowe’s consultant, who also recognized a third problem. The third problem with converting to the TIFF format was that data was lost. Both the e-mail and the attached files contain far more information than ever appears to the user or would ever appear in the TIFF image. Perhaps in a different kind of case this unseen data could have been highly useful. For example, the e-mail header contains important routing information if the true identity of the e-mail originator is ever significant to the case. Similarly, there could be formulas in spreadsheets and redlined information in word processed documents. Even the e-mail database itself could retain deleted e-mails that are no longer visible in the trash folder but are still present in the e-mail system’s database.
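The routing information mentioned above is a good example of data that never reaches the printed page. The sketch below, again using Python's standard `email` package rather than any tool named in the decisions, lists the `Received` headers of a raw message in the order the message traveled. The function name `routing_trail` is hypothetical.

```python
from email import message_from_string, policy


def routing_trail(raw_text):
    """Return the Received headers of a raw message, oldest hop first."""
    msg = message_from_string(raw_text, policy=policy.default)
    hops = msg.get_all("Received", [])
    # Each relay adds its Received line at the top of the message, so the
    # list is reversed to read from the originating system onward.
    # Folded header lines are collapsed to single-spaced text.
    return [" ".join(str(hop).split()) for hop in reversed(hops)]
```

The first entry in the result is the hop closest to the originator, which is where questions about the true sender usually begin.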
So, the amount of data that can be lost as a result of the TIFF conversion can be quite significant, not only in substance but in quantity as well. This fact sets the stage for an interesting comparison. Recall that both Rowe and its defendants agreed to a method of sampling the desired e-mail data. Rowe’s expert estimated the cost for the review at no more than $236,000 while Rowe’s defendants estimated $1.6 million for the same sample. Thus the inefficiency introduced by the TIFF conversion was huge. Once one realizes that the volume of analysis that Rowe’s consultant could perform with the original electronic files was far greater than with the TIFF format, the disparity between the two techniques is even larger.
Repeating the Play
One thing that has really stimulated discussion of these cases among litigators is the magnitude of the cost estimates. The previous sections explained why there was a difference between the plaintiffs’ estimate and the defendants’ estimate for the same data sample in Rowe. What was not addressed is the impact of expanding the production from the sample to the entire population.
In Rowe, the estimated cost of the defendants’ TIFF conversion plan grows from $1.6 million for the sample to over $10 million for the entire population. Should one expect the estimate of Rowe’s consultant to increase by the same multiplier? The answer is probably not.
As incredible as it appears, the dramatic increase under the TIFF conversion is quite understandable. Under the TIFF conversion plan every document must be printed, converted and manually evaluated. The more documents, the more cost, since the marginal cost of the increased data volume is constant.
By contrast, the marginal cost of the data analysis using electronic techniques has two components: electronic and manual. While the electronic analysis methods can involve certain setup costs, once those processes are unleashed they continue at marginal rates that approach insignificance. The effect of these processes is to winnow the wheat from the chaff so that the more expensive labor aspects of the analysis are applied to only those elements likely to produce a significant finding. As a result, many of the expensive manual processes employed under the TIFF conversion method are replaced with analysis methods having an almost insignificant marginal cost.
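The difference between the two cost structures can be put into a small model. In the sketch below the only figures taken from the decisions are the $1.6 million and $10 million estimates; every rate passed to `linear_cost` and `filtered_cost` is a hypothetical input chosen by the reader, not a number from either case, and the function names are my own.

```python
def multiplier(sample_cost, population_cost):
    """How many times larger the population estimate is than the sample estimate."""
    return population_cost / sample_cost


def linear_cost(n_docs, cost_per_doc):
    """TIFF-style plan: every document is printed, imaged and reviewed by hand,
    so each added document costs the same as the last."""
    return n_docs * cost_per_doc


def filtered_cost(n_docs, setup, machine_cost_per_doc,
                  review_cost_per_doc, review_fraction):
    """Electronic plan: a one-time setup cost, a small automated cost per
    document, and manual review only for the fraction that survives
    searching and duplicate elimination."""
    automated = n_docs * machine_cost_per_doc
    manual = n_docs * review_fraction * review_cost_per_doc
    return setup + automated + manual


# Figures from Rowe: the defendants' estimate for the sample and the population.
tiff_multiplier = multiplier(1_600_000, 10_000_000)  # 6.25
```

Under `linear_cost`, doubling the documents doubles the bill. Under `filtered_cost`, the setup cost does not grow at all, and as the review fraction falls (for example, because daily back-ups yield mostly duplicates) the added cost of each additional document shrinks toward the automated rate alone.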
Consider this analogy. Both plaintiffs and defendants cast a wide net, but only the plaintiffs in Rowe were able to control the opening of their mesh, which allowed them to target their catch more precisely. The defendants, by contrast, labored with a fine mesh that made the catch harder to harvest; their big net was slower to retrieve and came up filled with great quantities of trash. All of this only increased the chance that the big fish would go unrecognized.
The significance of this analogy is that it is unlikely that the plaintiffs’ estimate would have increased significantly beyond its upper cost estimate even if the entire population had been analyzed. The samples had been taken uniformly through the period covered by the back-up tapes. Further analysis would have likely revealed numerous duplicates and fewer new items of interest. Clearly, the TIFF conversion plan would not have been so efficient.
Conclusion
In summary, techies have long recognized that their creations are a double-edged sword. They can either solve problems more economically or create larger problems than ever imagined. Both Rowe and Murphy Oil are examples of the latter, since in both cases the misapplication of technology could cause litigants to literally waste millions of dollars.
In the case of Murphy Oil, all of the inefficient costs were allocated to Murphy, since it proposed no alternative. With respect to Rowe, however, the court was more flexible. First, it allowed Rowe to formulate a search procedure. Second, it allocated the cost of any procedures beyond those adopted by Rowe, such as the creation of TIFF files, to the defendants. All other costs were allocated to Rowe except for the costs of any review for privilege or confidential material. Those costs were allocated to the defendants in both Rowe Entertainment and Murphy Oil. So, at least in the case of Rowe, it was the defendants who paid for their inefficiency and not the plaintiff.
Although the cases of Rowe Entertainment and Murphy Oil will likely be remembered most for the formulation of the eight criteria for allocating the costs of electronic data discovery, their greatest lesson is the cost of error in the computer age. As I have said many times before, “The proverbial genie is out of the bottle. The litigator who is not versed in electronic data discovery and the art of cyber litigation is at risk from his enemies, as well as from the fulfillment of his own wishes.” On second thought, a little ignorance could make you rich.
