The Consequence of Search Technology on
Rule 34 Requests for Production in E-Discovery

Gregory L Fordham
June 2010

The form of production under Rule 34 of the Federal Rules of Civil Procedure (FRCP) is a fertile area for argument. The seasoned litigator is likely to know that 34(b)(1)(C) allows the requester to specify the form in which electronically stored information (ESI) is to be produced.

If the request does not specify a form, then 34(b)(2)(E)(ii) requires that the ESI be produced in a form or forms in which it is ordinarily maintained or in a reasonably usable form.

The reasonably usable form requirement has been held to mean searchable. As a result, image-based productions such as TIFF or PDF are not adequate on their own. So, if image-based formats are produced, they must be accompanied by searchable text.

The searchable text requirement is where things can get tricky, however.  More specifically, how was the searchable text obtained and what is its accuracy rate?

If it was obtained by running optical character recognition (OCR) on the image-format document pages, then how usable is it? As good as OCR technology has become, it is still not infallible. In fact, in its raw state it is likely to have only an 80 percent accuracy rate. The problems can range from misinterpreted characters to completely missed characters.

While various OCR tools claim to have closer to 99 percent accuracy rates, these levels are typically achieved through a combination of spellcheckers and human interaction. The question, therefore, is at what level of accuracy is searchable text usable under Rule 34 and what duty does the producing party have to achieve that accuracy?

Since more sophisticated buyers of litigation support services impose accuracy standards on their service providers for searchable text that are as high as 99 percent accuracy, would it be unreasonable for Rule 34 to impose such a requirement? Or, would the failure of Rule 34 to impose such a requirement undermine the very goals that the rule hopes to achieve and the legitimacy of the rule in the first place?

Certainly, responders to production requests might argue that a 99 percent accuracy rate is overly burdensome.  Requesters on the other hand might counter that such accuracy rates are essential for two reasons.

First, without higher accuracy rates, producers would miss many responsive and relevant documents if they used the same OCR’d text when responding to production requests.

Second, without higher accuracy rates both producers and requesters would have to increase the extent to which fuzzy logic is used in their keyword search terms.  The increase in fuzzy logic will, in turn, increase the number of false positives and the cost of performing document review.
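The trade-off between fuzzy matching and false positives can be illustrated with a minimal sketch. It assumes "fuzzy logic" means matching within a fixed edit distance, which is only one of several ways search tools implement fuzziness. The word list, the misread "contrnct", and the function names are hypothetical and are not taken from any particular review platform.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def search(term: str, words: list[str], max_edits: int) -> list[str]:
    """Return the words within max_edits of term; max_edits=0 is an exact search."""
    return [w for w in words if edit_distance(term, w) <= max_edits]


# Hypothetical OCR output in which "contract" was misread as "contrnct".
ocr_words = ["the", "contrnct", "was", "signed", "by", "our", "contact"]

exact = search("contract", ocr_words, max_edits=0)  # misses the garbled word
fuzzy = search("contract", ocr_words, max_edits=1)  # finds it, plus "contact"

print(exact)
print(fuzzy)
```

In this toy example the exact search returns nothing, while allowing a single edit recovers the misread word but also pulls in the unrelated word "contact", which a reviewer would then have to read and discard.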

Thus, with document review on a gigabyte basis costing 20 to 40 times more than processing, is it unduly burdensome for both requesters and producers to allow anything less than near perfection?

Interestingly, the Advisory Committee Notes from the 2006 amendments explain that while native format is not the required production format, the option to produce in a reasonably usable form does not mean that the responding party is free to convert its production into another format that makes it more difficult for the requesting party to use the information. 

Whether or not the format is more difficult to use is generally a question of whether it degrades the requesting party’s ability to search the document.  Certainly, an 80 percent accuracy rate is significantly degraded when compared to 99 percent.
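A rough calculation shows how large the gap can be for keyword searching. The sketch below assumes the accuracy rate is measured per character and that errors fall independently on each character. Neither assumption comes from Rule 34 or from any vendor's specification, and real OCR errors tend to cluster, so the figures are illustrative only. The function name is hypothetical.

```python
def keyword_survival(char_accuracy: float, keyword_length: int) -> float:
    """Chance that every character of a keyword is recognized correctly,
    assuming independent per-character errors (a simplifying assumption)."""
    return char_accuracy ** keyword_length


for accuracy in (0.80, 0.99):
    for length in (5, 10):
        chance = keyword_survival(accuracy, length)
        print(f"accuracy {accuracy:.0%}, {length}-letter keyword: {chance:.1%}")

# Output:
# accuracy 80%, 5-letter keyword: 32.8%
# accuracy 80%, 10-letter keyword: 10.7%
# accuracy 99%, 5-letter keyword: 95.1%
# accuracy 99%, 10-letter keyword: 90.4%
```

Under these assumptions, an exact search for a ten-letter term would find only about one occurrence in nine at 80 percent accuracy, compared with about nine in ten at 99 percent.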

If there is a lesson to be learned, it might be that image format files were often produced to deprive requesters of the more useful native format files. The production of OCR’d text does not necessarily improve the situation, particularly if there is no accuracy standard.

Naturally, the cure might be to supply the searchable text by extracting text from the native document.  Extracting text is the equivalent of cut and paste as compared to recognizing characters.

But if one goes so far as to extract the text, then why not produce the native file at the start? After all, when the request does not specify a form, Rule 34 calls for production in the form in which the documents are ordinarily maintained or in a reasonably usable form.

Even so, native format can have its limitations as well. After all, native documents can contain non-searchable data such as embedded images. Even PDF documents can make extracting text difficult.

If native format documents do contain non-searchable data, is there any reason to require producers to provide searchable text as well?  The native format file is what producers actually have.  So, is there any basis to require more?

While there is probably no silver bullet to solve this problem, litigators should realize that there is reason to specify the accuracy rates of searchable text when native format documents are not provided. And even when image formats with searchable text are desired, requesters should test the accuracy of the produced data, at least on a sample basis. Otherwise, they could undermine the success of their case by relying on the data provided.
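One way such a sample test might be carried out is sketched below. It assumes the requester has a manually verified transcription of each sampled page to compare against, and it measures accuracy as one minus the character error rate. That is a common convention but not one prescribed by Rule 34. The function names and the two sample pages are hypothetical.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the number of character errors between two texts."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def sample_accuracy(pairs, sample_size, seed=0):
    """Estimate character accuracy from a random sample of
    (verified_text, produced_text) pairs."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(pairs, min(sample_size, len(pairs)))
    total_chars = sum(len(verified) for verified, _ in sample)
    if total_chars == 0:
        raise ValueError("sample contains no verified text")
    errors = sum(edit_distance(verified, produced) for verified, produced in sample)
    return 1.0 - errors / total_chars


# Hypothetical sample: the second page has three misread characters.
pairs = [
    ("Payment is due in thirty days.", "Payment is due in thirty days."),
    ("The lessee shall indemnify the lessor.", "The 1essee sha11 indemnify the lessor."),
]

print(f"estimated character accuracy: {sample_accuracy(pairs, sample_size=2):.1%}")
```

A result well below the agreed standard on a sample of reasonable size would give the requester a concrete basis for raising the issue with the producing party before review begins.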

In recognition that a producer's duty may not match their own plans for data usage, the parameters for searchable text processing are one area where parties may want to cooperate during discovery in order to minimize litigation lifecycle costs. In other words, recognizing that electronic documents are likely to have some non-searchable text elements, parties will likely want to develop searchable text from these documents with high accuracy rates. In order to minimize lifecycle costs, why not agree to split the costs of processing accurate searchable text on the front end, while there are economies of scale and before the more expensive review costs are incurred?