Nine Steps to Design an E-Discovery Protocol
Gregory L Fordham
August 2009
When confronted with their first e-discovery project, many wonder where to begin and how to proceed. The following are nine steps of an e-discovery protocol.
1. Identify People, Places & Events
If your target is General Motors, it is not practical to request all of its computer records. The first step, therefore, is to identify the people, places and events that matter to the case so that the request can be appropriately targeted.
2. Preserve the Data
The second step is to preserve the data for subsequent analysis. The focus of the preservation effort should be the media, such as hard drives and backup tapes, on which the important data is believed to reside.
It is important that the media be preserved and not just specific files or documents. It is essential to capture the entire media with the full spectrum of data. That way, no matter how the case dynamics might change, the data will be available for analysis.
Preservation and analysis are separate activities. Preservation only captures the data, while the analysis can be deferred and scoped later. Consequently, preservation does not have to be expensive.
3. Prepare the Data for Analysis
Since the data is electronic, the best way to analyze it is with a computer. The next step, therefore, is to put the data into a form suitable for that analysis.
Preparing the data for analysis can involve several steps. It can require imaging the original media so that the data is in a form that is both protected and analyzable.
It can also involve running various computerized processes that make the most of the data. These processes can include computing digital fingerprints (MD5 hashes), confirming file signatures, recovering deleted files and folders using both the file system and data carving techniques, and indexing the data in preparation for word searching.
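As a minimal sketch of two of these processes, the Python below computes an MD5 fingerprint and checks whether a file's leading bytes agree with its extension. The function names and the small signature table are assumptions made for illustration; a real review would rely on a far larger signature set and would run against a working copy of the image, not the original media.

```python
import hashlib
from pathlib import Path

# Illustrative subset of well-known file signatures ("magic bytes").
# This table is an assumption for the example, not a complete list.
SIGNATURES = {
    ".pdf": b"%PDF",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".zip": b"PK\x03\x04",
    ".jpg": b"\xff\xd8\xff",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def signature_matches(path):
    """Compare a file's leading bytes with the signature for its extension.

    Returns True or False when the extension is in SIGNATURES,
    and None when the extension is not covered by the table.
    """
    expected = SIGNATURES.get(Path(path).suffix.lower())
    if expected is None:
        return None
    with open(path, "rb") as handle:
        return handle.read(len(expected)) == expected
```

A file whose extension says one thing and whose signature says another, such as a renamed archive, is a candidate for closer inspection.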
4. Produce File Lists
The actual production can be an iterative, “peel the onion” kind of approach. The first step can be the production of lists that disclose the names of files contained in the filing system as well as e-mails contained in e-mail repositories.
Although lists alone are sometimes not informative enough, they are nonetheless a good first step that lets the requesting party see what is available and how its requests can be better targeted.
In addition, these lists provide useful attribute information that can confirm or refute attempted spoliation, as well as important usage and trend information about the media and the data it contains.
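A file list of this kind can be sketched in a few lines of Python. The example below is illustrative only: the function names and the chosen columns (relative path, size, last-modified time) are assumptions, and it lists an ordinary directory tree, not e-mail repositories or deleted entries.

```python
import csv
import os
from datetime import datetime, timezone

# Column names are an assumption for this sketch.
FIELDS = ["path", "size_bytes", "modified_utc"]


def build_file_list(root):
    """Walk root and return one dict of attributes per file found."""
    rows = []
    for folder, _subdirs, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(folder, name)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "path": os.path.relpath(path, root),
                "size_bytes": info.st_size,
                "modified_utc": modified.isoformat(),
            })
    return rows


def write_file_list(rows, out_path):
    """Write the rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Sorting or charting the resulting timestamps is one simple way to look for the usage patterns and gaps that the lists can reveal.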
5. Streamline the Population
E-discovery can be like being buried alive in the treasure room of the Pharaoh. In order to keep costs low for both the technical experts and the legal staff, it is important to streamline the population.
Streamlining the population can involve many steps, such as removing duplicates and removing known files, like operating system and application program files, from consideration.
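Both kinds of removal can be driven by the digital fingerprints computed earlier. The following Python is a rough sketch under stated assumptions: `streamline` is a hypothetical helper, each record is a (path, MD5) pair, and `known_hashes` is a set of lowercase MD5 values for known operating system and application files, however that set was obtained.

```python
def streamline(records, known_hashes):
    """Split (path, md5) records into unique, known and duplicate files.

    records      -- iterable of (path, md5_hex) tuples
    known_hashes -- set of lowercase MD5 hex strings for known files
    Returns three lists of paths: (unique, known, duplicates).
    """
    seen = set()
    unique, known, duplicates = [], [], []
    for path, md5 in records:
        md5 = md5.lower()
        if md5 in known_hashes:
            known.append(path)        # known OS or application file
        elif md5 in seen:
            duplicates.append(path)   # same content as an earlier file
        else:
            seen.add(md5)
            unique.append(path)       # first copy of this content
    return unique, known, duplicates
```

Only the first copy of each distinct file reaches the reviewers. The other two lists are kept, not discarded, so the reduction can be explained later.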
Recovered deleted files require special consideration. Computer processes can stratify recovered deleted files into two populations: those likely to be recovered intact and those likely to contain only fragments.
6. Search the Uniques
Now that the population is properly prepared, the sixth step is to start sifting through the data. Searches can be term searches, context searches or even network searches. Each has its advantages and disadvantages.
Like other parts of this process, searching can be iterative. As results come in, the searches can be refined to identify the desired targets more accurately.
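To illustrate term searching and its iterative refinement, here is a minimal Python sketch of an index that maps each term to the documents containing it. It assumes the text has already been extracted from each file, and the function names and simple tokenizer are illustrative choices; context and network searches are not covered.

```python
import re
from collections import defaultdict

# Simple tokenizer: runs of lowercase letters and digits (an assumption).
TOKEN = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Map each term to the set of document ids that contain it.

    documents -- dict of document id -> extracted text
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in TOKEN.findall(text.lower()):
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits
```

Refinement then amounts to rerunning the search with adjusted terms: adding a term to the query can only narrow the set of hits, never widen it.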
7. Exclude Privileged Items
After identifying the desired targets, the next step is to exclude privileged items. The same search techniques can be used, but with their populations narrowed to the responsive targets.
8. Validate the Population
Before assuming the search is complete, it is wise to validate the population. This can be done by examining metadata contained in various artifacts and system resources. The objective is to confirm that all of the relevant media has been preserved and subjected to search.
9. Produce the Data
The final step, of course, is to produce the data. It is best to produce the data in native format on read-only media, although if the volume is large, external hard drives may be the most practical solution. Along with the data, the MD5 hash of each file should be memorialized.
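One way to memorialize the hashes is a manifest file delivered alongside the production. The Python below is a sketch under stated assumptions: the function names and the manifest layout (hash, two spaces, relative path) are illustrative choices, not a required format, and the manifest is assumed to be stored outside the folder being produced.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Record the MD5 of every file under root, one line per file."""
    entries = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            entries.append((os.path.relpath(path, root), file_md5(path)))
    with open(manifest_path, "w", encoding="utf-8") as handle:
        for rel, md5 in sorted(entries):
            handle.write(f"{md5}  {rel}\n")


def verify_manifest(root, manifest_path):
    """Return relative paths that are missing or whose MD5 has changed."""
    problems = []
    with open(manifest_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, rel = line.split("  ", 1)
            path = os.path.join(root, rel)
            if not os.path.isfile(path) or file_md5(path) != expected:
                problems.append(rel)
    return problems
```

The receiving party can run the verification against its copy. An empty result indicates that every listed file is present and matches its recorded hash.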