Electronic Data Discovery Unleashed
By: Gregory Fordham
Electronic data discovery, also known as computer forensics, is the application of numerous science and engineering disciplines to the legal problem of digital evidence. As our society becomes increasingly dependent on digital technology, the use of digital evidence in the courtroom has become more common. In fact, it has become so common that the question of whether to use electronic data discovery has been elevated from one of best practice to potentially one of malpractice, according to Julie K. Hannaford, Co-Chair of the American Bar Association’s Computer and Internet Litigation Committee.
Remarkably electronic evidence means different things to different people. For some, electronic evidence is the data recovered from a computer hard drive. To others it is an incriminating admission in an e-mail. In yet other cases, it is an obscured trail of seemingly unrelated transactions buried in voluminous business records that was uncovered by an ordinary computer tirelessly performing thousands of comparisons per second. For still others, it is simply a way to efficiently manage a large, complex case.
Without question electronic data discovery has many faces and many roles. Similarly, it is rich in both promise and peril. What follows is an examination of all of these.
The journey begins with a brief review of the nearly forty-year history of electronic data discovery, followed by a discussion of various digital technologies and their significance. Next is an examination of the unique procedures required for electronic data discovery and a discussion of the requisite tools. Finally, myths are debunked and realities revealed through a collection of frequently asked questions.
A New Idea with a Past
In recent years there has been increased interest in the use of electronic data in litigation. Remarkably, that use has a history of nearly forty years. Furthermore, that history is actually very friendly toward the use of electronic data.
The earliest use of electronic data involved the acceptance of computer printouts. Since the late sixties courts have considered computer printouts as evidence at trial. Furthermore, computer records qualify as business records under the Federal Business Records Act and as originals under the Federal Rules of Evidence. Most states have similar provisions under the Uniform Business Records Act, the Uniform Photographic Copies of Business and Public Records as Evidence Act and the Uniform Rules of Evidence or other specially crafted statutes.
Perhaps even more important than the acceptance of computer records as a business record is that the electronic data, itself, is discoverable. The 1970 amendments to Rule 34 of the Federal Rules of Civil Procedure made it clear that electronic compilations are subject to request for production. So, it is not even necessary that electronic data be converted to a paper medium as part of the discovery process. The electronic data can be taken in its native form and there is no particular form in which the records must be provided.
Electronic data has many advantages over the traditional paper medium. This fact, too, has been well recognized by the courts. As a result, they have even required the production of electronic data when print-outs were available and sanctioned respondents who submitted paper documents instead of the requested electronic data.
Computer evidence is subject to the same foundation requirements as its paper counterpart. Essentially, this foundation must show that the information placed into the computer is reliable and trustworthy. It matters not whether the data is delivered on a paper medium or in its native format. In fact, the mere fact that the paper documents were, themselves, generated from a computer exposes them to the same rigorous foundation requirements as the underlying electronic data.
Interestingly the foundation requirements for electronic data can extend to the software subsequently used to analyze the data. After all, it is well established that proof of properly functioning equipment is part of the foundation requirements for computer evidence. Furthermore, customized software will be subjected to higher burdens than standardized or commercially available software. This fact can prove troublesome to law enforcement using software whose sales are restricted to law enforcement entities and not available to the general public.
Despite the requirement for reliability and trustworthiness, computerized data is still admissible even when inaccuracies are discovered. Also, the fact that printouts of the data were not prepared contemporaneously does not restrict the admissibility of the data.
In more recent times the biggest debate has centered on who pays for the cost of production. Historically, the rule has been that each party pays the costs of its own production; however, a dramatic shift in that rule recently occurred in the case of Rowe Entertainment v. The William Morris Agency.
In Rowe, eight criteria were established for determining the responsible party for the costs of production. Those criteria were subsequently used again in the case of Murphy Oil USA, Inc. v. Fluor Daniel, Inc. Essentially, the criteria established in Rowe are an amalgamation of factors considered in other cases where costs were shifted from the producing party to the requesting party. So there are no surprises in the eight criteria other than their formal recognition as test criteria.
Under Rowe’s eight criteria the winner, or the loser as the case may be, pays for all the costs of production. In other words, the costs are not allocated between the parties based on the scoring of the eight criteria. Rather they are shifted from the producer to the requester.
The costs of reviewing responsive documents for privilege are not affected by Rowe’s eight criteria. The costs of performing a privilege review are still borne by the producing party.
What will be interesting to watch is how the use of the eight criteria plays out over time. For example, will there be some magical dollar threshold above which the eight criteria are employed? Based on the analysis conducted by the Rowe court the application of the criteria will likely apply to every case of every magnitude.
Data Acquisition
When deciding to pursue electronic data discovery, litigators should realize that there is a vast array of both data sources and formats. Choosing a data source and its format can have significant ramifications for the discovery process. Properly matching a data source and format to a particular discovery problem can yield a treasure trove of both information and efficiency. Matching the wrong data source and format, or not even recognizing the existence of a data source for a particular discovery problem, can be both wasteful and fruitless.
The following sections examine the diverse universe of electronic data sources that includes disk analysis, databases, e-mails, web pages, faxes and other graphical images, tape back-ups, and finally the more standard electronic documents like spreadsheets and text documents.
Analyzing Disk Drives
If a computer disk is part of the electronic data production there is quite a lot that can be done with it. Typically, the most surprising thing is that deleted files can be recovered. Furthermore, even damaged and partially overwritten deleted files can be recovered. But that is just the tip of the iceberg. Usage patterns and other file manipulations are also there awaiting discovery.
To understand how all of this is possible it is necessary to understand drive structures, file management methods and analysis techniques. Each is discussed in the sections that follow.
Tracks, Sectors and Clusters
Physically, computer disks are composed of concentric circles known as tracks. The difference between a floppy disk and a hard disk is that a hard disk is a collection of disks referred to as platters. Each platter has two surfaces. The set of tracks at the same position across all of the platter surfaces is called a cylinder. The concentric circles of a disk are then divided into pie-shaped sections known as sectors, and each sector typically holds 512 bytes.
Sectors are then organized into collections. Those collections are known as blocks or, in Microsoft parlance, clusters. The number of sectors that make up a cluster is a function of the operating system being used and the size of the hard drive. If the drive is large but the operating system can only reference a few locations on the drive, then each cluster must contain a large number of sectors. On the other hand, if the operating system can reference a lot of locations on the drive, then the number of sectors in a cluster can be small.
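The relationship between drive size, the number of locations the operating system can reference, and cluster size can be illustrated with a little arithmetic. The following is only a simplified sketch: the function name and the idea of expressing the addressing limit as a number of bits are assumptions made for illustration, and real file systems reserve some values and apply their own default cluster sizes.

```python
SECTOR_SIZE = 512  # bytes per sector, the typical figure cited above


def sectors_per_cluster(drive_bytes: int, address_bits: int) -> int:
    """Return the smallest power-of-two number of sectors per cluster
    that lets 2**address_bits cluster numbers cover the whole drive.

    Hypothetical helper: it ignores reserved values and vendor defaults.
    """
    max_clusters = 2 ** address_bits
    total_sectors = -(-drive_bytes // SECTOR_SIZE)  # ceiling division
    count = 1
    while total_sectors > count * max_clusters:
        count *= 2  # cluster sizes are conventionally powers of two
    return count


two_gib = 2 * 1024 ** 3
few_locations = sectors_per_cluster(two_gib, 16)   # only 65,536 cluster numbers
many_locations = sectors_per_cluster(two_gib, 28)  # roughly 268 million cluster numbers
print(few_locations, many_locations)
```

With only 16 bits of cluster numbers, a 2 GiB drive needs 64 sectors (32 KiB) in every cluster, while a 28-bit scheme could in principle address each sector individually.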

The computer user is totally unaware that the disk is organized into sectors and clusters. Instead the user thinks only in terms of files. From a technical standpoint a file is an abstraction mechanism so that the user does not have to know how the data is physically stored.
Since the operating system is actually referencing clusters, when a file is saved it is saved to cluster(s). If the file is larger than one cluster then it is saved to as many clusters as necessary to hold the data. If the file is smaller than one cluster then the entire cluster is still reserved for that file, since a cluster can be associated with only one file. So, if the cluster is larger than the file or if the last remnant of a file is smaller than the last cluster used to store the file, the remaining space in the cluster is left unused. The entire process can be analogized to seating in a restaurant. If a party of three is seated at a table for four the fourth chair remains unused.
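The restaurant analogy can be made concrete with a short calculation. This is a minimal sketch in which the function name and the 4,096-byte cluster size are illustrative assumptions, not values taken from any particular system.

```python
def clusters_and_unused(file_bytes: int, cluster_bytes: int) -> tuple[int, int]:
    """Return (clusters reserved, bytes left unused in the last cluster)."""
    clusters = -(-file_bytes // cluster_bytes)  # round up to whole clusters
    unused = clusters * cluster_bytes - file_bytes
    return clusters, unused


CLUSTER_BYTES = 4096  # assumed cluster size for this example

large_file = clusters_and_unused(10_000, CLUSTER_BYTES)
small_file = clusters_and_unused(1_000, CLUSTER_BYTES)
print(large_file)  # clusters and unused bytes for a 10,000-byte file
print(small_file)  # clusters and unused bytes for a 1,000-byte file
```

Under these assumptions the 10,000-byte file reserves three clusters and leaves 2,288 bytes unused, while the 1,000-byte file still reserves a full cluster and leaves 3,096 bytes unused.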
The unused remnant has a technical name, slack space. Furthermore, there are some interesting side effects with slack space. More specifically, if the cluster being used to save the current file was used previously to save another file and the previous file occupied a larger portion of the cluster than is being occupied by the current file then the slack area will still contain the data from the previous file.
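The way old data lingers in slack space can be simulated in a few lines. The sketch below models a single, deliberately tiny 16-byte cluster as a byte array; the file contents and helper name are invented for illustration, and real operating systems may treat part of the final sector differently.

```python
CLUSTER_SIZE = 16  # unrealistically small, chosen so the result is easy to read

cluster = bytearray(CLUSTER_SIZE)  # one cluster, initially all zero bytes


def write_file(target: bytearray, data: bytes) -> int:
    """Write data at the start of the cluster, leaving the rest untouched."""
    if len(data) > len(target):
        raise ValueError("data does not fit in a single cluster")
    target[:len(data)] = data
    return len(data)


# A previous file fills the whole cluster and is later deleted.
write_file(cluster, b"SECRET-PASSWORD!")

# A shorter current file is saved to the same cluster.
current_length = write_file(cluster, b"hello")

visible = bytes(cluster[:current_length])  # what the user sees as the file
slack = bytes(cluster[current_length:])    # remnant of the previous file
print(visible, slack)
```

The current file reads back as "hello", yet the slack area still holds "T-PASSWORD!" from the earlier file, which is exactly the kind of remnant a disk analysis can recover.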
File Allocation Table
In the old days files were managed much like their tape-based counterparts. The name of the file was captured in a directory along with its starting location and length. Files were then saved to the starting location and stored in contiguous blocks. As disks grew larger and computer systems more sophisticated, however, a new methodology was required. As a result, the linked list file structure was developed.
In the linked list method every cluster is mapped in a data table called the File Allocation Table (FAT). Every entry in the FAT represents a cluster on the drive and each entry tells something about the condition of the data in that cluster. For example it identifies whether the cluster contains data and whether that data is simply one cluster in the chain of clusters comprising the file or whether it is the last cluster in the chain. The following table identifies the meaning of the various FAT codes used by Microsoft in each of their FAT implementations.

Here is how it works. The directory contains various information about the file, including the starting cluster number. To find all the clusters containing the file, the operating system looks in the FAT at the entry for the file’s starting cluster number. In the example shown in the following figure, the operating system finds the file’s starting cluster number in the directory, cluster 9. It then goes to the FAT entry for cluster 9 and finds the number of cluster 16. It then goes to the FAT entry for cluster 16 and finds the number for cluster 1. It continues in this fashion until it discovers a FAT entry containing the end of file marker, at which time it has identified all the clusters containing that file: 9, 16, 1, 10 and 25. With that information the operating system can go to the disk and retrieve the data in those clusters.
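The chain-following procedure just described can be sketched in code. This is an illustration only: the FAT is modeled as a small dictionary holding the entries from the example, and the end of file marker is an arbitrary placeholder rather than one of the actual FAT codes.

```python
END_OF_FILE = -1  # placeholder marker, not a real FAT code


def follow_chain(fat: dict[int, int], start_cluster: int) -> list[int]:
    """Return every cluster in a file, in order, starting from the
    cluster number recorded in the directory."""
    chain = []
    visited = set()
    cluster = start_cluster
    while cluster != END_OF_FILE:
        if cluster in visited:
            raise ValueError("corrupt table: the chain loops back on itself")
        visited.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # each entry names the next cluster in the file
    return chain


# Entries from the example: 9 -> 16 -> 1 -> 10 -> 25 -> end of file
example_fat = {9: 16, 16: 1, 1: 10, 10: 25, 25: END_OF_FILE}

file_clusters = follow_chain(example_fat, 9)
print(file_clusters)  # [9, 16, 1, 10, 25]
```

The loop check is an addition for safety; a damaged table could otherwise send the lookup around in circles.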
The linked list provides several advantages over the earlier file management methods. First, files do not have to be saved in contiguous clusters, although they frequently are. As a result, disk space utilization can be much higher. Second, the linked list can be used to track both used and free space clusters. Some of the earlier methods, even those with linked-list designs, relied on bit-mapped memory lists. As disk drives grew in size, however, this method had to be abandoned.
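Both advantages can be seen in a small sketch of how a new file might claim space. The table layout, the markers and the allocation strategy (take the first free clusters found) are all assumptions made for illustration and are not drawn from any specific FAT implementation.

```python
FREE = None        # placeholder for "cluster is unused"
END_OF_FILE = -1   # placeholder for "last cluster in the chain"


def allocate(fat: list, count: int) -> list[int]:
    """Claim `count` free clusters, link them into a chain, and return them."""
    free_clusters = [n for n, entry in enumerate(fat) if entry is FREE]
    if count < 1 or count > len(free_clusters):
        raise ValueError("not enough free clusters")
    chosen = free_clusters[:count]
    for current, following in zip(chosen, chosen[1:]):
        fat[current] = following   # point each cluster at the next one
    fat[chosen[-1]] = END_OF_FILE  # mark the end of the new chain
    return chosen


# Index = cluster number. One file occupies 0 -> 1 -> 2, another 4 -> 5,
# leaving clusters 3, 6 and 7 free but scattered.
fat = [1, 2, END_OF_FILE, FREE, 5, END_OF_FILE, FREE, FREE]

new_chain = allocate(fat, 3)
print(new_chain)  # [3, 6, 7]
```

The same table that records each file's chain also reveals which clusters are free, and the new file is stored successfully even though no three free clusters sit side by side.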