How Sampling Can Reduce Your eDiscovery Burden

By Michael
February 22, 2011

Every administrator knows the difficulty of complying with discovery requests for electronically stored information (ESI) in a legal case. Even if an IT team isn’t involved in producing ESI, it may be called on by its company’s lawyers to help analyze the mountains of data they’ve requested from someone they’ve hauled into court. One way to lighten the load for everyone involved in the eDiscovery process is sampling.

Because sampling by its nature means only a portion of the data universe will be analyzed, the concept can unnerve attorneys requesting information. There can be an element of doubt in their minds that a crucial piece of information will be missed by the sample. On the other side of the bar, the providers of the information can have qualms of their own. They may feel that sampling takes away their control over what’s revealed to the requesting party, handing an opposing legal team data it has no business seeing.

One way to address those concerns is for both parties to agree on a sampling methodology before discovery is executed. Another is for the requesting party to adopt a two-tier discovery strategy.

Sampling conditions can be hammered out when the lawyers on either side of a case meet to agree on the parameters for a search of electronically stored data. For example, the parties could agree that the data universe be sampled at a 98 percent confidence level with a 2 percent margin of error. “This allows the attorneys representing the producing party to certify and sign off on an agreed-upon target,” explains Venkat Rangan, a self-declared eDiscovery geek.
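To make that target concrete, here’s a minimal sketch (not from the article, and assuming a simple random sample with the conservative p = 0.5) of how a sample size might be computed from a confidence level and margin of error:

```python
import math

def sample_size(z, margin_of_error, population=None, p=0.5):
    """Documents to sample for a given z-score and margin of error.

    Standard formula n = z^2 * p * (1 - p) / e^2, with an optional
    finite-population correction when the size of the data universe
    is known. p = 0.5 is the conservative worst-case assumption.
    """
    n = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    if population is not None:
        # Finite-population correction: a bounded universe needs fewer samples.
        n /= 1 + (n - 1) / population
    return math.ceil(n)

# z ~= 2.326 corresponds to a 98 percent (two-sided) confidence level.
print(sample_size(2.326, 0.02))                    # -> 3382 documents
print(sample_size(2.326, 0.02, population=50_000)) # -> 3168, with the correction
```

The z-score comes from the normal distribution: 1.96 corresponds to 95 percent confidence and 2.576 to 99 percent, so the parties’ agreed-upon target translates directly into a reviewable number of documents.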

When an information producer samples a data universe, the results typically end up in three buckets: data tagged for production to the requester, data considered privileged and outside the discovery request, and data deemed irrelevant to the request. If the producer’s sampling methodology isn’t transparent to the requester, there may be some doubt about what ends up in the irrelevant bucket. That’s where a two-tier process can come in handy.
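As an illustration only (the article doesn’t prescribe any tooling), a review pass that sorts documents into those three buckets might look like this, with `is_privileged` and `is_responsive` standing in for whatever criteria the parties agree on:

```python
def bucket_documents(documents, is_privileged, is_responsive):
    """Sort reviewed documents into the three buckets described above."""
    buckets = {"produce": [], "privileged": [], "irrelevant": []}
    for doc in documents:
        if is_privileged(doc):
            buckets["privileged"].append(doc)   # withheld from production
        elif is_responsive(doc):
            buckets["produce"].append(doc)      # forwarded to the requester
        else:
            buckets["irrelevant"].append(doc)   # the opaque bucket a requester may doubt
    return buckets
```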

During the first step of the process, the data provider forwards the information in the first bucket to the data requester. As soon as the requester receives that data, it makes a second request for all data not previously produced, excluding exact duplicates, system files, and privileged information.
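The “excluding exact duplicates” condition is commonly enforced by hashing file contents, so two files with the same digest are treated as identical. A minimal sketch, assuming SHA-256 over the raw bytes:

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to handle large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def exclude_exact_duplicates(paths):
    """Yield one representative path per unique file content."""
    seen = set()
    for path in paths:
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            yield path
```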

“This additional step might involve a second mountain of data, but then you then have control of it, and you can search it using your own statistical protocols,” writes Nick Brestoff, western regional manager for discovery strategy and management at International Litigation Services in Los Angeles.

“In other words,” he added, “you might treat this data as if it consisted of backup tapes.”

After using their own sampling techniques on the second mountain of data, a requester may find nothing of value. On the other hand, Brestoff observed, “Perhaps in the data that you didn’t receive at first you will find the gold that you seek.”
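Putting the pieces together, the requester’s own sampling pass over that second mountain could be as simple as drawing a random sample of the size computed earlier. A hypothetical sketch:

```python
import random

def draw_sample(documents, n, seed=2011):
    """Simple random sample of n documents, without replacement.

    A fixed seed makes the draw reproducible, which helps if the
    sampling protocol ever has to be defended in court.
    """
    rng = random.Random(seed)
    return rng.sample(documents, min(n, len(documents)))
```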
