The core challenge of discovery is identifying information that is responsive but not privileged, achieved without undue burden or expense. There are multiple ways to approach the task, none optimal.
The most labor-intensive method is called “linear human review,” where lawyers (for the most part) look at everything and cull responsive and privileged items. It sufficed in the pre-digital era, when much effort and expense were devoted to recordkeeping, which ensured that information had a “place.” Despite being costly, slow and error-prone, linear review was all we had, so it became the gold standard for identifying responsive and privileged information.
With the advent of personal computing, the internet and mobile devices, virtually all information today takes digital electronic forms that may be searched electronically. Digitized textual content, whether obtained by applying optical character recognition (OCR) to hard copy or by utilizing native electronic sources, makes it possible to find potentially responsive or privileged material by comparing text strings within documents to search terms expected to coincide with responsive or privileged content. Moreover, digital data always corresponds to a complement of digital metadata, viz. information that describes data’s location, nature and characteristics and that aids in the search, organization, interpretation and use of data.
As data volumes grew, text search and metadata culling became the new touchstones by which information was deemed potentially responsive and potentially privileged, usually as a precursor to manual assessment. Search terms, either by themselves or in logical phrases called Boolean queries, were deployed against the text within each document or more commonly against a concordance index built from extracted text. Items not making the keyword cut for responsiveness tended to be deemed not discoverable and afforded no further consideration.
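To make the mechanics concrete, here is a minimal sketch, in Python with made-up documents, of how extracted text becomes a concordance (inverted) index and how Boolean queries reduce to set operations over its posting lists:

```python
import re
from collections import defaultdict

# Toy corpus standing in for extracted document text (hypothetical content).
docs = {
    "doc1": "The battery shipment pricing was fixed at the meeting.",
    "doc2": "Quarterly pricing update for lithium cells attached.",
    "doc3": "Lunch meeting moved to noon on Thursday.",
}

# Build a concordance (inverted) index: term -> set of document IDs.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in re.findall(r"[a-z]+", text.lower()):
        index[term].add(doc_id)

# Boolean queries are just set operations over the index.
def AND(*terms):
    return set.intersection(*(index[t] for t in terms))

def OR(*terms):
    return set.union(*(index[t] for t in terms))

print(sorted(AND("pricing", "meeting")))  # → ['doc1']
print(sorted(OR("lithium", "battery")))   # → ['doc1', 'doc2']
```

Commercial review platforms build far richer indexes (term positions, stemming, fielded metadata), but the responsive/not-responsive keyword cut rests on the same principle.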
Despite its outward simplicity, text search has its pitfalls; among them:
- Language is subtle and contextual.
- Obscure lexicons abound.
- Abbreviations are common.
- Some items yield no searchable text (e.g., encrypted files).
- Indexing tools exclude common terms (stopwords).
- OCR is unreliable.
- Misspelling is rife.
- Sound and image data lack searchable text.
- Numbers aren’t indexed or may only appear when calculated (e.g., spreadsheets).
- Diacritical marks confound searches.
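Two of these pitfalls, diacritics and misspellings, are easy to demonstrate with a naive substring search; the documents and terms below are invented for illustration:

```python
import unicodedata

docs = [
    "Please see the attached résumé.",      # diacritics
    "The recievables report is overdue.",   # misspelling of "receivables"
]

def naive_hits(term, texts):
    """Literal, case-insensitive substring matching."""
    return [t for t in texts if term.lower() in t.lower()]

# A literal search for "resume" misses the accented form entirely.
print(naive_hits("resume", docs))  # → []

def fold(text):
    """Strip diacritical marks by decomposing and dropping combining chars."""
    return "".join(c for c in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(c))

# Folding diacritics before matching recovers the first document.
print(naive_hits("resume", [fold(d) for d in docs]))
```

Note that the misspelled “recievables” still evades the folded search; catching it requires fuzzy matching or stemming, which is why keyword lists alone can quietly miss responsive material.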
There is an art and science to effective search in electronic discovery. As former U.S. Magistrate Judge John Facciola aptly put it in United States v. O’Keefe, “[w]hether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics.” Pressed to decide the sufficiency of proffered searches, Judge Facciola concluded that, “[g]iven this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”
Yet daily, lawyers go where angels fear to tread, confidently choosing search terms despite being untrained in search, unschooled in computer technology, statistics or linguistics and unmindful of the serious consequences of seat-of-the-pants search. Lawyers rarely test their choices and, when they do, tend to look only at raw hit counts (that is, the number of occurrences of the term or of documents containing the term) and will pronounce a term unsuitable because there were “too many hits.”
High hit counts don’t always signal failure. A lawyer who dismisses a search because it yields “too many hits” is as foolish as the Emperor Joseph dismissing Mozart’s Il Seraglio as a work with “too many notes.” Countered Mozart, “There are just as many notes as there should be.” Indeed, if data is properly processed to be susceptible to text search and the search tool performs appropriately, a keyword search generates just as many hits as there should be.
Few lawyers craft queries with the precision Mozart brought to opera; so when seemingly sensible search terms perform poorly, it’s crucial to scrutinize the results to learn where and why. You must see the hits in context to nudge queries toward success. That seems so manifestly obvious that it’s astounding how often it isn’t done, or is done in a high-handed way.
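Seeing hits in context is exactly what a keyword-in-context (KWIC) display provides. A minimal sketch, using an invented document:

```python
import re

def kwic(term, text, window=25):
    """Return each hit with a window of surrounding characters."""
    snippets = []
    for m in re.finditer(re.escape(term), text, re.IGNORECASE):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        snippets.append(f"...{text[start:end]}...")
    return snippets

doc = ("This Agreement is made by the parties. "
       "Notice of any claim under this Agreement must be in writing.")
for snippet in kwic("agreement", doc):
    print(snippet)
```

A glance at a few such snippets typically reveals whether a term is hitting on substance or on boilerplate, which is precisely the information needed to refine the query.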
In our adversarial justice system, disputes that could be readily resolved with a modicum of care and candor are instead springboards for motion practice, with lawyers investing countless costly hours defending the “principle” that no party is obliged to share the contents of non-responsive documents. It’s true…but only to a point.
Indeed, many documents caught in the net of an over-inclusive keyword search are irrelevant with respect to the claims and defenses and are not documents sought in discovery. We wouldn’t expect a court to grant a motion to compel production of “patently irrelevant documents not sought in discovery.”
But enlightened thinking should prompt us to examine relevance in a broader, practical context; that is, information may be relevant to the integrity of process under the Rules. The document with the false hit has no bearing on the issues in the case; but, it bears mightily on whether the methods employed in discovery are calculated to “secure the just, speedy, and inexpensive determination of every action and proceeding” consistent with Rule 1 of the Federal Rules of Civil Procedure.
Producing counsel’s considered determination that a document is not relevant to the underlying case should be given wide berth. Certainly, it’s rebuttable, but only for good cause shown. By contrast, producing counsel’s assertion that a search query is unacceptably over-inclusive is not a dispute on the merits. It’s a dispute respecting the integrity of process, and the way in which the search performed poorly is relevant. Hence, transparency as to the mechanism of failure is required.
Before producing counsel grab torches and pitchforks, let me quickly add that this transparency must be narrowly construed and must give way to countervailing considerations. Foremost among them: false-positive items may be withheld when revealing their contents would compromise more compelling interests and when the purpose of the requisite transparency can be satisfied by disclosure of alternate examples. What I contemplate does not throw the door open to non-relevant, non-responsive information, except insofar as disclosure of benign examples of false hits will narrowly serve to permit cooperation and collaboration calculated to improve the quality and/or lower the cost of discovery.
I’ve been flogging this idea for a long time, but with little to back it save compelling logic and a winning smile. Happily, there is now support in the case law.
One month ago, the Hon. Donna Ryu, a U.S. Magistrate Judge for the Northern District of California, issued an order in In re: Lithium Ion Batteries Antitrust Litigation, No. 13-MD-02420 YGR (DMR). The parties in that action had settled upon a protocol for the development and testing of search terms, which the Court summarized as follows (citations omitted):
- the producing/responding party will develop an initial list of proposed search terms and provide those terms to the requesting party;
- within 30 days, the requesting party may propose modifications to the list of terms or provide additional terms (up to 125 additional terms or modifications); and
- upon receipt of any additional terms or modifications, the producing/responding party will evaluate the terms, and
a. run all additional/modified terms upon which the parties can agree and review the results of those searches for responsiveness, privilege, and necessary redactions, or
b. for those additional/modified terms to which the producing/responding party objects on the basis of overbreadth or identification of a disproportionate number of irrelevant documents, that party will provide the requesting party with certain quantitative metrics and meet and confer to determine whether the parties can agree on modifications to such terms. Among other things, the quantitative metrics include the number of documents returned by a search term and the nature and type of irrelevant documents that the search term returns. In the event the parties are unable to reach agreement regarding additional/modified search terms, the parties may file a joint letter regarding the dispute.
However, the plaintiffs sought a means by which they could see a random sample of the false hits with privileged items removed for the purpose of revising the searches to be more precise.
The defendants objected that the samples would give plaintiffs access to non-responsive, irrelevant documents, that the process in place was sufficiently transparent and that sampling was unnecessary since there had been no showing that the defendants had failed to produce responsive information.
The plaintiffs conceded that the process would require production of non-responsive, non-privileged items but only for the limited purpose of determining “why a particular search term fails to return an appropriate set of documents, enabling the parties to focus computer searches on relevant, discoverable material.” In the end, the plaintiffs persuaded the Court that “the best way to refine searches and eliminate unhelpful search terms is to analyze a random sample of documents, including irrelevant ones, to modify the search in an effort to improve precision.”
The Court ordered the sampling procedure, acknowledging defendants’ “valid concern” that the samples will afford plaintiffs access to “irrelevant information to which Plaintiffs have no right.” The Court dismissed the concern as one easily mitigated by allowing defendants to withhold any irrelevant items within the sample for any reason, so long as they replaced the withheld items with additional random samples they did not feel compelled to withhold. The Court further mandated that the irrelevant items in the samples turned over to plaintiffs could not be used for any purpose other than resolving disputes involving search terms and would be seen by just one lawyer for each of the various litigation groups, who would, in turn, be obliged to destroy the irrelevant samples and any notes about them within two weeks of resolution of the search-term dispute, with destruction confirmed by affidavit. Finally, the Court limited the process to no more than five search terms per defendant group.
Overkill? Maybe, but still a wise and practical decision. The same end might have been achieved more simply by putting the burden on the producing party challenging the queries to supply representative, non-privileged examples of non-responsive items deemed false positives. Searches that produce large numbers of false hits tend to do so in consistent ways (e.g., the same boilerplate language in the footer of every e-mail). In my experience, a few telling examples are all one needs to tweak queries to be more precise. Ideally, the effort is face-to-face and collaborative, with counsel having the ready ability to test the modified queries against representative samples of the dataset so as to promptly determine whether the proposed tweaks are delivering more precise returns without a big drop in recall.
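The testing step lends itself to simple arithmetic: code a random sample of hits for responsiveness, and the sample yields an estimated precision with a margin of error. A sketch, using simulated review calls rather than real data:

```python
import random

random.seed(7)

# Hypothetical review results: each sampled hit coded responsive (True) or not.
# Here we simulate a query whose true hit rate is about 30%.
sample = [random.random() < 0.3 for _ in range(400)]

precision = sum(sample) / len(sample)
print(f"Estimated precision: {precision:.1%}")

# A rough 95% margin of error for a proportion from a simple random sample.
margin = 1.96 * (precision * (1 - precision) / len(sample)) ** 0.5
print(f"Margin of error: ±{margin:.1%}")
```

With 400 sampled hits and a hit rate near 30%, the margin is roughly ±4.5 points; larger samples tighten it at the familiar square-root rate. The same arithmetic, applied before and after a proposed tweak, shows whether precision actually improved.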
Arguably, employing a random sample affords some protection against the defense gaming the process (such as where artfully chosen examples might prompt the plaintiffs to accept modifications intended to hide damaging documents). I’m just not that paranoid. It’s not confidence in the integrity of counsel so much as it is acceptance of a maxim called Hanlon’s Razor explained to me by the late, great Browning Marean. It states, “Never attribute to malice that which is adequately explained by stupidity.” Most lawyers can’t keep up with sound practice in e-discovery, let alone finesse the outcome.
It’s an easier issue to resolve when the dispute concerns imprecise keyword searches, because disclosure of a handful of benign examples suffices to tweak the queries. But mark this decision as a precursor to bigger battles, most particularly the requisite level of transparency attendant to seed and training sets used in connection with predictive coding tools.
 United States v. O’Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008)
 This assumes the requests for production were couched descriptively rather than as an agreed-upon set of keyword searches. Parties may define relevance in terms of a document’s responsiveness to an agreed-upon search methodology whether or not a substantive assessment of the document would prompt the same conclusion.
Tip of the hat to Doug Austin of the E-Discovery Daily blog for highlighting this case.
I would agree with many of your observations, but disagree that you have nothing to support your contentions but “compelling logic and a winning smile,” although I am sure the smile will save the day. It may be helpful to be more concrete regarding the balance in any sample set, specifically what attributes are most important to training set selection. Many have written on the size of the training set, which is relatively straightforward, but I would suggest that one feature any training sample should share with the general document population is the probability distribution. More specifically, the probability of randomly selecting a “relevant” document from the training set should equal the probability of selecting a “relevant” document from the population at large: P_training(Y|X) = P_population(Y|X). If one were to offer a rich training set, i.e., one with more relevant documents than would typically be found in the document population at large, one would skew the a priori probability of the predictive covariates. A really good discussion of this issue is found in a book entitled Dataset Shit in Machine Learning (2009), a copy of which may be found at http://www.acad.bg/ebook/ml/The.MIT.Press.Dataset.Shift.in.Machine.Learning.Feb.2009.eBook-DDU.pdf
Dear Mr. Cronin: Thanks for the comment, although I made no observations respecting the balance of the sample set, so your contribution comes out of left field. Moreover, I suspect the article you cite might be about Dataset Shift (although I prefer your rendition as more descriptive for the average reader).
Sorry for the misspelling, which tragically overshadows the point I was trying to make. Feel free to delete the comment or edit same.
Nah. To err is human, to err in ways that make people smile is divine.
Michael Carbone said:
This is a great post. What I seem to recall from the movie “Amadeus” is that Mozart asked the Emperor “Which notes would you like me to take out, your Majesty?”
Can’t say; but, some scholars dismiss the anecdote entirely as being based on a bad translation. They claim the Emperor observed that there were many notes, not too many notes. As they say in The Man Who Shot Liberty Valance, “[w]hen the legend becomes fact, print the legend.”