Three weeks ago, skulking around the mummies in a small-but-fine museum on the University of Sydney campus, I learnt that mystery writer Agatha Christie was married to archaeologist Max Mallowan and that she’d assisted him on digs in Syria. Dame Agatha even used her cold cream and knitting needles to clean rare ivory artifacts. The experience found its way into her work. An exhibit of Christie-cleaned carvings included a quote from the author’s fictional detective, Hercule Poirot, in Death on the Nile (1937):
Once I went professionally to an archaeological expedition–and I learnt something there. In the course of an excavation, when something comes up out of the ground, everything is cleared away very carefully all around it. You take away the loose earth, and you scrape here and there with a knife until finally your object is there, all alone, ready to be drawn and photographed with no extraneous matter confusing it. That is what I have been seeking to do–clear away the extraneous matter so that we can see the truth–the naked shining truth.
This naturally got me thinking about the way we approach search in electronic discovery. Most lawyers use keywords to find documents responsive to discovery, despite keywords’ propensity to sweep up too much chaff. We get lots of the documents we seek with keywords; unfortunately, the results come caked with the loose earth of documents that contain the keywords but have no connection to the case. Testing confirms the results typically run about 20% responsive matter to 80% extraneous. That’s a lot of loose earth!
The current industry practice is for keyword-culled documents to undergo horrifically expensive brute-force review, i.e., bored lawyers reading each page. Such spirit-crushing linear review accounts for anywhere from 50% to 90% of the total cost of e-discovery; consequently, when you reduce lawyer review time, you slash the biggest contributor to cost…and waste. If most of the material culled by keyword search is extraneous matter, any technique that pulls away chaff without grabbing wheat translates to significant savings of time and money while improving quality by reducing the number of documents at risk of mischaracterization.
So, maybe we should be looking at the value of a second, distinct keyword pass preceding review that, like Agatha Christie’s knitting needles or the archaeologist’s knife, clears away loose earth. This pass doesn’t look for responsive documents. It employs keywords to find documents that are NOT likely to be responsive; that is, it’s calculated to clear away the extraneous matter so we can see the naked shining truth.
This is “negative search.” The notion of negative search isn’t original with me, but neither is it much used by anyone else. Though similar in certain respects, negative search is not the same as using Boolean constructs to exclude noise hits. Boolean constructs are quite effective when artfully composed, but can be challenging to frame and tricky to execute. Negative search doesn’t restrict queries in the way Boolean constructs do. Instead, negative search finds all documents containing terms deemed highly unlikely to occur within responsive documents, like “birthday cake,” “fantasy football” or “bridal shower.” These are then excluded from review.

Clearly, negative search terms must be chosen wisely and tested carefully against representative samples of the collection before broad deployment. Like the NIST list, negative search terms, once compiled, can be used in subsequent cases–again with testing to guard against unexpected outcomes. So, consider if there’s a role for negative search in your next e-discovery effort and know that, in almost any collection, there’s a corpus of extraneous data that can be cost-effectively culled by negative search.
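To make the mechanics concrete, here is a minimal Python sketch of what a negative search pass might look like. The term list, the (doc_id, text) document format, and the sample size are illustrative assumptions rather than a recommended configuration; the essential point is that the culled set gets sampled and spot-checked before anything is actually set aside.

```python
import random
import re

# Hypothetical negative terms; in practice, each must be tested against
# representative samples of the collection before broad deployment.
NEGATIVE_TERMS = ["birthday cake", "fantasy football", "bridal shower"]

def contains_negative_term(text, terms=NEGATIVE_TERMS):
    """True if any negative term appears as a whole phrase in the text."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms)

def negative_cull(documents):
    """Partition (doc_id, text) pairs into IDs kept for review and IDs culled."""
    keep, cull = [], []
    for doc_id, text in documents:
        (cull if contains_negative_term(text) else keep).append(doc_id)
    return keep, cull

def sample_for_testing(documents, culled_ids, sample_size=50, seed=1):
    """Draw a random sample of culled documents for human spot-checking,
    guarding against negative terms that accidentally sweep up responsive matter."""
    by_id = dict(documents)
    random.seed(seed)
    picks = random.sample(list(culled_ids), min(sample_size, len(culled_ids)))
    return [(doc_id, by_id[doc_id]) for doc_id in picks]
```

Documents flagged by negative_cull are only candidates for exclusion; nothing should leave the review population until the sampled spot-check comes back clean.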
Steve Green said:
Dear Craig,
We’d better get an opinion letter on that patent – looks like we might be infringing.
Thanks,
Steve
P.S. How’s your fantasy football team doing? 🙂
Ben Hogan said:
The idea of ‘negative’ search takes the search paradigm and looks for creative ways to apply it. Too many search results are always seen as a negative because we think of search terms linearly, as documents that must be reviewed. A search for the word ‘abacus’ across 30 custodians may generate 30,000 results, a daunting prospect.
What if search terms were used as a tool to identify people? This enables generic words like “contract” or “price” to become powerful tools for identifying persons with certain roles. Now take those persons and use them as a tool for limiting the original search effort. Now we are not layering keywords and culling the data by content. Instead, we are focusing the analysis on the communications between the key people. It is a much larger list than the custodian list, but it is a much smaller list than everyone in the data. Use people and relationships to focus the early analysis on 3-4% of the data, and now you have a whole new ball game. Now your original search for “abacus” returns 3,000 results. Next, use other investigative techniques, including ‘negative search’, to limit the focus to a smaller group of people – now the search yields 50 emails that represent the most important communications between the key people about that particular topic. The smoking guns are right there, and an investigator got to them in a couple of hours by (1) identifying the key people and then (2) following their relationships. Does this eliminate the review? No, but it generates evidence and intelligence that drive efficiencies across all case strategy – meet and confer, interview preparation, keyword selection. Everyone knows that accuracy in the first 24-48 hours of a matter dramatically impacts all downstream costs and efforts. Catelas enables counsel to focus on relationships and behaviors, which is different from an activity-driven link analysis map.
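For readers curious how the people-first workflow described in this comment might look in code, here is a rough Python sketch of its two steps: identify likely key players from generic role terms, then confine the topic search to communications among those people. The field names, role terms, and thresholds are illustrative assumptions, and this is not a depiction of how Catelas itself works.

```python
from collections import Counter

# Illustrative values; the real terms, fields, and thresholds would be case-specific.
ROLE_TERMS = ["contract", "price"]   # generic words that signal a role
TOPIC_TERM = "abacus"                # the original, over-broad search term

def key_people(emails, min_hits=10):
    """Count how often each sender uses the role terms; frequent users are treated as key people.

    `emails` is an iterable of dicts with "from", "to" (a list) and "body" fields.
    """
    hits = Counter()
    for msg in emails:
        body = msg["body"].lower()
        if any(term in body for term in ROLE_TERMS):
            hits[msg["from"]] += 1
    return {person for person, count in hits.items() if count >= min_hits}

def focused_search(emails, people, term=TOPIC_TERM):
    """Limit the topic search to messages exchanged between the key people."""
    return [
        msg for msg in emails
        if msg["from"] in people
        and any(recipient in people for recipient in msg["to"])
        and term in msg["body"].lower()
    ]
```

The narrowed collection can then be cut further with negative search or other investigative filters, as the comment suggests.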