I’ve been thinking about how we implement technology-assisted review tools and particularly how to hang onto the on-again/off-again benefits of keyword search while steering clear of its ugliness. The rusty flivver that is my brain got a kickstart from many insightful comments made at the recent CVEDR e-discovery retreat in Monterey, California. As is often the case when the subject is technology-assisted review (by whatever name you prefer, dear reader: predictive coding, CAR, automated document classification, Francis), some of those kicks came from lawyer Maura Grossman and computer scientist Gordon Cormack. So, if you like where I go with this post, credit them. If not, blame me for misunderstanding.
Maura and Gordon are the power couple of predictive coding, thanks to their thoughtful papers and presentations transmogrifying the metrics of NIST TREC into coherent observations concerning the efficacy of automated document classification. While they’re spinning straw into gold, I’m still studying it all; but from where I stand, they make a lot of sense.
Maura expressed the view that technology-assisted review tools shouldn’t be run against subset collections culled by keywords but should instead be turned loose on the larger collection of ESI (i.e., the collection/sources against which keyword search might ordinarily have been deployed). The gist was, ‘use the tools against as much information as possible, and don’t hamstring the effort by putting old tools out in front of new ones.’ [I’m not quoting here, but relating what I gleaned from the comment.]
At the same Monterey conference, Judge Andrew Peck reminded us of the perils of GIGO (Garbage In : Garbage Out) when computers are mismanaged. The devil is very much in the details of any search effort, but never more so than when one deploys predictive coding in e-discovery. Methodology matters.
If technology-assisted review were the automobile, we’d still be at the stage where drivers asked, “Where do I hook up my mules?” Our “mules” are keyword search.
When you position keyword search in front of predictive coding, that is, when you use keyword search to create the collection that predictive coding “sees,” the view doesn’t change much from the old ways. You’re still looking at the ass end of a mule. Breathe deep the funky fragrance of keyword search. Put axiomatically, no search technology can find a responsive document that’s not in the collection searched, and keyword search leaves most of the responsive documents out of the collection.
Keyword search can be very precise, but at the expense of recall. It can achieve splendid recall scores, but with abysmal precision. How, then, do we avail ourselves of the sometimes laser-like precision of keyword search without those awful recall in-laws coming to visit? Time and again, research proves that keyword search performs far less effectively than we hope or expect. It misses 30-80% of the truly responsive documents and sucks in scads of non-responsive junk, hiding what it finds in a blizzard of blather.
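To make the tradeoff concrete, here’s a minimal sketch with purely hypothetical counts (no study or matter behind these numbers) showing how recall and precision are figured for a keyword search:

```python
# Hypothetical counts for illustration only -- not drawn from any study or matter.
truly_responsive = 10_000   # responsive documents actually in the collection
keyword_hits = 25_000       # documents the keyword search returned
responsive_hits = 4_000     # returned documents that are truly responsive

recall = responsive_hits / truly_responsive   # share of the responsive set found
precision = responsive_hits / keyword_hits    # share of the hits worth reviewing

print(f"Recall:    {recall:.0%}")     # 40% -- the search missed 60% of what matters
print(f"Precision: {precision:.0%}")  # 16% -- most of what it returned is junk
```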
To be clear, that’s an established metric based on everyone else in the world. It doesn’t apply to YOU. YOU have the unique ability to frame fantastically precise and effective keyword searches like no one else. Likewise, all the findings about the laughably poor performance of human reviewers apply only to other reviewers, not to YOU. Tragically, not everyone has the immense good sense to employ YOU; so, let’s take YOU and what YOU can do out of the equation until human cloning is commonplace, okay? 😉
For all their shortcomings, mules are handy. When your Model-T gets stuck in the mud, a mule team can pull you out. Likewise, keyword search is a useful tool to pull us out of the sampling swamp and generate training sets. Using keywords, you’re more likely to rapidly identify some responsive documents than using random sampling alone. These, in turn, increase the likelihood that predictive coding tools will find other responsive documents in the broader collection of ESI sources. Good stuff in : good stuff out.
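By way of illustration only, here’s a rough sketch of that seeding idea; the function and names are hypothetical, not any vendor’s workflow:

```python
import random

def build_seed_set(all_doc_ids, precise_keyword_hits, sample_size=200, seed=1):
    """Combine hits from narrow, high-precision keyword searches with a random
    sample of the broader collection to form an initial training set."""
    rng = random.Random(seed)
    background = rng.sample(all_doc_ids, min(sample_size, len(all_doc_ids)))
    # Keyword hits supply likely-responsive examples quickly; the random sample
    # supplies the ordinary mix the classifier must learn to tell apart.
    return list(dict.fromkeys(list(precise_keyword_hits) + background))
```

The seed set still goes to human reviewers for coding before any training happens; the keywords just shorten the hunt for the first responsive examples.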
With that in mind, I made the following slide to depict how I think keyword search should be incorporated into TAR and how it shouldn’t. (George Socha is so much better at this sort of thing, so forgive my crude effort). This is a work-in-progress, and I’d very much like your comments on the merits. My mind is still open on all of this, especially to remarks by those who can offer evidence to make their case.
I hope you’ll agree that the interposition of keyword search to cull the collection before it’s exposed to an automated document classification tool is wrong. But, in fairness, doing it the right way could come at a cost depending upon how you approach the assembly and processing of potentially responsive ESI. If you have to pay significantly more to let the tool “see” significantly more data, then quality will be sacrificed on the altar of savings. How it shakes out in your case hinges on how you handle keyword search and what you’re charged for ingestion and hosting. Currently, many use keyword search via entirely separate tools and workflows to reduce the volume of information collected, processed and hosted. Garbage In.
Another caution I think important in using keywords to train automated classification tools is the need to elevate precision over recall in framing searches, to ensure that you don’t end up training your predictive classification tool to replicate the shortcomings of keyword search. If only 20% of the documents returned by keyword search are responsive, then you don’t want to train the tool to find more documents like the 80% that are junk. So when, in the illustration above, I depict keyword search as a means to train technology-assisted review tools, please don’t interpret the line leading from keyword search to TAR as suggesting that the usual guesswork approach to keyword search is contemplated and that you’ll just dump keyword results into the tool. That’s like routing the exhaust pipe into the passenger compartment. The searches required need to be narrow, precise, surgical. They must jettison recall to secure precision…and may even benefit from a soupçon of human review.
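Continuing the hypothetical sketch above (again, not any particular tool’s interface), the point is that only keyword hits a human has confirmed as responsive become positive training examples; the unreviewed remainder never teaches the tool anything:

```python
def positive_training_examples(keyword_hit_ids, reviewed_responsive_ids):
    """Keep only the hits a reviewer confirmed as responsive; the junk that even
    narrow searches drag in is discarded rather than fed to the classifier."""
    confirmed = set(reviewed_responsive_ids)
    return [doc_id for doc_id in keyword_hit_ids if doc_id in confirmed]
```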
For the promise of predictive coding to be fulfilled, workflows and pricing must better balance the quality vs. cost equation. Yes, a technology that is less costly when introduced at nearly any stage of the review process is great and arguably superior only by being no worse than alternatives. But if that is all we seek when quality is also within easy reach, we do a disservice to justice. The societal and psychic benefits of a more trusted and accurate outcome to disputes cannot be overvalued. “Perfect” is not the standard, but neither is “screw it.”
Patricia Gardner said:
I concur. Well said.
Caroline Pollard said:
Makes a lot of sense. I like the idea of using keywords to identify the training set.
Thomas Cassel said:
Agree! At some point the pricing structure needs to support the best practices work flow.
Chuck Kellner (@ChuckKellner) said:
Craig, I’m with you almost all the way. While the drawbacks of keyword search are well known, I’m not clear about the broad statement that “keyword search leaves most of the responsive documents out of the collection.” Second, I think there is a point at which there still must be an intelligent selection of what ESI to put into predictive coding. We don’t want to use it to boil the ocean. To promote putting as large and broad a collection as possible into it has an upstream impact on preservation obligations and their associated costs. We don’t want to be in a position of saying that, because the selection tools are now so much more efficient, we can and should afford to hold hostage even more ESI. Regarding cost, I couldn’t agree more, and as we are seeing these workflows become more routine, the costs for the technology and the associated text processing are quickly coming down.
Craig Ball said:
Chuck, I have great respect for your thoughts in this area, and so regret that we aren’t of one mind on this. I certainly am not advocating that one preserve more irrelevant material because it will be easier or cheaper to search downstream. I don’t see how one follows the other. The scope of the preservation duty is defined by the anticipation of litigation and the potential for relevance. You don’t get to preserve more or less because you will later be using keywords, linear review or predictive coding to cull in review.
And I stand by my statement respecting keywords. Save for the far-too-rare case where a few unique searches can deliver high recall and precision, keyword search consistently delivers disappointing outcomes in the 20-30% recall range. Yes, with tweaking and iteration, it can deliver higher recall and precision, but few litigants employ iterative search or testing. Most never see performance better than what Blair and Maron described in their seminal 1985 study. Accordingly, when I stated that “keyword search leaves most of the responsive documents out of the collection,” I did so because that is the unfortunate norm and a conclusion supported by both experimental metrics and hard-won personal experience. It’s easy to improve recall at the expense of precision; but then what’s the point of doing keyword search at all, since what you’re doing is (essentially) grabbing everything?
Thanks for the comment and for reading the post.
Bill Tolson said:
Francis ranks up there with Sheldon and HAL
Ward Stacken said:
Greetings! Very helpful advice within this post! It’s the little changes which will make the greatest changes. Many thanks for sharing!