I’ve been skeptical of predictive coding for years, even before I wrote my first column on it back in 2005. Like most, I was reluctant to accept that a lifeless mass of chips and wires could replicate the deep insight, the nuanced understanding, the sheer freaking brilliance that my massive lawyer brain brings to discovery. Wasn’t I the guy who could pull down that one dusty box in a cavernous records repository and find the smoking gun everyone else overlooked? Wasn’t it my rarefied ability to discern the meaning lurking beneath the bare words that helped win all those verdicts?
Well, no, not really. But I still didn’t trust software to make the sort of fine distinctions I thought assessing relevance required.
So, as others leapt aboard the predictive coding bandwagon, I hung back, uncertain. I felt not enough objective study had been done to demonstrate the reliability and superiority of predictive coding. I well knew the deep flaws of mechanized search, and worried that predictive coding would be just another search tool tarted up in the frills and finery of statistics and math. So, as Herb and Ralph, Maura and Gordon and Karl and Tom sang Hosannas to TAR and CAR from Brooklyn Heights to Zanzibar, I was measured in my enthusiasm. With so many smart folks in thrall, there had to be something to it, right? Yet, I couldn’t fathom how the machine could be better at the fine points of judging responsiveness than I am.
Then, I figured it out: The machine’s not better at fine judgment. I’m better at it, and so are you.
So why, then, have I now drunk the predictive coding Kool-Aid and taken to telling anyone who will listen that predictive coding is the Way and the Light?
It’s because I finally grasped that, although predictive coding isn’t better at dealing with the swath of documents that demand careful judgment, it’s every bit as good (and actually much, much better) at dealing with the overwhelming majority of documents that don’t require careful judgment—the very ones where keyword search and human reviewers fail miserably.
Let me explain.
For the most part, it’s not hard to characterize documents in a collection as responsive or not responsive. The vast majority of documents in review are either pretty obviously responsive or pretty obviously not. Smoking guns and hot docs are responsive because their relevance jumps out at you. Most irrelevant documents get coded quickly because one can tell at a glance that they’re irrelevant. There are close calls, but overall, not a lot of them.
If you don’t accept that proposition, you might as well stop reading here, though I’d question whether you’ve done much document review.
It turns out that well-designed and well-trained software also has little difficulty distinguishing the obviously relevant from the obviously irrelevant. And, again, there are many, many more of these clear-cut cases in a collection than ones requiring judgment calls.
So, for the vast majority of documents in a collection, the machines are every bit as capable as human reviewers. A tie. But give humans the extra point for being better at the judgment-call documents, and HUMANS WIN! Yeah! GO HUMANS! Except….
Except, the machines work much faster and much cheaper than humans, and it turns out that there really is something humans do much, much better than machines: they screw up.
The biggest problem with human reviewers isn’t that they can’t tell the difference between relevant and irrelevant documents; it’s that they often don’t. Human reviewers make inexplicable choices and transient, unwarranted assumptions. Their minds wander. Brains go on autopilot. They lose their place. They check the wrong box. There are many ways for human reviewers to err and just one way to perform correctly.
The incidence of error and inconsistent assessments among human reviewers is mind-boggling. It’s unbelievable. And therein lies the problem: it’s unbelievable. People I talk to about reviewer error might accept that some nameless, faceless contract reviewer blows the call with regularity, but they can’t accept that potential in themselves. “Not me,” they think, “If I were doing the review, I’d be as good as or better than the machines.” It’s the “Not Me” Factor.
Indeed, there is some cause to believe that the best-trained reviewers on the best-managed review teams get very close to the performance of technology-assisted review. A chess grandmaster has been known to beat a supercomputer (though not in quite some time).
But so what? Even if you are that good, you can only achieve the same result by reviewing all of the documents in the collection, instead of the 2% to 5% of the collection that must be reviewed using predictive coding. Thus, even the most inept, ill-managed reviewers cost more than predictive coding; and the best-trained and best-managed reviewers cost much more than predictive coding. If human review isn’t better (and it generally appears to be far worse) and predictive coding costs much less and takes less time, where’s the rational argument for human review?
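To put the 2% to 5% figure in perspective, here is a back-of-the-envelope sketch. It assumes, hypothetically, that cost scales with the number of documents a human has to read, that linear review reads every document, and that predictive coding needs human eyes on only the stated fraction. The function name and the one-million-document collection are illustrative, not drawn from any real matter.

```python
def review_burden(collection_size: int, pc_fraction: float) -> tuple[int, float]:
    """Return (documents humans read under predictive coding,
    how many times more documents a full linear review reads).

    Hypothetical model: linear review reads every document; predictive
    coding needs human review of only `pc_fraction` of the collection.
    """
    pc_docs = round(collection_size * pc_fraction)
    multiple = collection_size / pc_docs
    return pc_docs, multiple


# Illustrative collection of one million documents (a made-up figure).
for fraction in (0.02, 0.05):
    docs, multiple = review_burden(1_000_000, fraction)
    print(f"{fraction:.0%}: review {docs:,} docs; linear review reads {multiple:.0f}x as many")
```

Whatever the collection size, the ratio is simply the reciprocal of the fraction: reading everything means reading 20 to 50 times as many documents, before anyone makes a single mistake.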
What’s that? “My client wants to wear a belt AND suspenders?” Oh, PLEASE.
What about that chestnut that human judgment is superior on the close calls? That doesn’t wash either. First, and to be brutally honest, quality is a peripheral consideration in e-discovery. I have yet to meet the producing party that loses sleep worrying about whether its production will meet its opponent’s needs. Quality is a means to avoid sanctions, and nothing more.
Moreover, predictive coding doesn’t try to replace human judgment when it comes to the close calls. Good machine learning systems keep learning. When they run into one of those close call documents, they seek guidance from human reviewers. It’s the best of both worlds.
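One common way machine learning systems decide when to "seek guidance" is active learning with uncertainty sampling. The sketch below is a generic illustration of that selection step, not any vendor's method. It assumes a hypothetical classifier has already assigned each document a relevance probability, treats scores within a margin of 0.5 as close calls, and queues the least certain of them for human review. The document names, scores, margin, and batch size are all made up, and the step where human decisions are fed back to retrain the model is left out.

```python
from typing import Dict, List, Tuple


def route_documents(
    scores: Dict[str, float], margin: float = 0.15, batch_size: int = 3
) -> Tuple[List[str], List[str], List[str]]:
    """Split documents by a hypothetical classifier's relevance probability.

    Clear calls (far from 0.5) are coded automatically; the documents
    closest to 0.5 (the close calls) are queued for human review, up to
    batch_size. Other uncertain documents wait for the next training round.
    """
    auto_relevant: List[str] = []
    auto_irrelevant: List[str] = []
    uncertain: List[str] = []
    for doc_id, p in scores.items():
        if p >= 0.5 + margin:
            auto_relevant.append(doc_id)
        elif p <= 0.5 - margin:
            auto_irrelevant.append(doc_id)
        else:
            uncertain.append(doc_id)
    # Uncertainty sampling: ask humans about what the model is least sure of.
    uncertain.sort(key=lambda d: abs(scores[d] - 0.5))
    return auto_relevant, auto_irrelevant, uncertain[:batch_size]


# Made-up scores for six documents.
scores = {"memo": 0.97, "lunch": 0.03, "contract": 0.55,
          "email": 0.42, "spam": 0.01, "draft": 0.61}
print(route_documents(scores, margin=0.15, batch_size=2))
```

Here "memo" is coded relevant and "lunch" and "spam" irrelevant without human involvement, while "contract" and "email", the two closest calls, go to a human reviewer.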
So why isn’t everyone using predictive coding? One reason is that the pricing has not yet shifted from exploitive to rational. It shouldn’t cost substantially more to expose a collection to a predictive coding tool than to expose it to a keyword search tool; yet, it does. That will change, and the artificial economic barriers to realizing the benefits of predictive coding will soon play only a minor role in the decision to use the technology.
Another reason predictive coding hasn’t gotten much traction is that Not Me Factor. To that I say this: Believe what you will about your superior performance, tenacity and attention span (or that of your team or law firm), but remember that you’re spending someone else’s money on your fantasy. When the judge, the other side or (shudder) the client comes to grips with the exceedingly poor value proposition that is large-scale human review, things are going to change…and, Lucy, there’s gonna be some ’splainin to do!