Earlier this year, I delivered the keynote address for a corporate event in Canada. I called the talk “Spoiled and Deluded: Ugly Truths about Electronic Search.” I lamented how our happy experience with Google and online legal research has left us woefully unprepared (“spoiled”) for the extreme difficulty of search in e-discovery, then dashed a few misconceptions about the efficacy of searching ESI in accepted ways (“deluded”). Dear Reader, we need to be brutally frank about search because, in a world where the organization of information has gone the way of the typewriter and file room, effective, efficient search is something we cannot manage without.
Search has two non-exclusive ways to fail: your query will not retrieve the information you seek and your query will retrieve information you weren’t seeking. The measure of the first is called “recall,” and of the latter, “precision.” We want what we’re looking for (high recall) and only what we are looking for (high precision).
Recall and Precision aren’t friends. In e-discovery, they’re barely on speaking terms. Every time Recall has a tea party, Precision crashes with his biker buddies and breaks the dishes.
It’s easy as pie to achieve a high recall of responsive information in e-discovery. You simply grab it all: 100% of the data = 100% recall. But, if only one out of a hundred items is what you seek, your precision stinks–it’s just 1%. You’ll look at 99 irrelevant documents for each one worth reviewing. Some call this The Practice of Law, and most lawyers mistakenly regard it as the safest course, lest a party fail to produce something or produce something that should have been withheld.
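For anyone who likes to see the arithmetic, here is a minimal sketch of the grab-it-all trade-off. The collection size and the document IDs are made up for illustration; only the one-in-a-hundred ratio comes from the example above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents we wanted AND got
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: 10,000 documents, one in a hundred responsive.
collection = set(range(10_000))
responsive = set(range(0, 10_000, 100))  # 100 responsive documents

# "Grab it all": retrieve the entire collection.
precision, recall = precision_recall(collection, responsive)
print(f"precision={precision:.0%} recall={recall:.0%}")  # precision=1% recall=100%
```

Perfect recall, dismal precision: every responsive document is in the pile, along with 99 others for each one.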
Because reviewing everything is time consuming, it’s expensive. Worse, it doesn’t work very well. People make assessment errors; and making lots of assessments, they make lots of errors. My friend and fellow commentator, Ralph Losey, lately blogged about the shortcomings of search and review, calling them “dark secrets.” Don’t miss Ralph’s posts, but know that he has a penchant for revealing secrets that are “secret” in the same way that the square root of 256 is “secret.” Most won’t know it’s 16 off the top of their head; but like the problems of search, it’s easy to figure out if you’re even slightly curious! Kudos to Ralph for using the ploy of revealing “secrets” to inspire curiosity.
The errors we make in search can be subtle and hypertechnical, but they usually aren’t. Most mistakes I see in keyword search are of the boneheaded variety. If we eliminate the dumbest mistakes, we improve the quality of e-discovery and markedly trim its cost. Search will ever be a battle between Recall and Precision, but avoiding boneheaded errors will limit casualties.
Boneheaded Mistake 1: Searching for a custodian’s name or e-mail address in the custodian’s e-mail
If you run a list of search terms including a custodian’s name or e-mail address against their own e-mail, you should expect to get hits on all messages, rendering the search useless. I know some of you are saying, “Craig, no one’s that boneheaded!” I say, “Wanna bet?” I see this mistake with regularity. I see it done by big firms touting their e-discovery expertise. I see it done by plaintiffs and defendants. I see vendors content to run these searches without flagging the error. Ask yourself: how often are the proposed search term lists exchanged between counsel carefully broken out by particular custodians or forms of ESI to be searched?
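A check for this mistake can be tiny. The sketch below is only an illustration: the function name, the term list and the custodian identifiers are all hypothetical, and it looks for exact matches only, so nicknames, initials and alternate addresses would need to be added to the identifier list by hand.

```python
def self_referential_terms(terms, custodian_ids):
    """Return the search terms that are one of the custodian's own identifiers."""
    ids = {i.casefold() for i in custodian_ids}
    return [t for t in terms if t.casefold() in ids]


# Hypothetical agreed term list and one custodian's identifiers.
terms = ["Jane Doe", "widget recall", "jdoe@example.com", "indemnity"]
jane = ["jane doe", "JDoe@example.com"]

flagged = self_referential_terms(terms, jane)
print(flagged)  # ['Jane Doe', 'jdoe@example.com']
```

Run per custodian, a check like this flags the terms to drop from that custodian’s list before anyone counts hits.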
Boneheaded Mistake 2: Assuming the Tool can Run the Search
Unless you plan to read everything, you can’t search ESI for keywords without a search tool; and, if you’ve ever tried to drive a screw with a hammer, you know that tools do some tasks better than others, and some tasks they don’t do at all. Why can’t you use Google to know what’s in your fridge? Because the information isn’t online…yet. Every ESI search tool has limitations: the data may not have been collected, it may not have been indexed, or the search syntax may be wrong. Most e-discovery searches are run against an index of the words in the ESI; but text indexers don’t index information that isn’t text (like pictures of words that haven’t been run through an optical character recognition process). They don’t index text they can’t access, like encrypted documents or documents encoded in unfamiliar ways. Plus, they don’t index so-called “noise” or “stop” words deemed so common they’ll gum up the works. I call this the “To Be or Not to Be” problem, because all of the words in Hamlet’s famous question tend not to be indexed in e-discovery.
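To see the “To Be or Not to Be” problem in miniature, here is a toy indexer. It is a sketch under loud assumptions: the stop list is invented for the example (real tools ship their own lists, often configurable), and real tools differ in how they treat stop words that appear in a query.

```python
import re

# Illustrative stop list only; not taken from any particular product.
STOP_WORDS = {"a", "an", "and", "be", "is", "not", "of", "or", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs):
    """Map each non-stop word to the set of document names containing it."""
    index = {}
    for name, text in docs.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(name)
    return index


def search_all(index, query):
    """Return documents containing every query word, consulting only the index."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


docs = {"hamlet.txt": "To be, or not to be, that is the question."}
index = build_index(docs)

print(sorted(index))                            # ['question', 'that']
print(search_all(index, "to be or not to be"))  # set()
print(search_all(index, "question"))            # {'hamlet.txt'}
```

The document plainly contains the phrase, yet the search comes back empty because none of its words ever made it into the index.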
A related mistake is using the wrong or an unsupported search syntax. Not every search tool supports every common feature of search (e.g., wildcard characters, Boolean constructs, stemming, proximity searches or regular expressions), and not every tool uses the same methods or characters to deploy the same features. If you’re not certain how the search tool processes *, !, ?, /w and %, don’t assume they work as you imagine.
Boneheaded Mistake 3: Not Testing Searches
Much of what distinguishes a mistake as boneheaded is the ease with which it could have been avoided. When a party to a lawsuit once proposed the letter “S” be used as a search term, I didn’t need to test it to know that it was a boneheaded choice. But what about all those terms that routinely occur in file paths or are inevitably encountered in profusion within ESI having nothing to do with the case? If you don’t know if the list of keywords you’re about to run includes some of these terms, what is the boneheaded thing to do? Right! Run them against your entire ESI collection without testing them first!
Even search terms that appear bulletproof can surprise you. Test your searches to be sure they perform as expected.
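Testing need not be elaborate. The sketch below runs each proposed term against a small sample and reports the share of items it hits; the sample items, the terms and the 50% threshold are all invented for illustration, and the matching is a crude case-insensitive substring test rather than anything a real review platform does.

```python
def hit_rates(terms, sample_docs):
    """Return the fraction of sample documents containing each term."""
    if not sample_docs:
        return {t: 0.0 for t in terms}
    lowered = [d.lower() for d in sample_docs]
    return {
        t: sum(t.lower() in d for d in lowered) / len(lowered)
        for t in terms
    }


# Hypothetical sample: extracted text that includes file paths.
sample = [
    r"C:\Users\jdoe\Documents\Contracts\lease.docx",
    "Please review the attached indemnity clause.",
    r"Saved to \\server\share\Documents\budget.xlsx",
    "Lunch on Tuesday?",
]

rates = hit_rates(["documents", "indemnity", "s"], sample)
too_broad = [t for t, r in rates.items() if r >= 0.5]  # arbitrary threshold

print(rates)      # {'documents': 0.5, 'indemnity': 0.25, 's': 1.0}
print(too_broad)  # ['documents', 's']
```

A term that hits half the sample because it lives in file paths, or all of it because it is a single letter, deserves a second look before it runs against the full collection.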
Boneheaded Mistake 4: Not Looking at the Data!
How much chatter about whether it’s raining outside will you listen to before looking out the window? Don’t just natter on about the quantity of hits to evaluate your search; check the quality of the hits. Look at the data! Fifteen minutes spent looking at the data can eliminate weeks or months of reviewing crappy results and a zillion dollars spent in motion practice.
Boneheaded Mistake 5: Ignoring the Exceptions List
It’s the rare e-discovery effort where everything processes without exception. There will typically be hundreds or thousands of items that are encrypted, corrupt, unrecognized or unreadable. A report of these exceptions is usually generated during processing. Too often, these exceptions are forgotten soon after they’re identified or are misclassified as benign. It’s a calculated risk to decide that the exceptional items can be ignored; but, to forget these exceptions exist is a boneheaded mistake.
Those are five boneheaded mistakes to prime the pump. Now, how about sharing a few of your own?
P.S. Testing terms? Looking at data? To do these things, you need a desktop tool that makes it possible; better still, one that’s dirt cheap and extraordinarily powerful. If you’re looking for one, don’t miss this important post. Time is running out!