There have lately been a boatload of good stories written about Google’s so-called ‘Billion Dollar E-Discovery Blunder.’ Yes, it was a blunder, and, though the damages are dwindling, maybe it will end up costing Google a billion bucks; but I’m not so sure it’s all that much of an e-discovery issue. That said, I’m going to wind this post up with a simple QA/QC technique for e-discovery you can use to keep your client or company out of the same predicament.
First, the Blunder: Oracle sued Google claiming that Google’s Android smartphone platform infringes Oracle’s Java programming language patents. With almost $27 billion in revenue and $6 billion in profits, Oracle is #96 on the Fortune 500 list. Google is #92, with $29 billion in revenues and $8.5 billion in profits. So, it’s a well-matched, Goliath vs. Goliath fight (and even Goliath is going, “Damn, they’re big and rich”).
Plus, it’s got to be personal for many folks in Silicon Valley. Redwood City and Mountain View, California are just minutes apart, so you can imagine that when the two Larrys (Oracle’s Ellison and Google’s Page) bump into each other at Fry’s or waiting in line at the DMV, it can’t be all guy hugs and fist bumps.
In the cozy bosom of the Valley, folks don’t just haul off and sue their (very rich and powerful) neighbors. Instead, they have meetings where, fueled by green tea and organic vegan muffins, the lawyers present their case to avoid all the muss, fuss and bother of a lawsuit. Something like that happened on July 20, 2010, when Oracle’s lawyers presented contemplated claims to Google’s senior Counsel, Ben Lee. Later, Lee, Google’s GC and an engineer named Tim Lindholm put their heads together to strategize about the coming onslaught.
Here, it makes sense to simply quote from the Court of Appeals’ 2/6/12 order in In re: Google, Inc.:
At 11:05 a.m. on August 6, 2010, Lindholm sent an email to the attention of Andy Rubin, Google’s Vice President in charge of its Android operating platform. Lindholm also included Lee, himself, and another Google engineer, Dan Grove, on the email. The body of the email provided as follows:
Attorney Work Product
Google Confidential
Hi Andy,
This is a short pre-read for the call at 12:30. In Dan’s earlier email we didn’t give you a lot of context, looking for the visceral reaction that we got. What we’ve actually been asked to do (by Larry and Sergei) is to investigate what technical alternatives exist to Java for Android and Chrome. We’ve been over a bunch of these, and think they all suck. We conclude that we need to negotiate a license for Java under the terms we need.
That said, Alan Eustace said that the threat of moving off Java hit Safra Katz hard. We think there is a value in the negotiation to put forward our most credible alternative, the goal being to get better terms and price for Java.
It looks to us that Obj-C provides the most credible alternative in this context, which should not be confused with us thinking we should make the change. What we’re looking for from you is the reasons why you hate this idea, whether you think there’s anything we’ve missed in our understanding of the option.
–Tim and Dan
Before you get your knickers in a twist worrying about how hard this will be on “Safra Katz,” know that they were surely talking about Safra Catz, Co-President of Oracle and named one of the Most Powerful (and Highly Paid) women in the world. I’d guess she’d have more to worry about being alone in a room with her Co-President, the overcharged and overcharging Mark Hurd.
This e-mail lodged in several locations around the Googleplex and, after suit, Google produced several copies of this message and withheld other copies, claiming they were privileged and listing them in its privilege log (see below). Ten months later, Google tripped to its error and demanded return of the copies. Inadvertent production? Pretty clearly. Timely action or waiver by use without objection? Non-issues, as it turns out.
The Court found that the e-mail was not privileged, either as a confidential attorney-client communication or as attorney work product. The reason I don’t think the opinions are important e-discovery rulings is that the rationale and result would have been exactly the same if these had been old-timey paper memos. The medium had nothing to do with the issues or outcome.
The e-discovery nexus arises from the way the copies were retained (reportedly as autosaved backups) and from the failure to intercept the copies before production. There are all sorts of bright ideas emerging from smart folks who have groundbreaking tools they could sell to that search naif, Google, to help it avoid this ever happening again. Reading some of these missives made me think of a post I wrote two years ago for the EDDUpdate blog called, “A Quality Assurance Tip for Privileged ESI.”
In that long-ago post, I called for producing parties to run a last-ditch quality assurance linear keyword search against the final production set before it goes out the door. Linear (i.e., across the actual documents in the set, not against an index) because such a belt-and-suspenders approach compensates for inherent and unforeseen omissions and corruptions in the index. The keywords and phrases searched would be unique selections from the highly sensitive privileged documents you’d already found and listed on the privilege log. Yes, that’s right, you’re searching for things you already know are privileged and that you believe have already been culled from the set. Missing these was Google’s gaffe.
Not all inadvertent production of privileged material stems from what you fail to find. Often, you let slip the very thing that you know about and are most determined to protect. It’s like trying not to think of a purple hippopotamus.
Again, this is a QA/QC technique applied to the production set on the eve of production. It’s a simple, effective, quick and cheap way to protect against the most damaging privileged communications slipping through because, as Google learned, they do slip through.
Here, the terms searched might have been, e.g.:
- “technical alternatives exist to Java”
- “think they all suck”
- “hit Safra Katz hard”
- “Obj-C provides the most credible alternative”
- the subject line of the message
The chance of hitting an unrelated document is nil, but the likelihood of catching a draft or autosaved version is high. You want the terms searched to be unique (bad grammar and misspellings help) and extracted from various points across the body of the document or message (in case it’s an early draft or a corrupted or truncated version). You’re not looking for unknowns, and don’t waste time on the trivial, i.e., on all that silly chaff in most privilege logs. You want to do this for the “if this message makes it to the other side, we’re hosed” stuff.
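By way of illustration, here’s a minimal sketch in Python of what that last-ditch linear check might look like. Everything in it is an assumption made for the example: the directory name, the notion that each produced item already has extracted text, and the phrase list (lifted from the Lindholm message quoted above). A real production set full of TIFFs and natives would need text extraction or OCR first, and a real tool would do far more; the point is only how little effort it takes to look for what you already know is radioactive.

```python
# Minimal sketch (not a production tool): a last-ditch linear scan of the
# files in a final production set for phrases lifted from documents already
# identified as privileged. Paths and phrases are illustrative only; the
# script assumes plain-text extractions exist for every produced item.
from pathlib import Path

# Unique strings pulled from the known "bombshell" privileged documents.
PRIVILEGED_PHRASES = [
    "technical alternatives exist to java",
    "think they all suck",
    "hit safra katz hard",
    "obj-c provides the most credible alternative",
]

def scan_production(production_dir: str) -> list[tuple[str, str]]:
    """Read every text file in the production set and flag any hit."""
    hits = []
    for path in Path(production_dir).rglob("*.txt"):
        text = path.read_text(errors="ignore").lower()
        for phrase in PRIVILEGED_PHRASES:
            if phrase in text:
                hits.append((str(path), phrase))
    return hits

if __name__ == "__main__":
    for doc, phrase in scan_production("./production_set_text"):
        print(f"HOLD FOR REVIEW: {doc} contains privileged phrase: {phrase!r}")
```

Anything the scan flags gets pulled and eyeballed before the production goes out the door; the cost is minutes, not review hours.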
Did the lawyers know this message was a bombshell? You’d think so, considering the license they took in the privilege log when describing the copies withheld; to wit, “Email reflecting advice of counsel in preparation of litigation re alternatives for Java for Android and Chrome” and “Email seeking advice of counsel re technical alternatives for Java.” I didn’t see any advice of counsel in there, did you? And I just bet the engineers at Google run to the legal department for technical advice when they’re stumped. In…my…dreams!
You can stop here, or you can go on to read what I published two years ago (while you imagine me maturely going “Nyah, nyah, na-nyah nyah” in that smug way that makes even me want to punch me):
We squander so much money in e-discovery searching for confidential attorney-client communications. “Squander” because it’s an outsize expense that could have been largely eliminated with minimal effort at the time fingers met keyboard. It’s not as though counsel are wholly unaware of the sensitivity of privileged communications when made. If it had been a face-to-face conversation, we’d have had the presence of mind to shut the door or ask those outside the ambit of privilege to leave. Lawyers really aren’t as stupid as we sound in the reported decisions.
If we have the presence of mind to recognize and protect a confidential attorney-client communication when made face-to-face–if we’re savvy enough to say, “Wait a second while I take this off speakerphone,”–why are we incapable of bringing the same cautious mien to our electronic conversations? And, why-oh-why do we forget the most important component of quality assurance before producing material posing a risk of inadvertent production of privileged communications?
Ask a judge who’s done an in camera review of privileged ESI what percentage of the material submitted was truly privileged, and you’re likely to hear numbers hovering way below fifty percent. An average assessment of 20% or less wouldn’t surprise me. Does anyone ever review the definition of a confidential attorney-client communication anymore? Is it not in the Nutshells today?
The favored technique to cull privileged material from ESI entails looking for any material that includes lawyers’ names, firm names, lawyer and firm e-mail addresses and words like “privileged.” It’s a criterion geared to grab everything sent to or from an attorney or firm. If this surfeit of material isn’t just lazily set aside and forgotten, it’s painstakingly reviewed page-by-hourly-charged-page.
Heaven forbid that the profession should ever be forced to surrender this bountiful boondoggle. Imagine, we seed the client’s ESI with privileged communications, then bill the client to segregate it. If BP were a law firm, it’d be charging for cleaning up the crude.
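For readers who’ve never seen one, here’s a purely hypothetical sketch of the kind of screening criterion described above: a term list of lawyer and firm identifiers plus a few magic words, where any hit routes the document to the privilege pile. The firm domain and sample message are invented; the point is how indiscriminately wide the net is.

```python
# Hypothetical privilege screen of the kind described above: anything that
# hits one of these terms lands in the (over-inclusive) privilege review pile.
import re

PRIV_SCREEN = re.compile(
    r"privileged|attorney[- ]client|work product|esq\.|@examplelawfirm\.com",
    re.IGNORECASE,
)

def needs_priv_review(text: str) -> bool:
    return bool(PRIV_SCREEN.search(text))

# A forwarded lunch invitation from outside counsel's assistant gets swept in, too.
print(needs_priv_review("FW: lunch? (from assistant@examplelawfirm.com)"))  # True
```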
…
I post now to suggest something I hope you’ll embrace: an essential step in QA/QC that’s so obvious, I’m amazed at how rarely it’s done.
Assume you’ve done your privilege review, and you’re ready to make production. Suddenly, you hear a little voice in the back of your mind. Why, it’s Judge Paul Grimm, whispering, “QA…QC…QA…QC…Victor Stanley.” You know you’re supposed to do something to check the quality of your production to be assured that you haven’t inadvertently allowed privileged communications to slip through your net. But what? You’ve searched the index for lawyers’ names, firm names, e-mail addresses and words like “law,” “advise,” “liable,” “criminal” and “attorney.” You’ve double-checked random samples. What more can you do?
One thing you should absolutely do is search the material about to be produced for examples of confidential attorney-client communications you know exist. That is, the stuff you most fear the other side seeing. Examples are probably right there in your file and e-mail. You should have a set of unique searches composed to ferret out these bombshells in anything you send to the other side. It’s the stuff for which you most need quality assurance and control, because it’s the stuff that would be most prejudicial if it crossed over.
I also suggest that you search for these core privileged communications linearly across the contemplated production, not just within the indices, because–let’s face it–indexed search is fast, but it misses stuff that linear search picks up. I get why you don’t search the collection linearly at the outset–life is short–but when it’s culled to a production set, don’t you want the benefit of both technologies to protect against inadvertent production of the most sensitive, privileged material?
FYI: Linear search goes through the documents in the collection seeking keyword matches. Indexed search looks solely to indices, i.e., lists of words meeting certain criteria for inclusion and exclusion, such words having been culled from the documents in the collection.
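To make that distinction concrete, here’s a toy sketch, with made-up tokenization rules, of how an index’s inclusion and exclusion criteria can cause a miss that a plain linear pass over the document text would catch. Real indexing engines apply their own, more elaborate rules, but the failure mode is the same.

```python
# Toy illustration: an index that keeps only alphabetic tokens of three or
# more characters will never contain "obj-c", though the text plainly does.
import re

doc = "It looks to us that Obj-C provides the most credible alternative."

def build_index(text: str) -> set[str]:
    # Assumed inclusion rule: alphabetic tokens, length >= 3 ("Obj-C" splits
    # into "Obj" and "C", and the lone "C" is dropped).
    return {t.lower() for t in re.findall(r"[A-Za-z]+", text) if len(t) >= 3}

index = build_index(doc)
print("obj-c" in index)        # False: the indexed search misses the phrase
print("obj-c" in doc.lower())  # True: the linear scan over the text finds it
```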
I recently commented on a long, thoughtful post of Ralph Losey’s discussing Mt. Hawley Ins. Co. v. Felman Production, Inc., 2010 WL 1990555 (S.D. W. Va. May 18, 2010). I summed up my sentiments this way:
For all the many challenges there are to isolating privileged material in voluminous ESI, finding the privileged items well known to counsel and appearing in their own files need not be one of them.
What do you think?
We need more public discussion about QA/QC techniques geared to protecting against inadvertent disclosure, both to help prevent it and to foster our ability to demonstrate that we made a diligent effort using generally accepted methods. Please post. Comment. Publish. Make your ideas heard.
Ed Fiducia said:
Craig,
As always, a pleasure to read your articles. Well said. I completely agree that one of the smartest and easiest ways to validate the priv log is to use the documents that have ALREADY been marked as privileged as a basis for searching for potentially missed documents.
(Back on my soapbox again…) There is another smart, fast, inexpensive way to perform the QC. It doesn’t replace a linear search against the actual documents (great point differentiating documents from indices) but certainly augments it.
That process is Near Duplicate clustering. I’d bet that ANY of the near dupe (as opposed to Concept Search) engines…. Equivio, Syngence, Clustify and more…. would have hit on and flagged the additional copies that were missed. The reason is that these technologies “tokenize” data points rather than individual keywords, so even if the documents contained odd words or misspellings, the overall scores for the potential duplicates would have been extremely high. So high, in fact, that the missed copies would probably have been caught in the “first pass” review in the first place.
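[Ed.: Ed’s point can be illustrated generically. The sketch below is emphatically not how Equivio, Syngence or Clustify work internally (those methods are proprietary); it’s just one common way to score near-duplicates, comparing overlapping word “shingles” with a Jaccard similarity, so that a draft with a changed or misspelled word still scores far closer to the original than any unrelated document would.]

```python
# Generic near-duplicate scoring sketch (not any vendor's actual method):
# overlapping word "shingles" compared with Jaccard similarity, so small
# edits or misspellings barely dent the score.
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

final = ("What we've actually been asked to do is to investigate what "
         "technical alternatives exist to Java for Android and Chrome.")
draft = ("What we've actually been asked to do is to investigate what "
         "technical alternitives exist to Java for Android and Chrome.")

print(f"{jaccard(shingles(final), shingles(draft)):.2f}")  # about 0.71: far above unrelated text
```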
Ed Fiducia
Inventus
Katie Bosken said:
Great tip distinguishing between indexed versus linear search for QC. That’s one of those quirks of search that is so easily forgotten in the rush to finalize and get the production out the door.