Is anyone else troubled that the most oft-cited research into e-discovery–the Blair & Maron study of keyword search–dates from 1985? Recent “studies” are often seat-of-the-pants opinion polls of the sort that ask in-house counsel to guess how well prepared their companies are to deal with e-discovery or what they think discovery costs. These are interesting, but they’re no more reliable than polls asking people to rate themselves as “fair-minded” or “intelligent.” Polls measure people’s expectations about what the facts might be, not the facts themselves. The long-held consensus that the sun circled a flat Earth didn’t make it so.
We need objective metrics in e-discovery, and one thing I’d like to see measured is the origin of the information obtained in discovery that’s actually used to prosecute or defend cases. My experience is that cases are won or lost on a handful of items, a tiny fraction of the volume exchanged in discovery. Do the exhibits used in motions, depositions and trials derive from e-discovery, or do they emerge by other means?
I imagine such a study might look at, say, a dozen recent cases that went to verdict. These would be distributed across different types of lawsuits, perhaps four IP disputes, four personal injury matters and four business cases. Each item used as an exhibit to a motion, at an oral deposition or in trial would be traced back through discovery to determine how it was identified. Was it ESI? Was it found by electronic search using keywords? If so, which side proffered the keyword(s) that found it? Was it a scan of legacy paper records, elicited from a database, derived from e-mail, found by forensics or subpoenaed from third parties? Did its disclosure follow a motion to compel? Was it something the requesting party had before suit?
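To make the tracing concrete, here is a minimal sketch in Python of how exhibit provenance might be coded and tallied in such a study. The field names and origin categories are hypothetical illustrations of my own devising, not a settled taxonomy:

```python
# Hypothetical provenance-coding schema for the proposed study.
# Field names and origin categories are illustrative only.
from collections import Counter
from dataclasses import dataclass
from typing import Optional

ORIGINS = {
    "keyword_search",        # found by electronic search using keywords
    "legacy_paper_scan",     # scan of legacy paper records
    "database_extract",      # elicited from a database
    "email",                 # derived from e-mail
    "forensics",             # found by computer forensics
    "third_party_subpoena",  # subpoenaed from third parties
    "pre_suit_possession",   # requesting party had it before suit
}

@dataclass
class Exhibit:
    case_id: str
    used_in: str                                  # "motion", "deposition" or "trial"
    is_esi: bool
    origin: str                                   # one of ORIGINS
    keywords_proffered_by: Optional[str] = None   # "requesting" or "producing"
    after_motion_to_compel: bool = False

def tally_origins(exhibits):
    """Count how often each discovery pathway yielded an exhibit actually used."""
    return Counter(e.origin for e in exhibits)

# Two exhibits from a hypothetical IP case:
sample = [
    Exhibit("IP-01", "trial", True, "keyword_search",
            keywords_proffered_by="requesting"),
    Exhibit("IP-01", "deposition", False, "legacy_paper_scan"),
]
print(tally_origins(sample))
# Counter({'keyword_search': 1, 'legacy_paper_scan': 1})
```

Even a simple tally like this, applied consistently across a dozen verdicts, would tell us more than any opinion poll.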
There is no consensus about the value of e-discovery versus its cost. We may learn that discovery correlates poorly with probative evidence, that it’s more valuable in certain types of cases than in others, or that without e-discovery, litigants would be deprived of crucial evidence. Perhaps it will develop that it’s misleading to look at cases tried when the greatest value of discovery could flow from the cases it helps to settle. I have an opinion, but I don’t know. I’d like to know. Wouldn’t you?
Gerard Britton said:
Craig,
This is an interesting question, but looking only at cases that went to trial would give a very skewed view. Trials, like war, are often an indication of a failed process. Civil trials occur for many reasons, but one factor pretty consistently present is the absence of definitive evidence on either side; if such evidence existed, the outcome would be fairly predictable. So, in cases where strong evidence is adduced, the opposing parties’ settlement numbers converge and the case goes away. Therefore, I’d argue that the cases most likely to demonstrate the importance of discovery findings would be invisible in the test you propose.
Moreover, I think you have some evidence from the criminal, Congressional and regulatory side, as well as from “bet the farm” litigations, where factors other than the strength of one’s case often come into play, e.g., where the case is a “heater” and the government is unwilling to cut a deal precisely because it has a strong case. Here is an article with some examples of information that would be the product of diligent discovery: http://bit.ly/1tymQ3f; to which the missives relating to Angelo Mozilo, Bill Gates, Merck’s Vioxx, Goldman and the housing collapse, and on and on, can be added. In a post a while back, I termed these documents “Digital Documents in High Stakes Investigations: Like Mosquitoes in Amber.”
I’d argue that the REALLY interesting study, one that is not feasible, would be to go through the entire corpora initially collected in a sample of litigations and see how many contain unproduced documents that, under a reasonable-person standard, would have fundamentally altered the outcome of the case. In this, I think we can take some instruction from the Zubulake cases about measuring the value of civil discovery by its output. Let us not forget Judge Scheindlin’s description of her initial impression of Zubulake as a “garden variety employment discrimination case”–one that, absent Ms. Zubulake’s incredibly diligent efforts, would have been logged in the type of study you posit as producing nothing of note through discovery.
So, maybe the question isn’t whether there is usually gold in them thar hills, but how infrequently current legal discovery produces it.
Chad Main said:
Craig–Earlier this year I saw an article, I think by a Microsoft lawyer, that analyzed the percentage of documents Microsoft produced that were actually used in pre-trial and trial activities. I just tried to find it and couldn’t. It was pretty interesting. Obviously not a full-blown study, but an analysis of Microsoft’s own data. If I find it again, I will post the link.
craigball said:
I suspect you’d find something like that associated with Microsoft’s lobbying activity respecting the latest proposed federal rules amendments wending their way through the process. Is it possible that what you saw was a representation of how much data Microsoft claims it puts on legal hold versus what’s actually produced or used?
Chad Main said:
In trying to find a link to the article, I found indications that the research was connected to the FRCP amendments. Unfortunately, I still can’t find the piece itself. I am fairly certain that the article I remember specifically referred to the number of documents that actually ended up being used at trial–and, I think, also in pretrial activities. I cannot remember, though, whether the number was based on information subject to litigation hold or information actually produced. I want to say it was based on the number of documents produced, but I could be wrong.
Chad Main said:
Here is what I could find, but it is not the article I originally read. This references documents used as a percentage of those preserved:
“The company preserves an average of 1.3 million pages per custodian, but on average, only one page out of every 600,000 is used in litigation.”
Source: http://www.metrocorpcounsel.com/articles/27286/message-corporate-counsel-lawyers-civil-justice-lcj-please-express-your-support-propo
craigball said:
Note that the statistic cited speaks in terms of preservation, not production or even collection. Also, the number is highly suspect (pending further information that might make it more credible) because it’s couched in page volumes. Since data tends to be preserved as files and messages–measured in bytes–rather than as discrete pages, the values put forward suggest the use of a page equivalency. Page equivalencies are notoriously unreliable and almost always misrepresent (i.e., overstate) data volumes, sometimes markedly. Again, it’s a quote out of context, so I can’t judge it fairly without seeing the source and how it may have been supported or qualified. For now, I’d flag it as B.S. based on experience and motive.
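To illustrate why page equivalencies mislead, here is a back-of-the-envelope sketch in Python. The pages-per-gigabyte factors are made-up placeholders of the sort vendors quote; published conversion factors vary wildly, and that variance is precisely the problem:

```python
# Back-of-the-envelope illustration of why page equivalencies mislead.
# The pages-per-gigabyte factors below are illustrative assumptions only;
# published conversion factors vary wildly, which is precisely the problem.
PAGES_PER_GB = {
    "email_text": 75_000,   # lean text inflates into enormous page counts
    "office_docs": 15_000,
    "images_media": 500,    # the same byte volume yields almost no "pages"
}

def page_equivalent(gigabytes, file_mix):
    """Convert a byte volume to 'pages' given a fractional mix of file types."""
    assert abs(sum(file_mix.values()) - 1.0) < 1e-9, "mix must total 1.0"
    return sum(gigabytes * frac * PAGES_PER_GB[kind]
               for kind, frac in file_mix.items())

gb = 20.0  # one hypothetical custodian's preserved volume
print(page_equivalent(gb, {"email_text": 1.0}))    # 1,500,000.0 "pages"
print(page_equivalent(gb, {"images_media": 1.0}))  # 10,000.0 "pages"
```

The same twenty gigabytes “equals” 1.5 million pages or ten thousand, depending entirely on assumptions the statistic never discloses.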
Bill Dimm said:
Maybe you are thinking of page 16 of this paper (near the bottom of the page):
(PDF: Regard-final.pdf)
Chad Main said:
Bill–Yes! This is not the document I originally came across, but these numbers seem familiar to me.
realrecords said:
Mr. Ball,
A discovery vendor once told me that, in response to a similar question he posed to an attorney (basically, how many documents were actually referred to or considered vital to a court action), the attorney replied that ‘five documents’ were pivotal to the case. Of course, that doesn’t mean they did not rely on many other documents to support those five crucial ones. But it does give one a bit of pause in reflecting on the terms “relevant,” “non-relevant,” and especially “highly relevant.”
Mark Michels said:
I wonder whether we could find a way to get this on the research agenda of the Federal Judicial Center (FJC)?
Greg Buckles said:
Good question, Craig. I could not find any hard metrics either, so I created a short survey to see if my readers could give us some hard numbers. I will share the responses with you after letting it run for a couple of months. The survey is restricted to readers with a validated membership (free), which prevents padding or bias by providers with a market agenda.
Cheers and keep asking hard questions,
Greg Buckles – http://www.eDJGroupInc.com
craigball said:
Dear Greg:
Thank you for your interest and effort. Surveys and polls are fine ways to determine people’s expectations about facts, but poor approaches to fact gathering. I doubt that even your skilled readership (or mine) can supply reliable metrics without exerting substantial effort that few, if any, can muster. Plus, the respondents won’t measure using matching yardsticks. That’s why I advocate nothing less than an objective, carefully designed study to give us answers we can lean on. I suppose there’s no harm in all these polls and surveys until people start citing them as proxies for fact–a common occurrence inside our little Sedona bubble.
Greg Buckles said:
Having attempted to launch more direct ‘Discovery Metrics’ studies when I was co-lead of the EDRM Metrics project, I can tell you how hard it is to get truly hard numbers across a sufficient diversity of litigants. I will add this to the assessment metrics we use for new corporate clients. At least that way I will have hard, if narrow, numbers from a few cutting-edge corporations.