In the wee hours last evening, I received a question posed by Angela Bunting with Nuix down in Sydney, Australia. Angela has such deep knowledge of e-discovery above and below the Equator that I was flattered to be queried by someone I’d go to for guidance. It was a magnificent hypothetical question.
Angela posited a scenario where a producing party used emerging technologies either to machine-translate foreign-language text to English or to transcribe voice recordings to text. In each instance, the quality of the resultant searchable text was poor, akin to bad OCR, and characterized by poor searchability due to malformed and missing words, misleading substitutions, etc. As a consequence of this poor searchability, some documents that should have been produced were not and, to make matters worse, the requesting party had some of the omitted documents, so could readily demonstrate serious flaws in production.
Challenged by the requesting party, the producing party defends the use of the automated transcription or translation based on proportionality. To do the same work any other way would have required use of costly and time-consuming manual labor.
So, there you have it: the automated approach was faster and cheaper, but also much less accurate and complete, resulting in a failure to produce non-privileged responsive material.
Angela asked what I believed the view of the courts might be in such a situation. Would the Court require the work be done again using a more accurate, more expensive method? Might sanctions issue? Would the Court excuse the failure based on proportionality?
Predicting what courts will do based on skeletal hypotheticals is a crap shoot. Outcomes turn on the peculiar facts of each case and, when the issue is e-discovery, on counsels’ skill in acquainting the judge with the technical underpinnings.
But, I gave it a shot, and here’s my reply:
The outcome seems to depend on the status quo and how significant a departure from same has occurred. How was this data being handled in discovery before the introduction of the automated transcription?
If the prior approach was demonstrably more reliable but entailed use of costly and slow human transcriptionists and translators, the proponent of automated transcription has a problem. We expect new technologies to be palatable substitutes for the older, labor-intensive methodologies they replace. If the old and new approaches are comparable in outcome, the “new” crowds out the “old” because it’s faster and cheaper. But, if the outcomes aren’t comparable and the less-costly automated approach delivers demonstrably inferior outcomes, courts will be challenged to accept a departure from the status quo ante. Courts properly reject savings realized at the cost of materially prejudicing the interests of the opposing party.
A savvy opponent will hammer home that anyone can save money by embracing shoddier work. Accordingly, the technology must be defensible on grounds other than the sad-but-true argument that, “it’s not nearly as good, but it’s cheaper.”
It remains uncertain how American jurists will weave proportionality analyses into their decision making. It may depend upon how the allegedly “more proportionate” approach is presented. Judges are amenable–even eager–to reduce the scope of discovery on proportionality grounds. Judges often do justice by splitting the baby. They freely issue protective orders limiting sources, custodians and queries. Cutting back on scope is an easy way to be Solomon-like. But, the implicit expectation of the Court is that, although diminished in scope, the production will not be diminished in quality. That is, less done, but still done right.
In your hypothetical, the loss of quality (manifested as the failure to produce patently responsive material) is readily demonstrated by the requesting party. I’m unsure how a court will react to cost saving methodologies promoting inferior processes applied to great (or greater) swaths of data. Unfavorably, I imagine. Unfavorably, I hope.
You ask whether a Court might order that the work and production be repeated using the high-cost manual alternative. I expect a court would do so–unless the proponent of the new technology can put forward an acceptable alternative that fairly balances the benefits (lower cost/faster turnaround) against the diminished quality of the production. For example, a party seeking to reduce costs might propose to have only select key custodians’ data corrected by a professional translator or transcriptionist. That’s just one of several hybrid solutions I see that would enable counsel to use the cheaper technology to achieve significant savings, even if unable to use the cheaper method across the board.
As to whether a U.S. court would impose sanctions in your scenario, I think not. But, it will be important to show that the automated transcription was not employed to impede discovery by intentionally depriving the other side of responsive data. Ideally, it could be shown that the producing party was as handicapped as the requesting party by the quality issues. Courts endeavor to protect the goal of a level playing field. The worst case would be the producing party who uses better approaches when they need reliable data but elects to use the “cheap” automated method for supplying subpar data to the other side. If there’s no parity, it’s easier to claim that the actions manifest an intent to deprive per Rule 37(e) of the U.S. Federal Rules of Civil Procedure.
Considering the above, it will be crucial to disclose inherent limitations of the automated transcription technology and educate counsel and the Court respecting those inherent shortcomings. As its point of reference, the Court should be briefed to consider automated transcription as a superior alternative to the defensible alternative of declining to process the data at all (on grounds that it’s not reasonably accessible as a consequence of undue burden or cost).
Horse trading with an opponent as a means to get agreement to use the technology would be smart. Coming to court with sound metrics is also key. The proponent of the technology should be conversant in its error rate (i.e., just how bad is the transcription or translation?) and prepared to quantify the savings realized. Give the Court evidence to balance, not just advocacy.
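To make "error rate" concrete: one common yardstick for transcription quality is word error rate, the number of word substitutions, insertions and deletions needed to turn the machine output into a trusted reference transcript, divided by the length of the reference. The sketch below is only an illustration under stated assumptions. The function name `word_error_rate` is hypothetical, the tokenization is a naive whitespace split, and it presumes someone has prepared a human-checked reference for a sample of the material.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by
    the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one misleading substitution and one dropped word.
rate = word_error_rate(
    "the contract was signed in march",
    "the contact was signed march",
)
```

A figure like this, computed over a fair sample, is the sort of metric a court can weigh against the claimed savings. It says nothing by itself about whether the errors fall on the terms that matter to the search.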
That was my take, Dear Reader. What’s yours?
Frank Daddario said:
do they have professional jurors in australia ?
craigball said:
Not insofar as I am aware. But, they have kangaroos and koalas, which is basically the same thing.
ESIDence said:
“Emerging technologies” which position multiple iterations of the process employed, would provide a potential “middle ground”: e.g., update any case-specific selection and production criteria (to address the poor results) and review, repeat, etc. until the results meet quality standards. Use of iterations would seem to provide a much less expensive and/or onerous option than an immediate move to a manual approach.
I’m surmising that the technolog(ies) employed in this context don’t offer such iterative capabilities: if true, it may still be useful to consider a complete “re-do” with the same or different technolog(ies) and better customization (specific to the case and its ESI) as a lower cost (and more expeditious) alternative to a manual process.
SLPerry said:
Would be curious why the search was done from translated text rather than original?? Other than glyph based alphabets I think (from your teachings) that latent semantic indexing works across all text based alphabets. Usually get my hand slapped for this kind of presumption, not fragile:)
craigball said:
Being that I was dealing with a hypothetical problem posed to me, the most accurate answer I can offer is, “because that’s the way the facts were presented.” You’re right that it would be considerably preferable for foreign language content to be searched by fluent native speakers using the native language to forge queries; but, that wasn’t an option in the problem presented and, in truth, it’s not always an option in the trenches.
Adapting to cost and scheduling challenges may require the use of skilled non-native speakers working with manual or machine translations to the language native to the reviewer. Using TAR and latent semantic indexing still entails the contribution of persons who can read and understand the content. Though the n-gram parsing of the content is not language-specific (though uniquely challenging with languages written with pictograms), someone still needs to train the system with knowledge of the content AND of the issues in order to generate seed sets and refine the discrimination between responsive and non-responsive material. So, you either need native speakers competent in the legal issues as reviewers (often a non-starter) or you need the content to be faithfully converted to the language understood by the non-native speaking reviewers. Pick your poison, as each has a downside in terms of cost or accuracy. Good point! Thanks.
Sandy Serkes said:
Craig, I believe you are overlooking a key aspect of this dilemma, particularly regarding the audio content. You have assumed that the audio is intelligible to a human listener in the first place. Almost certainly, if the speech-to-text algorithm is producing poor results, it is because the original audio content is faulty. It might be a low resolution recording, there may be multiple people speaking at a time, or the recording occurred in a noisy environment. All of these scenarios (and many others) would render the audio difficult to decipher for anyone, man or machine. Thus the notion that manual listening and transcription would produce a better quality (though certainly costlier) “search basis” for production is not well founded. The simple solution is to try the comparison on a representative set (stratified sampling) of documents and compare results. If the manual transcription is markedly superior to the automated transcription, then there is strong merit to quality trumping cost savings. However, if they come out more or less the same (which is my strong supposition), then cost-savings should win out as proportionally similar quality results.
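The comparison suggested above could be sketched roughly as follows. This is an illustration only, not anyone's actual workflow: the field names (`quality`, `manual`, `auto`), the helper names `stratified_sample` and `hit_rate`, and the idea of stratifying by recording quality are all assumptions made for the example. It draws a fixed number of recordings from each stratum, then measures how often the same search terms hit in the manual versus the automated transcripts of that sample.

```python
import random
from collections import defaultdict


def stratified_sample(docs, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` documents at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    groups = defaultdict(list)
    for doc in docs:
        groups[stratum_of(doc)].append(doc)

    sample = []
    for key in sorted(groups):
        members = groups[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def hit_rate(sample, text_of, terms):
    """Fraction of sampled documents whose text contains any search term."""
    if not sample:
        return 0.0
    lowered = [t.lower() for t in terms]
    hits = sum(
        1 for doc in sample
        if any(t in text_of(doc).lower() for t in lowered)
    )
    return hits / len(sample)


# Hypothetical usage, assuming each document carries both transcripts:
# sample = stratified_sample(docs, lambda d: d["quality"], per_stratum=50)
# manual_rate = hit_rate(sample, lambda d: d["manual"], ["merger"])
# auto_rate = hit_rate(sample, lambda d: d["auto"], ["merger"])
```

If the two rates come out close, that supports the cost-savings argument. A wide gap on the sample is the kind of evidence that favors the costlier method.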