DNA of Data

One of the conceits of writing is the perception that when you’ve written on something, it’s behind you.  Not that nothing else need be said on the topic, but only that it need not be said by you.  That’s silly for a host of reasons.  I started writing the print version of Ball in Your Court ten years ago–before the 2006 Federal Rules amendments and before the EDRM.  Half my readers weren’t in the field then, and veteran readers surely missed a few missives. Plus, if the point was worth making, perhaps it bears repeating. So, I now revisit columns and posts from the primordial past of e-discovery–starting over as it were, updating and critiquing in places, and hopefully restarting a few conversations. As always, your comments are gratefully solicited.

The DNA of Data

[2005: the very first Ball in Your Court]

Discovery of electronic data compilations has been part of American litigation for two generations, during which time we’ve seen nearly all forms of information migrate to the digital realm.  Statisticians posit that only five to seven percent of all information is “born” outside of a computer, and very little of the digitized information ever finds its way to paper.  Yet, despite the central role of electronic information in our lives, electronic data discovery (EDD) efforts are either overlooked altogether or pursued in such epic proportions that discovery dethrones the merits as the focal point of the case.  At each extreme, lawyers must bear some responsibility for the failure.  Few of us have devoted sufficient effort to learning the technology, instead deluding ourselves that we can serve our clients by continuing to focus on the smallest, stalest fraction of the evidence: paper documents.  When we do garner a little knowledge, we abuse it like the Sorcerer’s Apprentice, by demanding production of “any and all” electronic data and insisting on preservation efforts sustainable only through operational paralysis.  We didn’t know how good we had it when discovery meant only paper.

However, electronic evidence isn’t going away.  It’s growing…exponentially, and some electronic evidence items, like databases, spreadsheets, voice mail and video, bear increasingly less resemblance to paper documents.  Proposed changes in the rules of procedure wending their way through the system require lawyers to discuss ways to preserve electronic evidence, select formats in which to produce it and manage volumes of information dwarfing the Library of Congress.  Litigators must learn it or find a new line of work.

Here, I was referring to the 2006 amendments; but, with new proposed amendments in play, it still rings true.

My goal for this column is to help make electronic discovery and computer forensics a little easier to understand, never forgetting that this is exciting, challenging—and very cool—stuff.

Accessible versus Inaccessible
You can’t talk about EDD today without using the “Z” word: Zubulake (pronounced “zoo-boo-lake”).  Judge Shira Scheindlin’s opinions in Zubulake v. UBS Warburg, L.L.C., 217 F.R.D. 309 (S.D.N.Y. 2003) triggered a whirlwind of discussion about EDD.  Judge Scheindlin cited the “accessibility” of data as the threshold for determining issues of what must be produced and who must bear the cost of production.  Accessible data must be preserved, processed and produced at the producing party’s cost, while inaccessible data is available for good cause and may trigger cost shifting.

But what makes data “inaccessible”?  Is it a function of the effort and cost required to make sense of the data?  If so, do the boundaries shift with the skill and resources of the producing party such that ignorance is rewarded and knowledge penalized?  To understand when data is truly inaccessible requires a brief look at the DNA of data.

Everything’s Accessible
Computer data is simply a sequence of ones and zeroes. Data is only truly inaccessible when you can’t read the ones and zeroes or figure out where the sequence starts.  To better grasp this, imagine you had the unenviable responsibility of typing the complete works of Shakespeare on a machine with only two keys, “A” and “B,” and that if you failed, all the great works of the Bard would be lost forever.  As you pondered this seemingly impossible task, you’d figure out that you could encode the alphabet using sequences of As and Bs to represent each of the twenty-six capital letters, their lower case counterparts, punctuation and spaces.  The uppercase “W” might be “ABABABBB” and the uppercase “S,” “ABABAABB.”  Cumbersome, but feasible.  Armed with the code and knowing where the sequence begins, a reader can painstakingly reconstruct every lovely foot of iambic pentameter.
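For readers who like to see the two-key typewriter in action, here is a minimal Python sketch of the idea.  It assumes that “A” stands for zero and “B” for one, and that each character gets eight symbols taken from its standard 8-bit character code; that assumption happens to reproduce the “W” and “S” examples above.  The function names encode_ab and decode_ab are made up for illustration, not drawn from any tool.

```python
def encode_ab(text: str) -> str:
    """Type out text on the two-key machine: eight A/B symbols per character.

    Assumption for illustration: A = 0, B = 1, and each character is
    represented by its 8-bit code (Latin-1, which matches ASCII for
    ordinary English letters and punctuation).
    """
    return "".join(
        format(byte, "08b").replace("0", "A").replace("1", "B")
        for byte in text.encode("latin-1")
    )


def decode_ab(symbols: str, start: int = 0) -> str:
    """Read the A/B sequence back into text, beginning at position `start`.

    If `start` is wrong, every group of eight symbols is misaligned
    and the result is gibberish, even though no symbol was lost.
    """
    bits = symbols[start:].replace("A", "0").replace("B", "1")
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial group
    return bytes(
        int(bits[i:i + 8], 2) for i in range(0, usable, 8)
    ).decode("latin-1")


if __name__ == "__main__":
    coded = encode_ab("To be, or not to be")
    print(coded[:16])                  # the first two characters, as As and Bs
    print(decode_ab(coded))            # correct starting point: readable
    print(decode_ab(coded, start=3))   # wrong starting point: nonsense
```

Starting the decode three symbols late leaves every A and B intact yet produces nonsense, which is the point: the data is unreadable only when you can’t get at the symbols or can’t tell where the sequence begins.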

This is just what a computer does when it stores data in ones and zeroes, except computers encode many “alphabets” and work with sequences billions of characters long.  Computer data is only “gone” when the media that stores it is obliterated, overwritten or strongly encrypted without a key.  This is true for all digital media, including backup tapes and hard drives.  But, inaccessibility due to damage, overwriting or encryption is rarely raised as grounds for limiting e-discovery or shifting costs.

Just Another Word for Burdensome?
Frequently, lawyers will couch a claim of undue burden in terms of inaccessibility, arguing that it’s too time-consuming or costly to restore the data.  But, burden and inaccessibility are two sides of the same coin, and “inaccessibility” adds nothing to the mix but confusion.  Arguing both burden and inaccessibility is two bites at the apple.

Worse, there is a risk in branding particular media as “inaccessible.”  Parties resisting discovery shouldn’t be relieved of the obligation to demonstrate undue burden simply because evidence resides on a backup tape.  We must be vigilant to avoid a reflexive calculus like:

All backup tapes are inaccessible
————-▼—————
Inaccessible means undue burden presumed
————-▼—————
Good cause showing required for production
————-▼—————
Requesting party pays cost of conversion to “accessible” form.

Zubulake put EDD on every litigator’s and corporate counsel’s radar screen and proved invaluable as a provocateur of long-overdue debate about electronic discovery.  Still, its accessibility analysis is not a helpful touchstone, especially in a fast-moving field like computing.  Codifying it in proposed amendments to F.R.C.P. Rule 26(b)(2) would perpetuate a flawed standard.  Even if that occurs, don’t be cowed by the label, “inaccessible,” and don’t shy away from seeking discovery of relevant media just because it’s cited as an example of something inaccessible.  Instead, require the producing party to either show that the ones and zeroes can’t be accessed or demonstrate that production entails an undue burden.

A decade later, credible claims of inaccessibility grounded on technical hurdles and balky backup media have largely disappeared, relegated to boilerplate objections.  My urging litigants to push back on inaccessibility claims seems quaint now; but in 2005, the prevailing view was that backups were ipso facto inaccessible.

Modern backups tend to be as easy to access and search as active data stores–easier in ways, because they consolidate information across custodians, facilitating collection from a single source and permitting search and deduplication across what would otherwise be separately siloed sources.  Moreover, software tools and vendor services make it possible to search, cull, filter, deduplicate and extract compressed information on multivolume tape backup sets without fully restoring same.  Finally, companies have gotten smarter in their refusal to retain vast legacy tape collections, and courts have awakened to the fact that tape media kept longer than a few weeks isn’t for disaster recovery; it’s an archive. 

Today, we should expect claims of inaccessibility to reappear as we grapple with the smartphones and tablets that so captivate us–these sources are indeed harder and slower to preserve and process than PCs and servers, and encryption renders some content functionally inaccessible.  As well, the burgeoning volume of data going to the Cloud may foster the epiphany that data in an online repository is lots easier to populate than to repatriate.