Recently, I wrote on the monstrous cost of TIFF+ productions compared to the same data produced as native files.  I’ve wasted years trying to expose the loss of utility and completeness caused by converting evidence to static formats.  I should have recognized that no one cares about quality in e-discovery; they only care about cost.  But I cannot let go of quality because one thing the Federal Rules make clear is that producing parties are not permitted to employ forms of production that significantly impair the searchability of electronically stored information (ESI).

In the “ordinary course of business,” none but litigators “ordinarily maintain” TIFF images as substitutes for native evidence.  When requesting parties seek production in native forms, responding parties counter with costly static image formats by claiming they are “reasonably usable” alternatives.  However, the drafters of the 2006 Rules amendments were explicit in their prohibition:

[T]he option to produce in a reasonably usable form does not mean that a responding party is free to convert electronically stored information from the form in which it is ordinarily maintained to a different form that makes it more difficult or burdensome for the requesting party to use the information efficiently in the litigation. If the responding party ordinarily maintains the information it is producing in a way that makes it searchable by electronic means, the information should not be produced in a form that removes or significantly degrades this feature.

 FRCP Rule 34, Committee Notes on Rules – 2006 Amendment.

I contend that substituting a form that costs many times more to load and host counts as making the production more difficult and burdensome to use.  But what is little realized or acknowledged is the havoc that so-called TIFF+ productions wreak on searchability, too.  It boggles the mind, but when I share what I’m about to relate with opposing counsel, they immediately retort, “that’s not true.”  They deny the reality without checking its truth, without caring whether what they assert has a basis in fact.  And I’m talking about lawyers claiming deep expertise in e-discovery.  It’s disheartening, to say the least.

A little background: We all know that ESI is inherently electronically searchable.  There are quibbles to that statement but please take it at face value for now.  When parties convert evidence in native forms to static image forms like TIFF, the process strips away all electronic searchability.  A monochrome screenshot replaces the source evidence.  Since the Rules say you can’t remove or significantly degrade searchability, the responding party must act to restore a measure of searchability.  They do this by extracting text from the native ESI and delivering it in a “load file” accompanying the page images.  This is part of the “plus” when people speak of TIFF+ productions.

E-discovery vendors then seek to pair the page images with the extracted text in a manner that allows some text searchability.  Vendors index the extracted text to speed search, a mapping process intended to display the page where the text was located when mapped.  This is important because where the text appears in the load file dictates what page will be displayed when the text is searched and determines whether features like proximity search and even predictive coding work as well as we have a right to expect.  Upshot: The location and juxtaposition of extracted text in the load file matters significantly in terms of accurate searchability.  If you don’t accept that, you can stop reading.

Now, let’s consider the structure of modern electronic evidence.  We could talk about formulae in spreadsheets or speaker notes in presentations, but those are not what we fight over when it comes to forms of production.  Instead, I want to focus on Microsoft Word documents and those components of Word documents called Comments and Tracked Changes; particularly Comments because these aren’t “metadata” by any stretch.  Comments are user-contributed content, typically communications between collaborators.  Users see this content on demand and it’s highly contextual and positional because it is nearly always a comment on adjacent body text.  It’s NOT the body text, and it’s not much use when it’s separated from the body text.  Accordingly, Word displays comments as marginalia, giving them the power of place but not enmeshing them with the body text.

But what happens to these contextual comments when you extract the text of a Word document to a load file and then index the load files?

There are three ways I’ve seen vendors handle comments and all three significantly degrade searchability:

First, they suppress comments altogether and do not capture the text in the load files.  This is content deletion.  It’s like the content was never there and you can’t find the text using any method of electronic search.  Responding parties don’t disclose this deletion nor is it grounded on any claim of privilege or right.  Spoliation is just S.O.P.

Second, they merge the comments into the adjacent body text. This has the advantage of putting the text more-or-less on the same page where it appears in the source, but it also serves to frustrate proximity search and analytics.  The injection of the comment text between a word combination or phrase causes searches for that word combo or phrase to fail.  For example, if your search was for ignition w/3 switch and a four-word comment comes between “ignition” and “switch,” the search fails.

Third, and frequently, vendors aggregate comments and dump them at the end of the load file with no clue as to the page or text they reference.  No links.  No pointers.  Every search hitting on comment text takes you to the wrong page, devoid of context.
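
To make the three outcomes concrete, here is a minimal sketch in Python.  It is a toy and not any vendor's actual indexing logic: the two-page document, the comment text, and the helper names (tokenize, within, hit_pages) are all hypothetical, and I am assuming that "w/3" means the two terms sit no more than three word positions apart.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within(text, a, b, n):
    """True if terms a and b occur within n word positions of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def hit_pages(page_texts, term):
    """1-based page numbers whose extracted text contains the term."""
    return [n for n, text in enumerate(page_texts, start=1)
            if term in tokenize(text)]

# Hypothetical two-page document; the comment is anchored on page 1,
# between "ignition" and "switch".
pages = ["The ignition switch failed during testing.",
         "Unrelated closing remarks appear on this page."]
comment = "Is this the recall?"  # a four-word comment (assumed example)

# Method 1: comments suppressed, so the extracted text is body text only.
suppressed = list(pages)

# Method 2: comment merged into the body text at its anchor point.
merged = [pages[0].replace("ignition switch",
                           "ignition " + comment + " switch"),
          pages[1]]

# Method 3: comments aggregated and dumped at the end of the extracted text.
appended = [pages[0], pages[1] + " " + comment]

print("ignition w/3 switch, body only:", within(pages[0], "ignition", "switch", 3))
print("ignition w/3 switch, merged:   ", within(merged[0], "ignition", "switch", 3))
print("pages hit for 'recall', suppressed:", hit_pages(suppressed, "recall"))
print("pages hit for 'recall', merged:    ", hit_pages(merged, "recall"))
print("pages hit for 'recall', appended:  ", hit_pages(appended, "recall"))
```

In this toy case, the proximity search succeeds against the body text alone and fails once the comment is merged in; the comment term is unfindable when comments are suppressed; and when comments are appended, the hit lands on page 2 although the comment belongs to page 1.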

Some of what I describe are challenges inherent to dealing with three-dimensional data using two-dimensional tools.  Native applications deal with Comments, speaker notes and formulae three-dimensionally.  We can reveal that data as needed, and it appears in exactly the way witnesses use it outside of litigation.  But flattening native forms to static images and load files destroys that multidimensional capability.   Vendors do what they can to add back functionality; but we should not pretend the results are anything more than a pale shadow of what’s possible when native forms are produced.  I’d call it a tradeoff, but that implies requesting parties know what’s being denied them.  How can requesting party’s counsel know what’s happening when responding parties’ counsel haven’t a clue what their tools do, yet misrepresent the result?

But now you know.  Check it out.  Look at the extracted text files produced to accompany documents with comments and tracked changes.  Ask questions.  Push back.  And if you’re producing party’s counsel, fess up to the evidence vandalism you do.  Defend it if you must but stop denying it.  You’re better than that.