The title of this post is the question posed by a plaintiffs’ lawyer who called because he didn’t know what to make of a proposal from opposing counsel. The lawyer explained that he’d attended a Rule 26(f) “Meet ‘n Confer” where he’d tried to manifest the right grunts and signs to convey that he wanted electronically searchable production. As neither of the lawyers conferring knew how they might achieve such a miracle, they shared a deer-in-headlights moment, followed by the usual “let me ask my client and get back to you” feint. Some years back, I defined a Rule 26(f) conference as “Two lawyers who don’t trust each other negotiating matters neither understands.” That definition seems to have withstood the test of time.
Before my high-handed cynicism turns you off completely, let me explain that I appreciate that many fine lawyers didn’t grow up with this “computer stuff.” They earned their stripes with paper and, like me, leapt to law from the liberal arts. They’re crazy busy with the constant demands of a trial practice, and ESI is just not a topic that excites their interest. Some are still recovering from the last time they tried to pick up pointers from a tech-savvy person and nearly drowned in a sea of acronyms and geek speak.
I feel your pain. I do. Now, let’s ease that pain:
The other side proposed:
Documents will be produced as single page TIFF files with multi-page extracted text or OCR. We will furnish delimited IPRO or Opticon load files and will later identify fielded information we plan to exchange.
Are they trying to screw you? Probably not.
Are you screwing yourself by accepting the proposed form of production? Yes, probably.
First, let’s translate what they said to plain English.
“Documents will be produced as single page TIFF files….”
They are not offering you the evidence in anything like the form in which they created and used the evidence. Instead, they propose to print everything to a kind of electronic paper, turning searchable, metadata-rich evidence into non-searchable pictures of much (but not all) of the source document. These pictures are called TIFFs, an acronym for Tagged Image File Format. “Single page TIFF” means that each page of a document will occupy its own TIFF image, so reading the document will require loading and reviewing multiple images (as compared to, e.g., a PDF document where the custom is for the entire document to be contained within one multipage image).
If you ever pithed a frog in high school biology, you know what it’s like to TIFF a native document. Converting a native document to TIFF images is lobotomizing the document. By “native,” I mean that the file that contains the document is in the same electronic format as when used by the software application that created the file. For example, the native form of a Microsoft Word document is typically a file with the extension .DOC or .DOCX. For a Microsoft Excel spreadsheet, it’s a file with the extension .XLS or .XLSX. For PowerPoints, the file extensions are .PPT or .PPTX. Native file formats contain the full complement of content and application metadata available to those who created and used the document. Unlike TIFF images, native files are functional files: they can be loaded into a copy of the software application that created them to replicate what a prior user saw, and they afford a comparable ability to manipulate the data and access content that’s made inaccessible when presented in non-native formats.
Think of a TIFF as a PDF’s retarded little brother. I mean no offense by that, but TIFFs are not just differently abled; they are severely handicapped. Not born that way, but lamed and maimed on purpose. The other side downgrades what they give you, making it harder to use and stripping it of potentially-probative content.
Do they do this because they are trying to screw you? Probably not.
Does it screw you just the same? Well, yeah.
“[W]ith multi-page extracted text or OCR.”
A native file isn’t just a picture of the evidence. It’s the original electronic evidence. As such, it contains all of the content of the document in an electronic form. Because it’s designed to be electronically usable, it tends to be inherently electronically searchable; that is, whatever data it holds is encoded into the native electronic file, including certain data about the data, called application metadata. When an electronic document is converted to an image such as a TIFF, it can no longer be searched electronically, and its application metadata and functionality are lost. It’s like photographing a steak. You can see it, but you can’t smell, taste or touch it; you can’t hear the sizzle, and you surely can’t eat it.
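For readers who want to see application metadata for themselves, here is a minimal Python sketch, offered as an illustration under stated assumptions. It relies on the fact that a .DOCX file is a ZIP package whose docProps/core.xml part records properties such as the author and creation date. The function name read_core_properties is hypothetical, and the four fields it pulls are only a sample of what a native file can carry. None of them survives in a TIFF image of the same document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML package
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dcterms": "http://purl.org/dc/terms/",
}

def read_core_properties(docx_bytes: bytes) -> dict:
    """Return a few application-metadata fields from a .DOCX held in memory."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as pkg:
        root = ET.fromstring(pkg.read("docProps/core.xml"))
    fields = {
        "author": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    }
    result = {}
    for name, path in fields.items():
        element = root.find(path, NS)
        # A property the authoring application never wrote comes back as None
        result[name] = element.text if element is not None else None
    return result
```

Usage is a one-liner, e.g. read_core_properties(open("memo.docx", "rb").read()), where memo.docx stands in for any Word file you have on hand.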
Because converting to TIFF takes so much away, parties producing TIFF images deploy cumbersome techniques to restore some of the lost functionality and metadata. To restore a measure of electronic searchability, they extract text from the electronic document and supply it in a file accompanying the TIFF images. It’s called “multi-page extracted text” because, although the single-page TIFFs capture an image of each page, the text extraction spans all of the pages in the document. A recipient runs searches against the extracted text file and then seeks to correlate the hits in the text to the corresponding page image.
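Here is a minimal Python sketch of that correlation step: mapping search hits in multi-page extracted text back to the single-page images they came from. It assumes, hypothetically, that page boundaries in the text file are marked with form-feed characters (\f) and that the page images are named by sequential Bates numbers. Real productions vary, and many don’t mark page breaks at all, which is exactly why this step can be cumbersome. The function name and the "ABC" prefix are illustrative only.

```python
def pages_with_hits(extracted_text: str, term: str, first_bates: int,
                    prefix: str = "ABC") -> list[str]:
    """Return the names of page images whose text contains term.

    Assumes page breaks in the extracted text are marked with form feeds and
    that each page is a separate TIFF named by a sequential Bates number.
    """
    pages = extracted_text.split("\f")
    hits = []
    for offset, page_text in enumerate(pages):
        # Case-insensitive substring match; real review tools do far more
        if term.lower() in page_text.lower():
            hits.append(f"{prefix}{first_bates + offset:08d}.TIF")
    return hits
```

For a three-page document starting at Bates number 101 whose second and third pages mention pricing, a search for "pricing" points the reviewer to ABC00000102.TIF and ABC00000103.TIF.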
If the source documents are scans of paper documents, there’s no electronic text to extract. Instead, the scans are subjected to a process called optical character recognition (OCR) that pairs the images of letters with their electronic counterparts and imparts a rough approximation of searchability. OCR sucks, but it beats the alternative (no electronic searchability whatsoever).
“We will furnish delimited IPRO or Opticon load files….”
Whether extracted from an electronic source or cobbled together by OCR, the text corresponding to the images or scans is transferred in so-called “load files” that may also contain metadata about the source documents. The load file(s) and document images are then correlated in a database tool called a “review platform” or “review tool” that facilitates searching the text and viewing the corresponding image. Common review tools include Concordance, Summation and Relativity. There are many review tools out there; some you load on your own machines (“behind the firewall”) and some you access via the Internet as hosted tools.
To ensure that the images properly match up with extracted text and metadata, the data in the load files is “delimited,” meaning that each item of information corresponding to each page image is furnished in a sequence separated by delimiters, which is just a fancy word for characters like commas, tabs or semicolons used to separate each item in the sequence. The load files can follow any of several published layouts; among the most common are the image load file formats known as IPRO and Opticon.
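A minimal Python sketch of reading an Opticon-style image load file follows. It assumes the seven-field, comma-delimited layout commonly seen in .OPT files (image key, volume, image path, document-break flag, folder break, box break, page count); check the producing party’s actual specification before relying on it. The function name is hypothetical. What it illustrates is that the load file, not the images themselves, tells a review tool where one document ends and the next begins.

```python
import csv
import io

def documents_from_opticon(opt_text: str) -> list[list[str]]:
    """Group page-image paths into documents using an Opticon-style load file.

    Assumed layout, one comma-delimited line per page image:
    ImageKey,Volume,ImagePath,DocumentBreak,FolderBreak,BoxBreak,PageCount
    A "Y" in DocumentBreak marks the first page of a new document.
    """
    documents: list[list[str]] = []
    for row in csv.reader(io.StringIO(opt_text)):
        if not row:
            continue  # skip blank lines
        _image_key, _volume, image_path, doc_break = row[:4]
        if doc_break.strip().upper() == "Y" or not documents:
            documents.append([])  # this page starts a new document
        documents[-1].append(image_path)
    return documents
```

Fed a three-line load file whose first and third lines carry the "Y" flag, the sketch yields two documents: a two-page one and a one-page one.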
“[A]nd will later identify fielded information we plan to exchange.”
Much of the information in electronic records is fielded, meaning that it is not lumped together with all the other parts of the record but is afforded its own place or space. When we fill out paper forms that include separate blanks for our first and last name, we are dividing data (our name) into fields: (first), (last). A wide array of information in and around electronic files tends to be stored as fields; e.g., e-mail messages separately field information like From, To, Date and Subject. If fielded information is not exchanged in discovery as fielded information, you lose the ability to filter information by, for example, Date or Sender in the case of an e-mail message, or by a host of properties and metadata describing other forms of electronically stored information.
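To see why keeping e-mail fields as fields matters, consider this minimal Python sketch using the standard email package. It splits raw messages into From, To, Subject and Date fields, then filters by sender and date: the kind of filtering lost when the same messages arrive only as page images. The helper names fielded and sent_by_after are hypothetical, and the sketch assumes well-formed headers with a Date header present.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

def fielded(raw_message: str) -> dict:
    """Split a raw e-mail message into a few of its header fields."""
    msg = message_from_string(raw_message)
    return {
        "from": parseaddr(msg["From"])[1].lower(),  # bare address only
        "to": msg["To"],
        "subject": msg["Subject"],
        "date": parsedate_to_datetime(msg["Date"]),  # timezone-aware if offset given
    }

def sent_by_after(messages: list[str], sender: str, cutoff: datetime) -> list[str]:
    """Return subjects of messages from sender dated on or after cutoff."""
    records = [fielded(m) for m in messages]
    return [r["subject"] for r in records
            if r["from"] == sender.lower() and r["date"] >= cutoff]
```

A call such as sent_by_after(raw_messages, "pat@example.com", datetime(2012, 10, 1, tzinfo=timezone.utc)) uses a timezone-aware cutoff because headers carrying a numeric offset parse to aware datetimes, and Python won’t compare aware and naive values.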
Additionally, the discovery process may necessitate the linking of various fields of information with electronic documents, such as Bates numbers, hash values, document file paths, extracted text or associated TIFF image numbers. There may be hundreds of fields of metadata and other data from which to select, though not all of it has any evidentiary significance or practical utility. Accordingly, the proposal to “later identify fielded information we plan to exchange” defers the identification of fielded information to later in the discovery process, when presumably the parties will have a better idea of what types of ESI are implicated and what complement of fields will prove useful or relevant.
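Hash values, one of the fields parties commonly link to each document, are easy to illustrate. This minimal Python sketch computes MD5 and SHA-1 digests of a file with the standard hashlib module, reading in chunks so a large file needn’t fit in memory. MD5 and SHA-1 are the algorithms most often seen in e-discovery, but which hashes to exchange is for the parties to agree; the function name file_hashes is hypothetical.

```python
import hashlib

def file_hashes(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute MD5 and SHA-1 digests of a file, reading it 1 MiB at a time."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}
```

Two files with identical content produce identical hashes whatever their names, which is why hash values serve to deduplicate and authenticate; change a single byte and the hash changes.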
Are they trying to screw you by not identifying fielded information?
No. They’re just buying time.
Does their delay screw you?
Maybe. Re-collecting fielded information that a producing party didn’t expect to be asked for can be burdensome and costly, and if you wait too long to seek fielded information, your opponent may refuse to collect and produce it belatedly.
So, are they trying to screw you by this proposal? I doubt it. Chances are they are giving you the dumbed-down data because that’s what they always give the other side, most of whom accept it neither knowing nor caring what they’re missing. It may be the form of production their own lawyers prefer because those lawyers are reluctant to invest in modern review tools. It probably doesn’t hurt that the old ways take longer and throw off more billable hours.
You may accept the screwed-up proposal because, even if the data is less useful and incomplete, you won’t have to evolve. You’ll pull the TIFF images into your browser and painstakingly read them one by one, just like good ol’ paper, all the while telling yourself that what you didn’t get probably wasn’t that important and promising yourself that next time, you’ll hold out for the good stuff: the native stuff. Yeah, next time for sure. Definitely. Definitely.
(October 9, 2012) P.S. to some who have added comments: Thanks for the contribution, but readers should be careful not to confuse the production of ESI in native forms with the use of native applications to open and review the data. You don’t use native apps to search and review native productions any more than you use a screwdriver as a hammer. Instead, you use review tools tailored to the task. While we’re at it, you shouldn’t let the redaction tail wag the production dog. Go ahead and use TIFFs and OCR for redaction if you wish, but don’t screw up the entire production because you want to use TIFFs to redact a handful of documents! As far as using documents in proceedings, go ahead and print out the few you’ll use; but here again, don’t get screwed by a TIFF production just so you can print something out. Last I checked, native documents printed out very nicely, too.
What a great way to put it. Love Craig Ball articles. Easy to read, understand and to the point! Now, how about you attorneys hire people like me who know what the heck a load file is and what forms of ESI are 😉
David Tobin said:
It’s all about context; many times TIFFs and OCR are enough.
I agree that not working out standard metadata exchange requirements ahead of time is a bit ridiculous. Excluding the rare exceptions, there are metadata fields that are commonly used and most practical. TIFFs accompanied by extracted text/OCR and load files will always be the standard form of exchange until:
1. Courts can effectively accept a 100% native production
2. Courts and both parties have all of the various software applications required to view the native version of a particular file (various CAD, accounting software, countless email clients, Mac, Win PC, Linux, etc.)
3. Encrypted or password-protected files can be made searchable without altering and re-saving the original file
4. A person opening a native file cannot accidentally alter it (some review platforms do support read-only access)
5. You can Bates number and apply confidentiality designations to every page of a native file without altering the original native
6. You can redact native files effectively, then save the redacted file without altering the original file metadata
7. Attorneys no longer need printed copies of documents for depositions, exhibits, or just because they prefer reviewing in paper format
Certain file types such as Excel should always be included in native format along with a TIFF production, unless redactions are required.
Chad Kime, Ji2 said:
We deal with Asian data that often contains non-Unicode characters so the native files will appear corrupted/garbled on most review platforms. We recommend PDF files as a compromise (except for Microsoft Excel which will have insane page counts)…
Lael Andara said:
“Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius — and a lot of courage to move in the opposite direction.”
― E.F. Schumacher
CaseLines Paul Sachs (@CaseLines) said:
Nicely put. Lawyers need technology that is immediately intuitive and ready to use. Too often we see eDiscovery tools that require lawyers to become IT experts to use them. The result is that technology that could reduce costs is rejected. The new range of eBundling tools is moving into the ‘I can use that’ space for lawyers.
Brilliant. I know you’ve been saying this for many years, but it bears repeating until it reaches a broad audience that is ready to learn. As for using natives in depositions and court, this can be done quite easily if everyone is on the same page of the playbook. For the luddites, be sure the discovery protocol includes the obvious: “Any party seeking to convert native formatted documents to images or printouts, for use as exhibits or otherwise, must ensure each image or printed page is branded or stamped with 1) a Bates number derived from the production document identification number, 2) the hash signature of the native formatted document and 3) any designation of confidentiality or restriction for use for a particular purpose, clearly and legibly on the image or printout.” This solves all the downstream problems that may arise with an ill-conceived native format production and shifts the cost of conversion to the party seeking to ‘dumb down’ the document.
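The branding that protocol calls for is simple to generate. Here is a minimal Python sketch that builds one stamp per printed page from the production document ID, the native file’s hash and an optional confidentiality designation. The protocol doesn’t say how page-level Bates numbers should be derived from the document ID, so the four-digit page suffix used here (e.g., ABC000123.0001) is a hypothetical convention, as is the function name.

```python
def page_endorsements(production_id: str, native_sha1: str, page_count: int,
                      designation: str = "") -> list[str]:
    """Build one stamp per printed page of a natively produced document.

    Hypothetical convention: each page's Bates number is the production ID
    plus a four-digit page suffix, followed by the native file's SHA-1 hash
    and, if any, the confidentiality designation.
    """
    stamps = []
    for page in range(1, page_count + 1):
        parts = [f"{production_id}.{page:04d}", f"SHA-1 {native_sha1}"]
        if designation:
            parts.append(designation)
        stamps.append(" | ".join(parts))
    return stamps
```

Because every stamp carries the native file’s hash, anyone holding the production can match a printed exhibit back to the exact native file it came from.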
On the flip side, native-format production works well with WESI (Windows-only ESI) in a Windows world. As the variability of operating systems and file types expands exponentially, the need for a standard, ubiquitous, self-authenticating digital evidence file-type will become self-evident and perhaps be provided. Said with appropriate southern drawl, PDF ain’t it.
Craig Ball said:
Thank you, Bill. Much appreciated, especially coming from one so steeped in the technology in his own right. Your approach to printing when needed in proceedings is spot on. I published an exemplar protocol that called for much the same thing as an appendix to my Lawyers Guide to Metadata. You said it better.
I share your desire for an authenticable, universal, extensible production format. One small quibble about PDF: I agree that “PDF ain’t it” as currently deployed, but PDF has the promise to be it by virtue of the little-recognized capability to embed a binary file in a PDF. Using the embedding feature would enable a litigant to furnish a document image and text layer, benefiting those without modern e-discovery tools and skills, as well as supply the native content for evolved users, all in the same adaptable wrapper. It’s not self-authenticating, but it could be digitally signed to accomplish the same.
Coincidentally, this happens to be the way that MIME e-mail developed to support old and new tools, carrying both a plain text message body and an alternative, richer HTML counterpart message body. E-mail seems to have been reasonably successful. At least, I know a few people who use it. How about you? 😉 Regards to you and Mrs. Kellerman. Craig
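The MIME arrangement described in that reply can be seen with Python’s standard email package: EmailMessage.set_content() supplies the plain-text body, and add_alternative() adds a richer HTML counterpart, turning the message into a multipart/alternative container. A client that can’t render HTML falls back to the plain text. This is a minimal sketch; the function name, subject and addresses are placeholders.

```python
from email.message import EmailMessage

def build_dual_format_message() -> EmailMessage:
    """Build a message carrying both a plain-text body and an HTML alternative."""
    msg = EmailMessage()
    msg["Subject"] = "Production cover letter"
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    # The plain-text part serves older or simpler clients.
    msg.set_content("Please find the production attached.")
    # The richer HTML part is offered as an alternative for capable clients;
    # this call converts the message to multipart/alternative.
    msg.add_alternative("<p>Please find the <b>production</b> attached.</p>",
                        subtype="html")
    return msg
```

The same container holds both renditions, and each reader’s tool picks the richest one it understands, which is the design the embedded-native PDF idea borrows.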