The title of this post is the question posed by a plaintiffs’ lawyer who called because he didn’t know what to make of a proposal from opposing counsel. The lawyer explained that he’d attended a Rule 26(f) “Meet ‘n Confer” where he’d tried to manifest the right grunts and signs to convey that he wanted electronically-searchable production. As neither of the lawyers conferring knew how they might achieve such a miracle, they shared a deer-in-headlights moment, followed by the usual “let me ask my client and get back to you” feint. Some years back, I defined a Rule 26(f) conference as “Two lawyers who don’t trust each other negotiating matters neither understand.” That definition seems to have withstood the test of time.
Before my high-handed cynicism turns you off completely, let me explain that I appreciate that many fine lawyers didn’t grow up with this “computer stuff.” They earned their stripes with paper and, like me, leapt to law from the liberal arts. They’re crazy busy with the constant demands of a trial practice, and ESI is just not a topic that excites their interest. Some are still recovering from the last time they tried to pick up pointers from a tech-savvy person and nearly drowned in a sea of acronyms and geek speak.
I feel your pain. I do. Now, let’s ease that pain:
The other side proposed:
Documents will be produced as single page TIFF files with multi-page extracted text or OCR. We will furnish delimited IPRO or Opticon load files and will later identify fielded information we plan to exchange.
Are they trying to screw you? Probably not.
Are you screwing yourself by accepting the proposed form of production? Yes, probably.
First, let’s translate what they said to plain English.
“Documents will be produced as single page TIFF files….”
They are not offering you the evidence in anything like the form in which they created and used the evidence. Instead, they propose to print everything to a kind of electronic paper, turning searchable, metadata-rich evidence into non-searchable pictures of much (but not all) of the source document. These pictures are called TIFFs, an acronym for Tagged Image File Format. “Single page TIFF” means that each page of a document will occupy its own TIFF image, so reading the document will require loading and reviewing multiple images (as compared to, e.g., a PDF document where the custom is for the entire document to be contained within one multipage image).
If you ever pithed a frog in high school biology, you know what it’s like to TIFF a native document. Converting a native document to TIFF images is lobotomizing the document. By “native,” I mean that the file that contains the document is in the same electronic format as when used by the software application that created the file. For example, the native form of a Microsoft Word document is typically a file with the extension .DOC or .DOCX. For a Microsoft Excel spreadsheet, it’s a file with the extension .XLS or .XLSX. For PowerPoints, the file extensions are .PPT or .PPTX. Native file formats contain the full complement of content and application metadata available to those who created and used the document. Unlike TIFF images, native files are functional files, in that they can be loaded into a copy of the software application that created them to replicate what a prior user saw, as well as affording a comparable ability to manipulate the data and access content that’s made inaccessible when presented in non-native formats.
Think of a TIFF as a PDF’s retarded little brother. I mean no offense by that, but TIFFs are not just differently abled; they are severely handicapped. Not born that way, but lamed and maimed on purpose. The other side downgrades what they give you, making it harder to use and stripping it of potentially-probative content.
Do they do this because they are trying to screw you? Probably not.
Does it screw you just the same? Well, yeah.
“[W]ith multi-page extracted text or OCR.”
A native file isn’t just a picture of the evidence. It’s the original electronic evidence. As such, it contains all of the content of the document in an electronic form. Because it’s designed to be electronically usable, it tends to be inherently electronically searchable; that is, whatever data it holds is encoded into the native electronic file, including certain data about the data, called application metadata. When an electronic document is converted to an image (TIFF), it can no longer be searched electronically, and its application metadata and utility are lost. It’s like photographing a steak. You can see it, but you can’t smell, taste or touch it; you can’t hear the sizzle, and you surely can’t eat it.
Because converting to TIFF takes so much away, parties producing TIFF images deploy cumbersome techniques to restore some of the lost functionality and metadata. To restore a measure of electronic searchability, they extract text from the electronic document and supply it in a file accompanying the TIFF images. It’s called “multi-page extracted text” because, although the single-page TIFFs capture an image of each page, the text extraction spans all of the pages in the document. A recipient runs searches against the extracted text file and then seeks to correlate the hits in the text to the corresponding page image.
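For the code-curious, here is a minimal Python sketch of that correlation step. It rests on assumptions made purely for illustration: that the extracted text marks page breaks with a form feed character, and that the single-page TIFFs are named by sequential Bates numbers. Real productions often do neither, and the function name, the "ABC" prefix and the sample text are all invented.

```python
# Hypothetical sketch: map a search hit in document-level extracted text
# back to the single-page TIFF that should show it.
# ASSUMPTION: pages in the extracted text are separated by a form feed ("\f").
# ASSUMPTION: page images are named <prefix><7-digit Bates number>.tif.

def pages_with_hit(extracted_text, term, first_bates_number, prefix="ABC"):
    """Return the names of the page images whose text contains the term."""
    hits = []
    for offset, page_text in enumerate(extracted_text.split("\f")):
        # Case-insensitive substring match, page by page.
        if term.lower() in page_text.lower():
            hits.append(f"{prefix}{first_bates_number + offset:07d}.tif")
    return hits


# Invented three-page document whose first page is Bates number 101.
sample_text = "Re: brake recall\fThe defect was known in 2009.\fSincerely, J. Doe"
print(pages_with_hit(sample_text, "defect", first_bates_number=101))
```

When the text carries no page markers at all, the recipient is left to find the right image by eye, which is part of what makes the approach cumbersome.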
If the source documents are scans of paper documents, there’s no electronic text to extract from the paper. Instead, the scans are subjected to a process called optical character recognition (OCR) that serves to pair the images of letters with their electronic counterparts and impart a rough approximation of searchability. OCR sucks, but it beats the alternative (no electronic searchability whatsoever).
“We will furnish delimited IPRO or Opticon load files….”
Whether extracted from an electronic source or cobbled together by OCR, the text corresponding to the images or scans is transferred in so-called “load files” that may also contain metadata about the source documents. Collectively, the load file(s) and document images are correlated in a database tool called a “review platform” or “review tool” that facilitates searching the text and viewing the corresponding image. Common review tools include Concordance, Summation and Relativity. There are many review tools out there, some you load on your own machines (“behind the firewall”) and some you access via the Internet as hosted tools.
To ensure that the images properly match up with extracted text and metadata, the data in the load files is “delimited,” meaning that each item of information corresponding to each page image is furnished in a sequence separated by delimiters, just a fancy word for characters like commas, tabs or semicolons used to separate each item in the sequence. The delimiting scheme employed in the load files can follow any of several published standards for load file layout, including the most common schemes known as IPRO or Opticon.
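To make “delimited” less abstract, here is a short Python sketch that reads a few lines laid out the way Opticon-style image load files are commonly described: one comma-separated line per page image, with a “Y” flagging the first page of each document. The field order shown is an assumption for illustration, and the Bates numbers, volume name and paths are invented; check the actual specification your opponent or vendor uses before relying on any of it.

```python
import csv

# Invented sample lines. ASSUMED field order (commonly described for
# Opticon-style files): image key, volume, image path, document break,
# folder break, box break, page count.
SAMPLE_LINES = [
    r"ABC0000001,VOL001,IMAGES\001\ABC0000001.tif,Y,,,2",
    r"ABC0000002,VOL001,IMAGES\001\ABC0000002.tif,,,,",
    r"ABC0000003,VOL001,IMAGES\001\ABC0000003.tif,Y,,,1",
]


def group_pages_into_documents(lines):
    """Rebuild documents from single-page image entries using the break flag."""
    documents = []
    for row in csv.reader(lines):
        image_key, _volume, path, doc_break = row[:4]
        # A "Y" in the document-break field starts a new document.
        if doc_break.upper() == "Y" or not documents:
            documents.append({"first_page": image_key, "pages": []})
        documents[-1]["pages"].append(path)
    return documents


docs = group_pages_into_documents(SAMPLE_LINES)
for doc in docs:
    print(doc["first_page"], len(doc["pages"]), "page(s)")
```

The point of the sketch is only that the load file, not the images themselves, tells the review tool where one document ends and the next begins.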
“[A]nd will later identify fielded information we plan to exchange.”
Much of the information in electronic records is fielded, meaning that it is not lumped together with all the other parts of the record but is afforded its own place or space. When we fill out paper forms that include separate blanks for our first and last name, we are dividing data (our name) into fields: (first), (last). A wide array of information in and around electronic files tends to be stored as fields, e.g., e-mail messages separately field information like From, To, Date and Subject. If fielded information is not exchanged in discovery as fielded information, you lose the ability to filter information by, for example, Date or Sender in the case of an e-mail message or by a host of properties and metadata describing other forms of electronically stored information.
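Here is a small Python sketch of why fields matter. The messages, addresses and the filter_messages function are all invented for illustration; the only thing borrowed from real life is that e-mail keeps From, To, Date and Subject in separate fields. With those fields intact, narrowing by sender and date range takes a few lines. Without them, you are back to reading.

```python
from datetime import date

# Invented e-mail records, each kept as separate fields.
messages = [
    {"From": "pat@example.com", "To": "lee@example.com",
     "Date": date(2011, 3, 14), "Subject": "Q1 forecast"},
    {"From": "lee@example.com", "To": "pat@example.com",
     "Date": date(2011, 6, 2), "Subject": "Re: Q1 forecast"},
    {"From": "pat@example.com", "To": "lee@example.com",
     "Date": date(2012, 1, 9), "Subject": "Recall notice"},
]


def filter_messages(records, sender=None, start=None, end=None):
    """Keep records matching the sender and falling within the date range."""
    results = []
    for record in records:
        if sender is not None and record["From"].lower() != sender.lower():
            continue
        if start is not None and record["Date"] < start:
            continue
        if end is not None and record["Date"] > end:
            continue
        results.append(record)
    return results


# Everything Pat sent during 2011.
from_pat_2011 = filter_messages(
    messages, sender="PAT@example.com",
    start=date(2011, 1, 1), end=date(2011, 12, 31),
)
for message in from_pat_2011:
    print(message["Date"], message["Subject"])
```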
Additionally, the discovery process may necessitate the linking of various fields of information with electronic documents, such as Bates numbers, hash values, document file paths, extracted text or associated TIFF image numbers. There may be hundreds of fields of metadata and other data from which to select, though not all of it has any evidentiary significance or practical utility. Accordingly, the proposal to “later identify fielded information we plan to exchange” defers the identification of fielded information to later in the discovery process when presumably the parties will have a better idea what types of ESI are implicated and what complement of fields will prove useful or relevant.
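As a rough illustration of linking fields to a document, the Python sketch below builds one record tying a Bates number and file path to a hash value computed from the file’s contents. The field names, the sample path and contents, and the choice of SHA-256 are assumptions made for the example, not a description of what any particular vendor or protocol requires.

```python
import hashlib


def build_field_record(bates_number, file_path, content):
    """Link a document's identifying fields to a hash of its contents.

    Field names here are hypothetical; parties agree on their own.
    """
    return {
        "BegBates": bates_number,
        "FilePath": file_path,
        # Same bytes in, same hash out: the digest acts as a fingerprint.
        "HashValue": hashlib.sha256(content).hexdigest(),
    }


# Invented contents standing in for two copies of one file and one variant.
original = build_field_record("ABC0000001", r"Custodian1\memo.docx", b"Quarterly memo")
duplicate = build_field_record("ABC0000417", r"Custodian2\memo.docx", b"Quarterly memo")
revised = build_field_record("ABC0000902", r"Custodian2\memo_v2.docx", b"Quarterly memo, rev.")

print(original["HashValue"] == duplicate["HashValue"])
print(original["HashValue"] == revised["HashValue"])
```

Matching hash values flag the two copies as identical content even though they came from different places, which is one reason a hash field earns its keep.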
Are they trying to screw you by not identifying fielded information?
No. They’re just buying time.
Does their delay screw you?
Maybe. Re-collecting fielded information you didn’t expect your opponent would ask for can be burdensome and costly. Waiting too long to seek fielded information from an opponent may prompt the opponent to refuse to belatedly collect and produce it.
So, are they trying to screw you by this proposal? I doubt it. Chances are they are giving you the dumbed down data because that’s what they always give the other side, most of whom accept it neither knowing nor caring what they’re missing. It may be the form of production their own lawyers prefer because their lawyers are reluctant to invest in modern review tools. It probably doesn’t hurt that the old ways take longer and throw off more billable hours.
You may accept the screwed up proposal because, even if the data is less useful and incomplete, you won’t have to evolve. You’ll pull the TIFF images into your browser and painstakingly read them one-by-one, just like good ol’ paper, all the while telling yourself that what you didn’t get probably wasn’t that important and promising yourself that next time, you’ll hold out for the good stuff: the native stuff. Yeah, next time for sure. Definitely. Definitely.
(October 9, 2012) P.S. to some who have added comments: Thanks for the contribution, but readers should be careful not to confuse the production of ESI in native forms with the use of native applications to open and review the data. You don’t use native apps to search and review native productions any more than you use a screwdriver as a hammer. Instead, you use review tools tailored to the task. While we’re at it, you shouldn’t let the redaction tail wag the production dog. Go ahead and use TIFFs and OCR for redaction if you wish; but, don’t screw up the entire production because you want to use TIFFs to redact a handful of documents! As far as using documents in proceedings, go ahead and print out the few you’ll use; but here again, don’t get screwed by a TIFF production just so you can print something out. Last I checked, native documents printed out very nicely, too.