
I read a couple of good articles on the e-discovery implications of the Mueller report and tweeted,
“The Mueller report underscores why image+ productions are ridiculous. Compare the OCR to the true text. It’s a mess, so search is off. Image files many times larger than the native, ergo much more costly to load, store, host, transmit. BTW: YES, you CAN redact a Word file. It’s XML!”
This bears fleshing out, and I want to do it by sharing a simple trick enabling you to peer inside the raw guts of a Microsoft Word file and understand why native redaction isn’t the pipe dream some try to make it. But first, let’s unpack the jargon.
“Image+” or “TIFF+” productions refer to the common practice of fixing the content of a document by printing the file to a static image format like TIFF or PDF. I use “fixing” in the sense of making something permanent, but it’s also accurate to use it the way we speak of “fixing” a cat; that is, cutting its balls off.
The “plus” in TIFF+ refers to the need to supply the native file’s searchable text and application metadata in ancillary load files to accompany the page images. That is, rather than supply the evidence, producing parties degrade it to a deconstructed “kit” version of the evidence that requesting parties must load into review platforms to restore a crude level of searchability. This enables producing parties to suppress content (like embedded comments, speaker notes and changes in text documents) and much of the application metadata of the original. It also neuters the evidence. It’s no longer functional in the programs that created it, like Word, PowerPoint or Excel.
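To make that suppressed content concrete: a .docx file is a ZIP container of XML parts. Comments normally live in their own part (word/comments.xml), and tracked changes sit in the main document part as insertion and deletion elements. Here is a minimal Python sketch that counts them. The function names `summarize_docx` and `make_toy_docx` are my own inventions for illustration, and the toy file is an assumption-laden stand-in: just enough XML to exercise the code, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:ins, w:del, w:comment.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def summarize_docx(source):
    """Count comments and tracked changes in a .docx (path or file-like object)."""
    with zipfile.ZipFile(source) as z:
        names = z.namelist()
        body = ET.fromstring(z.read("word/document.xml"))
        comments = 0
        if "word/comments.xml" in names:
            root = ET.fromstring(z.read("word/comments.xml"))
            comments = len(root.findall(f"{{{W_NS}}}comment"))
    return {
        "parts": names,
        "comments": comments,
        "insertions": sum(1 for _ in body.iter(f"{{{W_NS}}}ins")),
        "deletions": sum(1 for _ in body.iter(f"{{{W_NS}}}del")),
    }


def make_toy_docx():
    """Hypothetical helper: build a stripped-down .docx-like ZIP in memory."""
    document = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>Kept text. </w:t></w:r>"
        '<w:ins w:id="1" w:author="A"><w:r><w:t>Added text.</w:t></w:r></w:ins>'
        '<w:del w:id="2" w:author="A"><w:r><w:delText>Removed text.</w:delText></w:r></w:del>'
        "</w:p></w:body></w:document>"
    )
    comments = (
        f'<w:comments xmlns:w="{W_NS}">'
        '<w:comment w:id="0" w:author="A"><w:p><w:r><w:t>Check this.</w:t></w:r></w:p></w:comment>'
        "</w:comments>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", document)
        z.writestr("word/comments.xml", comments)
    buf.seek(0)
    return buf


summary = summarize_docx(make_toy_docx())
print(summary)
```

Point `summarize_docx` at a real .docx path instead of the toy buffer and you get a quick census of what a page image would leave behind.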
I’ve written extensively about this elsewhere (e.g., Lawyers’ Guide to Forms of Production), and I try to present the pros and cons of TIFF+, notwithstanding my belief that the cons decidedly outweigh the pros. It largely comes down to Bates numbers and disagreement about how and when those fetishistic Bates identifiers should be added to evidence and at what absurd cost.
TIFF+ enables producing parties to sidestep their obligation to review unprinted information for responsiveness and privilege. Instead, they silently make that content disappear like a “fixed” cat’s testicles. To be fair, most lawyers know so little about ESI processing that they are blissfully unaware it’s happening, so they deny it with genuine equanimity. When you force them to acknowledge the spoliation, they fall back on claiming that whatever they excised and didn’t review wouldn’t have been worth the trouble of reviewing or producing. Genius, right?
Apart from what’s missing from the dumbed-down data, the big objection I offer to TIFF over native productions is the huge size difference between them. TIFF productions are much, much fatter. Though information and utility have been stripped from the images, the degraded set is nonetheless many times larger (measured in bytes) than the native originals.
Because most e-discovery service providers price their wares by the gigabyte volume going into, onto and out of their systems, bigger files mean bigger bills. Much bigger files mean…well, you get it.
Perhaps you’re thinking, “Craig, you sad, sad Cassandra; how much bigger can these image sets be than their native counterparts?” Would ten times bigger surprise you? Well, then surprise! But they’re usually more than ten times larger. It’s not a one-off rip-off either. Most hosted platforms charge you for the fatter file volume every month. Over, and over, and over again.
Sucker.
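To put rough numbers on that recurring charge, here is a back-of-the-envelope sketch in Python. Every figure in it is an assumption chosen for illustration (a 10 GB native set, a $15 per gigabyte monthly hosting rate, a twelve-month matter), except the tenfold size multiplier, which is the conservative floor described above. The function name `hosting_cost` is likewise hypothetical.

```python
def hosting_cost(native_gb, size_multiplier, rate_per_gb_month, months):
    """Total hosting charge when billing is by gigabyte per month."""
    return native_gb * size_multiplier * rate_per_gb_month * months


# Illustrative assumptions only; none of these figures is quoted pricing.
NATIVE_GB = 10          # size of the native production
RATE = 15.0             # dollars per GB per month (assumed)
MONTHS = 12             # how long the data stays hosted (assumed)
TIFF_MULTIPLIER = 10    # image set is at least ten times the native size

native_total = hosting_cost(NATIVE_GB, 1, RATE, MONTHS)
tiff_total = hosting_cost(NATIVE_GB, TIFF_MULTIPLIER, RATE, MONTHS)

print(f"Native production: ${native_total:,.2f}")
print(f"TIFF+ production:  ${tiff_total:,.2f}")
print(f"Extra paid for images: ${tiff_total - native_total:,.2f}")
```

Swap in your own vendor’s rate and your own matter’s duration; because the charge is linear in volume, the multiplier carries straight through to the bill.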