Be honest. Wouldn’t you love to stick it to the plaintiffs? Wouldn’t your corporate client or carrier be ecstatic if you could make litigation much more expensive for those greedy opportunists bringing frivolous suits and demanding discovery? What if you could make discovery not just more costly, but make it, say, five times more costly, ten times more costly, than it is for you? Really bring the pain. Would you do it?
Now that I have your attention (and the attention of plaintiffs’ counsel wondering if they’ve stumbled into a closed meeting at a corporate counsel retreat), I want to show you this is real. Not just because I say so, but because you prove it to yourself. You do the math.
Math! You didn’t say there would be math!
Stop. You know you’re good at math when the numbers come with dollar signs. Legendary Texas trial lawyer W. James Kronzer used to say to me, “I’m no good at math, Herman; but I can divide any number by three.” That was back when a third was the customary contingent fee.
Even after you do the math, you’re not going to believe it; instead, you’ll conclude it can’t be true. Surely nothing so unjust could have escaped my notice. Why would courts allow this? How can I be such a sap?
The real question is this: What am I going to do about it?
First, a few facts to get on the same page.
FACT 1: Requesting Parties May Specify a Form or Forms of Production
When you pursue discovery, you may call what you seek “documents” and mentally equate them to paper records, but it’s electronically stored information (ESI). ESI must be produced in specified forms of production, either in native forms (being the form stored and used in the ordinary course of business) or in a static image format (a black and white screenshot of each page called a TIFF image plus a load file or files holding text and metadata). There are also near-native forms of production, such as when e-mailboxes are produced as individual messages called MSGs or EMLs.
If you’re thinking, ‘No, we just print it out and deal with it on paper,’ then this essay isn’t for you. All the best, but this is for lawyers handling cases with lots of information; lawyers with thousands or millions of documents and messages to plow through.
The Federal Rules of Civil Procedure and most states’ rules empower a requesting party to specify the form or forms in which electronically stored information is to be produced. FRCP Rule 34(b)(1)(C). If a requesting party fails to specify a form of production (and you should NEVER fail to specify), a producing party must supply ESI in the form or forms in which it is ordinarily maintained or in a reasonably usable form or forms. FRCP Rule 34(b)(2)(E)(ii).
FACT 2: Most E-Discovery Service Providers Are “In the Cloud” and Charge by Data Volume
Whether in native or static image format, ESI must be processed (“ingested”) and hosted to be searchable and reviewable. [Again, if your reaction is, ‘we just print it out’ or ‘we don’t search it electronically,’ this isn’t for you.] Native forms are processed to extract their text and metadata, then indexed for search. TIFF and load file productions are indexed for search and processed to pair the page images with text and metadata. Either way, you pay a vendor to prepare the production for viewing and then pay a recurring “hosting” charge for online access to the production. The fees charged are based on the volume of data processed and/or hosted. More data costs more money. If you receive 10 times as much data, you pay a commensurate amount more to ingest and host. Vendors usually assess hosting fees as a monthly subscription, so the more data they host for you, the more you pay every month for the life of the case.
It’s no accident that vendor charges are opaque and vary wildly; but, at the bottom line, the rule is more data, more dollars.
FACT 3: TIFF Images of Native Files Are Much Larger Than the Native Files
More data isn’t the same thing as more information because not all electronic forms of information are equally efficient. When you convert native forms to static images and load files you explode the size of production by many multiples, and static productions come burdened by the further cost of impaired searchability, diminished functionality and lost color, animation and rich media. With TIFF, you get less and pay more. Not 50% or 100% more; perhaps a thousand percent more and beyond. This is notably the case for Word documents, PowerPoint presentations, Excel spreadsheets and collections of e-mail messages and attachments–the native forms at the heart of electronic discovery. The difference is genuine, material and carries a big bottom-line cost.
That’s a categorical statement, and some will immediately search for an exception. They will wonder, is it possible to fashion a native file larger than its TIFF counterpart? You could certainly construct a PowerPoint or Word document so laden with high-resolution color photos, sound and video that, once you strip away the rich content, a static black & white image would occupy a size smaller than the native. But is a TIFF shorn of sound, video and color comparable, and is such a file representative of most collections produced in e-discovery? An emphatic “no,” on both counts, and it’s not an “apples-to-apples” comparison.
A production must be reasonably usable. The TIFF without sound and video isn’t. When you add back the rich media and produce with extracted sound and video files, the TIFF production is indeed larger than the native, and more unwieldy.
You Do the Math
If you have an e-discovery processing tool available to you, it’s easy to process a collection of native Word files to a TIFF+ production then compare the collective file size of the TIFF images and extracted text to the collective size of the native files. Once you see how many times bigger the TIFF+ set is versus the native, you’ll understand why requesting parties pay so much more to load and host TIFF+ productions over native productions.
If you don’t have a processing tool, you’ll need another way to generate TIFF images like those produced in e-discovery. You could use an online file conversion tool, or you could save the DOCX file as a PDF and then use Adobe Acrobat to convert the PDF to a series of single-page TIFF images. For a proper comparison, generate single-page, monochrome Group IV TIFF images at 600 dpi resolution. No multipage or color TIFF sets.
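Whichever route you take, the comparison comes down to adding up bytes in two folders. Here is a minimal Python sketch of that tally, assuming you have placed the native files in one folder and the TIFF images plus extracted text in another. The function names and the folder names in the example call are hypothetical, not part of any e-discovery tool.

```python
from pathlib import Path


def total_bytes(folder):
    """Sum the sizes, in bytes, of every regular file under folder (recursively)."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def size_ratio(native_dir, tiff_plus_dir):
    """Return how many times larger the TIFF+ set is than the native set."""
    native = total_bytes(native_dir)
    if native == 0:
        raise ValueError("the native folder holds no data to compare against")
    return total_bytes(tiff_plus_dir) / native


# Example call (hypothetical folder names; substitute your own):
# print(f"TIFF+ is {size_ratio('native_docs', 'tiff_plus_set'):.1f} times larger")
```

The ratio it returns is the multiple by which a volume-based vendor bill would grow, all else being equal.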
I used both approaches, processing a 540KB Word document grabbed online in Nuix Workstation 8, and also pulling the file into Adobe Acrobat (Create PDF from File), then using Acrobat to Save As TIFF (with Save as TIFF Settings configured to CCITT G4, Grayscale LZW, Colorspace: Monochrome and Resolution: 600 pixels/inch, as in the figure at right). The Nuix-generated TIFF set was 4.83MB and the Adobe-generated set was 4.95MB. So, one TIFF set was 8.9 times larger and the other was 9.2 times larger than the native. You can find the same file I used at this link, but you don’t need to use the same file. You can use your own files or files from real cases. The point is to test it to your satisfaction using methods and evidence unassailable to you.
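If you’d like to check those two multiples, here is a short Python sketch using only the sizes reported above. It assumes decimal units (1MB = 1,000KB); the variable and function names are mine, for illustration only.

```python
KB_PER_MB = 1000  # assumption: decimal units, 1MB = 1,000KB

native_kb = 540          # the native Word document
nuix_tiff_mb = 4.83      # TIFF set generated by Nuix
acrobat_tiff_mb = 4.95   # TIFF set generated by Adobe Acrobat


def times_larger(tiff_mb, native_kb):
    """Return the TIFF set size as a multiple of the native file size."""
    return tiff_mb * KB_PER_MB / native_kb


nuix_ratio = round(times_larger(nuix_tiff_mb, native_kb), 1)
acrobat_ratio = round(times_larger(acrobat_tiff_mb, native_kb), 1)
print(nuix_ratio, acrobat_ratio)  # prints 8.9 9.2
```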
I use an exercise like the following in my teaching. Why don’t you try to solve it with your own tools?
Exercise: Calculate the Cost Difference Flowing from Alternate Forms of Production
There may be many variables that go into computing the cost of vendor services for e-discovery, and the charges for ingestion, processing, hosting and export are just parts of a more complicated puzzle. The purpose of this exercise is to gauge the difference that forms of production may make as a component of overall cost.
Problem: You are a requesting party in a federal case, and you have made a timely, compliant and unambiguous written request for production of responsive information in native and near-native forms. You have expressly requested that Microsoft Word documents be produced in their native .DOC or .DOCX formats. Your opponent instead produces Word documents to you as multiple .TIFF image files accompanied by a load file containing the extracted text from each document. When you object, your opponent counters that “this is what they always do” and that “TIFF plus load file is reasonably usable, so the Rules gave them the right to substitute TIFFs for natives.”
Assume that your opponent has produced 10,000 different Word documents which (for ease in making the calculation) all have exactly the same native and converted file sizes as the file found at this link (a 540KB Word version of the Zubulake V opinion). Thus, the aggregate native volume to be loaded and hosted is 5.4GB (gigabytes). Further, assume that none of the documents is privileged or requires redaction, and none is a hash-matching duplicate of any other item produced.
You’ve contracted with an e-discovery service provider to load and host the documents produced so you can review and tag the documents for use in the case. The service provider charges by the gigabyte to load and host the data month-to-month. The provider proposes to charge $20/gigabyte/month for loading and hosting the data.
Any fraction of a gigabyte will be rounded up to a full gigabyte when calculating charges. The extracted text file size for the single file is 126KB or 1.26GB for all 10,000 files.
You intend to approach the Court to compel your opponent to produce the documents in the form you designated, and in addition to raising issues of utility, completeness and integrity, you want to determine whether the form produced to you will prove more expensive to load and host for the one-year period you expect to have the data online.
Question: If you accept the production in TIFF and load file, approximately how much more will it cost you over twelve months versus the same production in native forms?
How to Solve this Problem:
Step 1: Normalize the file sizes. Because the prices are quoted in gigabytes, you will want to express all data volumes in gigabytes, rather than as kilobytes or megabytes.
Remember: A kilobyte is one thousand bytes. A megabyte is one thousand kilobytes and a gigabyte is one thousand megabytes.
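As a rough sketch of the normalization, the Python below converts the per-file sizes given in the problem to aggregate gigabytes, using the decimal units just described. The 4,830KB figure is the 4.83MB Nuix-generated TIFF set expressed in kilobytes; the helper and variable names are illustrative assumptions.

```python
KB_PER_GB = 1000 * 1000  # 1GB = 1,000MB = 1,000,000KB (decimal units)
FILE_COUNT = 10_000


def kb_to_gb(kilobytes):
    """Convert a size in kilobytes to gigabytes."""
    return kilobytes / KB_PER_GB


native_gb = kb_to_gb(540 * FILE_COUNT)   # 540KB native Word file
text_gb = kb_to_gb(126 * FILE_COUNT)     # 126KB extracted text per file
tiff_gb = kb_to_gb(4830 * FILE_COUNT)    # 4.83MB TIFF set per file

print(native_gb, text_gb, tiff_gb)  # prints 5.4 1.26 48.3
```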
Step 2: Calculate the cost of Native Production using normalized values:
Native Production: Ten thousand files, each 540KB in size, is 5,400,000KB or 5.4GB. As noted above, productions in any format are processed to create a searchable index of extracted text; so, we must also add the extracted text for all the files, which will be ten thousand times 126KB or 1.26GB. An index is typically more compact than the aggregate extracted text, but I’ll use the aggregate value to assure a conservative comparison. The cost to load and host for one year would be:
Load and host (5.4GB plus 1.26GB = 6.66GB, rounded up to 7GB, at $20.00/GB/month x 12 months) = $1,680.00
Step 3: Calculate the cost of TIFF and Text Load File Production using normalized values:
TIFF Plus Production: Ten thousand TIFF image sets, each 4.83MB in size, is 48.3GB. We again add the extracted text for all the files, 1.26GB. Any fraction of a gigabyte must be rounded up to the next whole gigabyte. Consequently, the value we use for the aggregate size to load and host TIFF+ is the sum of 48.3GB plus 1.26GB (49.56GB), rounded up to the next whole gigabyte, or 50 gigabytes.
Load and host (50GB at $20.00/GB/month x 12 months) = $12,000.00
The cost difference would be ($12,000.00 less $1,680.00) = $10,320.00.
Producing in TIFF costs the requesting party more than seven times as much ($12,000.00 versus $1,680.00) as the same data produced as native files.
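If you’d rather let a machine do the arithmetic, here is a minimal Python sketch of the whole calculation. It assumes the exercise’s terms ($20 per gigabyte per month, twelve months, fractions of a gigabyte rounded up) and the volumes worked out above; the function and variable names are hypothetical, and real vendor pricing will have more moving parts.

```python
import math

RATE_PER_GB_MONTH = 20  # dollars per gigabyte per month, per the exercise
MONTHS = 12             # expected hosting period


def hosting_cost(*volumes_gb):
    """Cost to load and host the given volumes, rounding the total up to a whole GB."""
    billable_gb = math.ceil(sum(volumes_gb))
    return billable_gb * RATE_PER_GB_MONTH * MONTHS


native_cost = hosting_cost(5.4, 1.26)   # 6.66GB bills as 7GB
tiff_cost = hosting_cost(48.3, 1.26)    # 49.56GB bills as 50GB
difference = tiff_cost - native_cost
multiple = tiff_cost / native_cost

print(native_cost, tiff_cost, difference, round(multiple, 1))
# prints 1680 12000 10320 7.1
```

Swap in your own volumes, rate and hosting period to see how the gap moves for your matter.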
This is an ultraconservative example. The exemplar file has no tracked changes or embedded comments and 10,000 documents is hardly a big production set. The differential is typically much larger (fifteen times more is commonplace) and will run to hundreds of thousands of dollars in large matters like MDL cases.
Cost is cause enough to demand production in native forms, but there’s more to it than cost. When an opponent produces in native formats, you’re getting what the other side used in the ordinary course of business. It’s the real evidence. It’s a form witnesses recognize. It’s complete and utile. Crucially, you can convert native forms to other forms–including static image formats–for those times you may want alternative formats.
But it doesn’t work both ways. You can’t convert TIFF images back to native originals. Not really. You can’t slim bloated static images down to svelte native forms. You can’t restore animations, color, formulas, tracked changes and comments, application metadata or hash values.
With TIFF productions, you’re stuck. You must pay vendors to ingest and host at grossly inflated data volumes. You have no choice. It’s like buying a car and the dealer delivers it encased in a block of concrete. You’re not going anywhere.
When you focus on facts, TIFF+ productions are less for more. Not a bit more, many multiples more. Real out-of-pocket dollars. So, if you want to fly in the face of FRCP Rule 1’s goal of just, speedy and inexpensive litigation and really do the other side dirty, here’s the way. And as for all you requesting parties letting the other side do this to you, isn’t it time to stop the joke being on you?