Regular readers may tire of my extolling the virtues of native forms of production; but battleships turn slowly, and this one must yet be turned. Apart from judges (whose frequent unfamiliarity with electronic evidence makes them easy prey for prevarication), those best situated to end the ruin of TIFF+ productions are those who profit most from doing nothing.
Articles, speeches and blog posts can only go so far. What’s needed are published judicial decisions. Whether they go one way or the other, we need thoughtful opinions that lay out the issues in an accurate and balanced way, informing litigants what’s at stake. Many published orders fail to weigh the genuine pros and cons of each form of production. A few read as if TIFF images were the evidence and requesting parties were seeking to have God-given TIFF images converted into heretical native files. Talk about confused!
Seeking another published opinion on the merits of native production, I recently supplied a declaration to a federal court. I’m attaching an anonymized version of my testimony in the hope that readers will weigh the arguments. I concede “it ain’t Shakespeare,” but it’s honest. I changed a lot to make it difficult to identify the matter, although the Declaration is a matter of public record. Sorry, but I thought a little less candor would be the wiser path. The lawsuit is still very much in contention.
Here’s the anonymized version in PDF:
Here’s the guts of it:
DECLARATION OF CRAIG BALL
CRAIG BALL, hereby declares under penalty of perjury as follows:
- I, Craig Ball, am over the age of eighteen years and competent and qualified to make this declaration. I have been retained as a consultant for Defendants in this action. I am fully familiar with the facts contained herein based upon my personal knowledge.
- I am an attorney in good standing, licensed in the State of Texas and hold multiple professional certifications in computer forensics, data recovery and electronic discovery. I’ve published extensively on these disciplines with my work cited as authoritative by courts throughout the nation. I serve as a consultant and a court-appointed special master in computer forensics and e-discovery, and have previously been designated as an expert witness in my fields on numerous occasions and appointed by courts to serve as a neutral master in electronically stored evidence in forty-plus cases. I’ve lectured on electronic evidence all over the world, producing more than two thousand published papers and presentations. I own and operate the Law Offices of Craig D. Ball, P.C. based in Austin, Texas. I also serve on the law faculties of the University of Texas in Austin, Texas and Tulane University in New Orleans, Louisiana, where I teach law school courses on electronic evidence and digital discovery. I am an academic, but my opinions are informed by almost forty years in my disciplines, thirty years as first-chair trial counsel and over twenty years of service as a neutral Special Master in Electronically Stored Evidence. I limit my practice to matters involving digital evidence and discovery.
- I supply a true and correct copy of my C.V. as Exhibit A to this Declaration. [OMITTED]
- I provide this Declaration in support of Defendants’ previously submitted proposed version of the Parties’ ESI Order and to rebut comments made in the Declaration of Jane Doe Submitted in Support of Plaintiff’s Proposed ESI Protocol, February 30, 2020, (“Doe Declaration”).
Times Have Changed, but Plaintiff’s Proposed ESI Protocol is Mired in the Past
- In pre-digital times, discovery was conducted using hard copy paper documents, and the cost and burden of marshaling paper was more-or-less the same for all sides. Deliberately making productions more difficult to use—by, e.g., removing and shuffling pages, separating attachments from transmittals or producing illegible copies—was understood to be dirty pool.
- I practiced trial law in both the paper and digital eras, and I’ve spent the last four decades researching, using, writing about and teaching electronic evidence and e-discovery.
- In a world of paper evidence, converting electronic information to static TIFF images made sense. In today’s world of electronically stored information, it makes no sense to convert electronic formats to TIFF images considering the information that’s lost and the severe burden and added expense so-called TIFF+ productions impose on requesting parties.
- TIFF+ productions like those advocated by Plaintiff are an antiquated, expensive legacy of a pre-digital era dominated by paper records. Black and white TIFF images were superior only to paper records; today, static image productions visit severe hardships on parties doing electronic discovery—hardships serving to make discovery more costly and less useful in speedily reaching the facts needed for just resolution.
- The profound question before the Court is whether evidence produced in discovery may be deliberately degraded to deny requesting parties substantially the same economies- and ease-of-use enjoyed by producing parties.
- I speak of “parties,” not law firms, because our system of civil discovery does not cognize systematic de-construction of evidence and the stripping of non-privileged content by lawyers seeking a leg up in litigation. In the interests of integrity and economy, forms of production should faithfully reflect the forms of the evidence as the parties and witnesses use and acknowledge them. This is particularly true when static forms of production tilt the playing field in unjust and wasteful ways.
- If we consider the evidence from the point of view of the parties and the issues at bar, these are the facts:
- In the ordinary course of business, Plaintiff uses electronically stored information in native forms.
- Plaintiff uses native forms because they are the most efficient, functional and complete forms.
- Native forms are most efficient because they store data compactly without sacrificing content.
- Native forms are most functional because they perform all tasks users require of electronic files.
- Native files are most complete because they hold all content, including color, structure, and metadata.
- I can attest that Plaintiff uses native forms because the form in which Plaintiff’s software routinely creates and stores ESI is, by definition, its native form. As a certified computer forensic examiner, I don’t need to examine Plaintiff’s systems to know that it stores and uses Microsoft Word documents in native forms. All Word users do. I needn’t speculate as to the native form of Plaintiff’s e-mail because there are only a handful of e-mail formats in common use and all of them adhere to a common format that TIFF images cannot replicate. If they did not hew to the native format, Plaintiff’s messages could not traverse the Internet.
Plaintiff’s Proposal Creates Asymmetry in Access to Evidence, and Unnecessary Cost
- Under Plaintiff’s proposal, Plaintiff will collect e-mail messages and Word documents from its employees. It will collect the e-mail in portable native formats from its e-mail systems and collect the Word documents in the .DOCX (and .DOC) formats native to Microsoft Word.
- Under Plaintiff’s proposal, when Plaintiff’s lawyers select what they deem responsive, their e-discovery service provider (vendor) will generate a set of black-and-white letter-size TIFF images of the native items. All color features will be eliminated. Unlike the native forms of the evidence, static TIFF images are not electronically searchable, so the vendor must create additional files to hold as much of the text from the native evidence as the vendor can extract or as much text as the vendor can glean from the image using optical character recognition (OCR) software. For e-mail and Word documents, extraction is more common than OCR.
- Because the conversion of native evidence to TIFF images under Plaintiff’s proposal strips away all the native file’s metadata, the vendor must attempt to salvage select metadata and stow it in files termed “load files.” The native evidence is now in three or more pieces: an unsearchable image or images, a file with collected or replicated text and a file with metadata. The latter two are the “plus” in a TIFF+ production.
- Still more files are needed to serve as assembly instructions pointing to the disparate shards of data used to fashion a crude, colorless facsimile of the original evidence. This reassembly requires an e-discovery “review platform.” The text files are then indexed by the review platform to recreate a level of searchability reminiscent of the inherent, easy searchability of the native evidence.
- Thus, Plaintiff seeks to produce a “kit” obliging the defendants to assemble a model of the real evidence, though that model will neither look like nor work like the original evidence and will be missing pieces.
Defendants’ Proposal Reduces Burden and Equalizes Access to Evidence
- At the outset, it must be understood that the form of the evidence the Plaintiff seeks to produce is not the form used by the parties or familiar to the witnesses. Plaintiff seeks to produce a degraded, incomplete, and bloated form that makes discovery prohibitively costly for the defendants and corrupts searchability.
- The defendants don’t want TIFF images of the evidence and have requested Plaintiff produce the evidence in its native forms. The defendants don’t ask that Plaintiff produce both native forms and TIFF images, as that would be duplicative and wasteful. Producing just the native form is sufficient and puts the parties on a level footing.
- Plaintiff’s proposal unduly complicates discovery by converting some evidence to TIFF images (e-mail, Word documents) while supplying other evidence in native forms (spreadsheets, PowerPoints). An advantage of supplying native forms is that native forms can be readily converted to other forms such that any party wanting TIFF images can generate TIFFs from native forms, though the converse isn’t true. TIFF images cannot be converted to native files. Native forms are not only less costly to load and host, but they’re also the most flexible and utile forms.
- If native is appropriate for PowerPoints and Excel Spreadsheets, why isn’t it good enough for everything? Plaintiff would likely respond that PowerPoints and spreadsheets don’t lend themselves to production as TIFFs. In fact, PowerPoint presentations, Excel spreadsheets and Word documents all derive from the same Microsoft Office “family” of file formats. All are Zip-compressed files formatted in Extensible Markup Language (XML). The ability to process native PPTX and XLSX documents in the Plaintiff’s discovery workflow establishes that there is no impediment to processing Word DOCX items in the same workflow, so as to be produced more economically in the same manner as its sibling Office formats. There are huge advantages to supplying DOCX Word documents in their native formats, particularly lower cost, proper juxtaposition of comments and tracked changes, native support of color and OLE (Object Linking and Embedding) content and inherent searchability.
- Again, parties don’t use TIFF images; only lawyers do, and none of the justifications for their use hold water in 2020. The parties use native forms in all their day-to-day business. The parties do not convert Word documents or email messages to TIFF images. They work with the native files for the simple reason that, when you try to convert the three-dimensional, multihued data of word-processed documents to two-dimensional, black-and-white forms, you sacrifice information, corrupt content and visit needless expense and burden on an opponent. Indeed, it is that last impact that often drives efforts to produce TIFF images instead of the authentic, original, native evidence.
- The optimum form in which to produce evidence in discovery is the form most faithful to the evidence as the parties encountered it; that is, the form that best preserves the integrity of the evidence vis-à-vis the events and transactions on which it bears. Certainly, there are times when we must settle for a color photo of a victim so as not to produce the corpse; but absent rare and compelling cause, we must strive to ensure the most complete and faithful forms are produced.
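The point above that Word, PowerPoint and Excel files share one container format can be checked by anyone with a scripting language. The following Python sketch uses only the standard library's zipfile module. The helper names and the tiny stand-in file it builds are illustrative assumptions, not part of any party's workflow; the stand-in is shaped like a DOCX but is not a valid Word document. A real DOCX, PPTX or XLSX file should open the same way and list XML parts such as `[Content_Types].xml`.

```python
import os
import tempfile
import zipfile


def list_office_parts(path):
    """Return the names of the parts stored inside a Zip-based Office file."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a Zip container")
    with zipfile.ZipFile(path) as container:
        return container.namelist()


def build_stand_in(path):
    """Write a tiny Zip file shaped like a DOCX (hypothetical, not a valid Word file)."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as container:
        container.writestr("[Content_Types].xml", "<Types/>")
        container.writestr("word/document.xml", "<document>Draft protocol</document>")


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "stand_in.docx")
    build_stand_in(sample)
    parts = list_office_parts(sample)

print(parts)
```

Pointing `list_office_parts` at an actual Office file should show the same kind of structure, which is why a workflow that handles one sibling format can be expected to handle the others.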
Plaintiff’s Claimed Advantages for TIFF Productions Are Illusory
- Plaintiff suggests that, if Defendants receive the evidence in the same form as Plaintiff uses it, Defense counsel will mishandle the evidence to deface its contents or will email persons counsel is ethically bound not to contact. The Plaintiff posits that TIFF images cannot be altered or misused by those determined to break the rules, as if an unethical attorney couldn’t edit a TIFF or pluck an address from a TIFF image and abuse it.
- When used in conjunction with all industry-standard review tools, the text in a native file is as static as that of any TIFF image. Review tools guard against alteration by emulating native views of the evidence without opening files in native applications. By contrast, if I load a TIFF image in a drawing program, I can change it, including altering its Bates numbering. Likewise, if I load a PowerPoint presentation into PowerPoint, I can change it. There is nothing static about data if one is intent upon altering it. Competent e-discovery workflows, purpose-built tools and our duties of professional responsibility augur against intentional and fraudulent alteration of evidence. TIFF images and native files are both composed of ones and zeroes. Either can be changed; but for both, there are effective means to detect and deter fraudulent and negligent handling.
- For native forms to be altered in e-discovery, they must first be exported from the Defendants’ e-discovery review tool and then opened in a native application. If the “risk” is that defense counsel will intentionally load evidence into programs to alter the evidence, then “static” TIFF images pose a greater risk of alteration than native forms because TIFFs are just black and white pictures and can be readily changed in any of the countless drawing and editing programs found on Windows and Mac computers, even using PowerPoint. ESI is only numbers, and numbers are easily changed.
- For an e-mail in native form to facilitate a “reply-all” event or trigger a read receipt or calendar item, the message must first be exported and then opened in a compatible e-mail client program. The best indication why this is not a material risk is the fact that it doesn’t routinely happen to Plaintiff’s counsel, who use the same small universe of discovery tools and have the same ability to export items into native applications. That a long-ago lawyer was once so foolish as to load evidence into Outlook for review and seek to reply to other people’s e-mail is not a cross every lawyer should bear. Once more, modern review tools employ native file viewers that allow native forms, including e-mail messages, to be viewed without need of native applications.
- In measuring the claimed risk, I point to the dearth of motions for sanctions on these grounds and the absence of a single reported decision where it was alleged that counsel improperly e-mailed anyone because they were supplied native forms in discovery. Fertile minds can imagine a parade of horribles and there are risks of perfidy or error in anything; but the risks put forward here are more speculative than real. If the touted risks materialize, the Court has ample authority to deter and punish misconduct. Until then, the benefits and cost savings of native production are real and substantial.
Native Productions Support Bates Numbering with Nothing Lost and Much Gained
- Now that nearly all written evidence items are blocks of data—files on disks and records in databases—the printed page is not an efficient or economical way to unitize electronically stored information (ESI). As well, enumeration of ESI by page numbers based upon conversion to a static image format is like measuring and delivering water as ice cubes or steam. You can do it, but you really shouldn’t. Unitization should be based on the native form (e.g., gallons), not the occasional altered form (cubes or cubic feet) until and unless the change of form is necessitated by the usage. The same logic holds true for ESI.
- For items produced in discovery, the unitization that makes most sense is the native unitization, files. Word processed documents, presentations, spreadsheets, email, photos, videos and sound recordings all manifest as files in the ordinary course. We store them as files, collect them as files, process and enumerate them as files and hash them as files for deduplication and authentication. It only stands to reason that we should produce and Bates number electronic evidence as files.
- Parties “emboss” Bates numbers on files in the same way that parties identify files in the ordinary course. That is, they name each file produced or withheld to reflect its Bates number. It’s a flexible method that comports with the longstanding practice of naming images of printed pages to mirror the Bates numbers embossed on those pages. Bates numbers can be prepended to file names, appended to them or simply replace the filename (as the original filename is always produced in an accompanying load file). Nothing is lost and, because filenames aren’t stored inside files, changing a file’s name in this way doesn’t alter the file’s content or hash value. Native production doesn’t end the use of Bates numbers; it just adapts the numbering to the appropriate unitization.
- Parties use these Bates numbered files in the same manner as parties use any ESI in electronic discovery; that is, they employ a purpose-built e-discovery tool to view the contents of the file and the application displays the Bates number, rather than using the native Word, Outlook or PowerPoint program to view it. TIFF+ productions don’t obviate the use of review platforms; they just don’t enable the tools to be as efficient, economical or flexible.
- The key to understanding why paged Bates numbering is simple and cheap for native productions is distinguishing how parties review ESI versus how parties present it as exhibits.
- It’s unquestionably convenient to print ESI used as exhibits to paginated formats on those occasions when a clear record is facilitated by doing so. I’ve taken hundreds of depositions, argued countless motions and tried many cases. Depositions, trials and hearings haven’t changed much over my 38 years at the Bar; so, I’m no stranger to the value of embossed Bates numbers when data is printed for presentation to a witness or tribunal.
- The question isn’t whether there’s a need and place for Bates numbered static forms (i.e., paper and electronic printouts), but when should conversion occur, applied to which parts of a production, and importantly, who gets to decide and at what cost (measured in money, utility and completeness)?
- Native production splits the process of Bates numbering. The producing party retains the right to assign the Bates number to the file produced. The right to add page numbers belongs to the party who prints the electronic evidence for use in a proceeding. The Bates number assigned by the producing party must be embossed on every page of the printout along with the page numbers and confidentiality labels. That way, the producing party can always relate a printed item to its source file. In turn, all parties can reference the printout by Bates number and page number in the conventional way lawyers cite to exhibits in proceedings.
- The objection raised to this is, “Won’t that mean that different printouts could have different pagination? Won’t that be confusing?” It’s possible that slight variations in page breaks could occur if the same file is printed on different systems and printers. In theory, that could prompt confusion; but in practice, it’s not a problem. The record is perfectly clear with respect to any version used by a witness or presented to the Court. You can concoct a situation where it’s pesky, but the reality is that it works quite well.
- The reason litigants never faced this presumptive confusion before e-discovery was that, if you used a document I’d produced to you in discovery, that document bore the Bates number I’d stamped on it. You were forced to use the pagination I’d assigned. You couldn’t print a version with different pagination because I hadn’t produced the electronic evidence to you; I’d produced a printout. That was convenient and acceptable back when the evidence and a printout were useful and complete in the same ways. However, ESI and printouts are not the same anymore. They aren’t useful in the same ways. They aren’t complete in the same ways. They don’t cost the same to use. Notwithstanding these differences, the Plaintiff still claims the exclusive authority to assign pagination at the time of production. That is, they demand the power to impose the wrong form of unitization at the wrong point in the discovery process. The Defendants’ proposed protocol makes more sense.
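The earlier statement that naming a file for its Bates number changes neither its content nor its hash value is easy to test. The sketch below, in Python with only standard-library modules, renames a throwaway file and hashes it before and after. The helper names, the sample filename and the Bates number `ABC0000123` are hypothetical, and SHA-256 is chosen only for illustration, since the Declaration does not name a hash algorithm.

```python
import hashlib
import os
import tempfile


def file_hash(path):
    """Hash a file's content; the filename is not part of the input."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def bates_name(original_name, bates_number, mode="replace"):
    """Build a production filename by prepending, appending or replacing."""
    stem, extension = os.path.splitext(original_name)
    if mode == "prepend":
        return f"{bates_number}_{stem}{extension}"
    if mode == "append":
        return f"{stem}_{bates_number}{extension}"
    return f"{bates_number}{extension}"


with tempfile.TemporaryDirectory() as folder:
    # Stand-in bytes; a real native file would behave the same way.
    source = os.path.join(folder, "draft protocol.docx")
    with open(source, "wb") as handle:
        handle.write(b"stand-in content for a native file")
    hash_before = file_hash(source)

    # Rename the file to carry its (hypothetical) Bates number.
    produced = os.path.join(folder, bates_name("draft protocol.docx", "ABC0000123"))
    os.rename(source, produced)
    hash_after = file_hash(produced)

print(os.path.basename(produced), hash_before == hash_after)
```

The hash is computed over the bytes inside the file, so the two values match regardless of which of the three naming modes is used.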
Native Production Will Be Markedly Less Costly, Month-After-Month
- Just as a review platform is needed to search and review a TIFF+ production, a review platform is used to search and review a native production. Reliance on an e-discovery review platform is a necessity of 21st century trial practice. In the last decade, e-discovery review platforms have ceased to run on desktop and laptop computers but, like so much of business computing, have migrated to the Cloud and run in “hosted” online environments leased from “hosted service providers.” In hosted environments, the subscriber pays a monthly fee tied to the gigabyte volume of information hosted, that is, stored and readily available to reviewers via a web browser and the Internet. Users may also pay by the gigabyte to add data (“ingest”) into these repositories and export data out. Provider fee schedules vary, making direct comparison difficult, but nearly all employ volumetric pricing, viz., the larger the volume of data hosted, the greater the monthly subscription fee.
- Larger files cost more. Much larger files cost much more. TIFF images are much larger files.
- Larger files also impair workflow. A larger file takes longer to transmit, prompts longer load times, longer times to process and, ultimately, more time just to move from one page to the next.
- When I say, “TIFF images cost more,” I do not mean incrementally more; I mean many multiples more. Where a native production costs $1X to host, the same production as TIFF+ will cost $3X, $5X, $10X or more to host, month-after-month, for less complete and utile forms. I cannot point to a fixed multiple because the difference hinges on the nature of the file, its compressibility, complement of rich media content, use of color and image resolution. Some Group 4 TIFF image productions sacrifice legibility for size and produce a smaller multiple.
- I can attest, based on years of study, testing and experience, that a native format production set composed of the file types most often seen in e-discovery will be substantially smaller in size than a comparable TIFF production set. The difference is often startling.
- To demonstrate the difference, I periodically process identical data sets in industry-standard e-discovery tools, generating a native production set and a TIFF production set, each adhering to customary specifications for, inter alia, image resolution, load file structure and Bates numbering.
- For purposes of illustration in this declaration, I processed two simple sets of data as examples. Because I have no access to Plaintiff’s responsive files, I processed a single e-mail from Plaintiff’s counsel to defense counsel transmitting a native Word attachment. I also processed a set of publicly available PDF files composed of 24 Plaintiff publications and five sets of procedural rules from the U.S. Courts website. I chose these to be readily accessible to anyone wishing to do their own assessment, but the composition of the collection makes little difference. The upshot is that the conversion of native files to TIFF images hugely inflates the size of the files. See Exhibit B.
Example 1: A single email and attachment from Darrow Clarence, Esq. transmitting a draft protocol in Word format.
Native size: One file comprising 90.3 kilobytes
TIFF Images: 17 TIFF files comprising 3.77 megabytes
The TIFF images are 41.74 times larger than the native source, yet they lack color and none of the tracked changes seen in the Word file are visible.
Example 2: 29 PDF files from Plaintiff and U.S. Courts
Native size: 29 files comprising 23.9 megabytes
TIFF Images: 869 TIFF files comprising 301 megabytes.
The TIFF images are 12.59 times larger than the native source, yet they lack all the color used extensively in the Plaintiff documents, the hyperlinks were stripped, and the extensive internal navigation features are non-functional.
- Whether the data are PDF files, Word documents or e-mail messages, converting the functional and complete native evidence to colorless TIFF images massively increases the size of the production even as it degrades utility and intelligibility. The difference directly translates to a much greater cost to Defendants to host the data, month-after-month. Not slightly higher but, as seen above, many multiples higher. The PDF examples would cost roughly 13 times more and the e-mail and attachment roughly 42 times more. The TIFF image production Plaintiff seeks will cost Defendants much more even while the bloated files they want to substitute operate to slow review and conceal content.
- It’s easy to attack these examples because they don’t use a larger swath of Plaintiff’s own data or by asserting that Plaintiff’s vendor has a better TIFF conversion tool or trims file sizes by lowering legibility. The multiples will move higher and lower as the data changes; but one fact won’t change: Swapping static TIFF images for native forms visits a punitive economic penalty on Defendants.
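The multiples reported in the two examples follow from simple division, and under purely volumetric pricing the hosting cost scales by the same multiple whatever the provider charges. A short Python sketch of that arithmetic follows. It uses the rounded sizes reported in the examples and assumes decimal units (1 megabyte = 1,000 kilobytes); the per-gigabyte rate is a made-up placeholder, not a quoted price. Because the inputs are rounded, the results can differ from the stated multiples in the last digit.

```python
def size_multiple(native_size, tiff_size):
    """How many times larger the TIFF set is than the native set (same units)."""
    return tiff_size / native_size


def monthly_hosting_cost(size_gb, rate_per_gb):
    """Volumetric pricing: the fee grows in proportion to the volume hosted."""
    return size_gb * rate_per_gb


# Example 1: 90.3 KB native versus 3.77 MB of TIFF images (converted to KB).
email_multiple = size_multiple(90.3, 3.77 * 1000)

# Example 2: 23.9 MB native versus 301 MB of TIFF images.
pdf_multiple = size_multiple(23.9, 301)

# Hypothetical rate; the cost multiple is the same for any nonzero rate.
ASSUMED_RATE_PER_GB = 10.0
native_cost = monthly_hosting_cost(23.9 / 1000, ASSUMED_RATE_PER_GB)
tiff_cost = monthly_hosting_cost(301 / 1000, ASSUMED_RATE_PER_GB)

print(f"E-mail example: about {email_multiple:.1f}x")
print(f"PDF example: about {pdf_multiple:.1f}x")
print(f"Hosting cost multiple for the PDF set: about {tiff_cost / native_cost:.1f}x")
```

Changing `ASSUMED_RATE_PER_GB` leaves the last figure unchanged, which is the sense in which the size penalty becomes a cost penalty every month the data is hosted.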
Threading E-Mail for Review is Compatible with Producing E-Mail in Native Forms
- Threading e-mail for review makes reviewers’ lives easier. Instead of requiring reviewers to look at each e-mail in a chain, threading presents the constituent messages as a conversation, either by using the most inclusive message at the “top” of the chain (suppressing duplicate constituents) or by synthesizing a chain from constituent messages and suppressing intermediate attachments and message header data. This convenience comes at a cost when responsive, non-privileged suppressed evidence isn’t produced to Defendants. Other witnesses and counsel have done an effective job outlining the hardships to Defendants attendant to Plaintiff’s threading proposal, so I only wish to note that the review tools that suppress the constituent messages also track those constituent messages, making it feasible to gain the convenience of threading without neglecting the duty to make production.
- Threading doesn’t suppress constituent messages, attachments and headers as a matter of legal right. They aren’t suppressed because they are privileged or irrelevant. They are suppressed because it’s expedient. It saves money. Plaintiff claims nothing suppressed is material, and all is merely redundant. If that’s so, then allowing the system tracking constituent messages to include the responsive constituent messages in the production is a trivial burden on the Plaintiff and a crucial safeguard for the Defendants. The cost savings are realized in the human review; the production of the constituent messages is a mechanical task that costs nearly nothing and poses no risk to the Plaintiff if the Plaintiff is truly suppressing only redundant content as claimed.
- In the bygone days of paper production, parties fought over color production because color copies were ten times more expensive than grayscale copies and were reserved to, e.g., reproduction of color photographs. Because color printing was exceptional and expensive, colorized business documents were scarce. The digital revolution put tools for color emphasis and imagery on every desktop. Spreadsheets used color to highlight cells and signal significance. Modern productivity files employ color photographs, color keys for maps and graphs, color-coding and color heat maps as just a few examples. Color is used across a wide range of electronic evidence to denote hyperlinks that would otherwise be indistinguishable from bold or underline text. How are Defendants to determine whether a reference is merely plain bolded text, or if it was colored to reflect a hyperlink?
- When used to convey or emphasize information, color is essential to comprehension of the evidence. Color is an integral feature of native file formats and requires no special handling or expense to produce. The color is built into the file. From the standpoint of clarity and completeness, black-and-white images are incapable of conveying all the information in a color document. Too, producing color-rich documents as static color images is costly because static color images are substantially larger in file size than static monochrome images, so they are many, many times more expensive to ingest and host. The better way—the ideal way—to supply essential color isn’t to add color to TIFF images, further bloating their size and increasing cost, but to make production in the native formats that convey the color information at no cost to any parties.
- In practice, not all use of color translates to a discernable, shades-of-gray difference in appearance. Accordingly, the Plaintiff’s proposal to produce only black and white images doesn’t work. It’s difficult or impossible to perceive instances of color in grayscale documents to form that requisite “belief” that color is present; a belief required to request the item be reproduced in color. It’s a process certain to prompt delay and workflow disruption. The superior approach is native production, where color is cost-free and requires no time-consuming battles over whether color is substantive or decorative.
Cost of Hash Authentication
- Plaintiff’s declarant, Jane Doe, argues that, “if the authenticity of a Native file is suspect, it would need to be validated by generating a hash value and comparing it to the hash value of the originally produced file to verify if a document has been changed.” Ms. Doe raises this as a burden because her employer, Service Provider, Inc, would impose additional fees to assist with the process.
- My experience is that when the authenticity of evidence from a producing party is suspect on the theory that the requesting party has altered it, the cost of authentication is not a paramount concern. Reputations and law licenses hang in that balance. Ms. Doe fails to consider that the cost to verify a suspect TIFF file would be considerably higher if hashing weren’t used. There, human beings billing princely sums would need to assess the integrity by eye. Ms. Doe overstates not only the likelihood of needing hash authentication, but also its cost, and this argument has no weight against Defendants’ proposal of a native production.
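To give a sense of the mechanics of the comparison Ms. Doe describes, here is a minimal Python sketch using the standard hashlib module. It assumes the production was accompanied by a record of hash values keyed by Bates number; that record, the Bates number and the choice of SHA-256 are illustrative assumptions, since neither declaration specifies them.

```python
import hashlib


def sha256_of(data):
    """Return the SHA-256 digest of a file's bytes as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(bates_number, data, production_hashes):
    """True when the bytes hash to the value recorded at production."""
    return sha256_of(data) == production_hashes[bates_number]


# Hypothetical stand-in for the bytes of a produced native file.
produced_bytes = b"Draft ESI protocol, version 3"
production_hashes = {"ABC0000123": sha256_of(produced_bytes)}

# An untouched copy matches; a copy with a single added byte does not.
same_copy = is_unaltered("ABC0000123", produced_bytes, production_hashes)
edited_copy = is_unaltered("ABC0000123", produced_bytes + b"!", production_hashes)
print(same_copy, edited_copy)  # prints: True False
```

The comparison is one computation per file, so it lends itself to automation rather than to review by eye.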
APPENDIX B: COMPARING NATIVE AND TIFF SIZES