A writer’s hubris is the conviction that when you’ve covered a topic, you’ve had your say. But new readers rarely have time or desire to plumb earlier work and, were they to try, much of what I wrote on the underpinnings of e-discovery and forensics was long ago stolen away like Persephone to a paywall-protected underworld, leaving this Demeter to mourn. So, I briefly return to a point that has never gained traction in the minds of the bar, viz. why producing in native file formats doesn’t require we give up cherished Bates numbering. Doug Austin, the Zeus of e-discovery bloggers, recently re-addressed the same topic in his estimable E-Discovery Daily. Call me a copycat, but I was here first.
As many times as I’ve written and spoken on the Native DeBates, I’ve never felt I nailed the topic. I’ve not succeeded in conveying the logic, ease and advantage of a bifurcated approach to Bates numbering and pagination. So, one more shot.
Start by imagining a world where, instead of just numbering pages, runaway enumeration demanded everyone number lines of text in each item produced in discovery. That’s not far-fetched considering that pleadings in California and deposition transcripts everywhere have long numbered lines. If I demanded that of you in discovery, wouldn’t you sensibly respond that it’s overkill and lawyers have managed just fine by numbering by page breaks instead?
Now that you’re thinking about the balance between enumeration and overkill, let’s set aside tradition and come at Bates numbering by design. Mark a fancy word: unitization. Everything is unitized: time in days and hours, buildings in square feet or meters, television in seasons and episodes, books in chapters and pages. Humans love to unitize stuff, and our units ofttimes grow from quaint and antiquated origins that we cling to because, well, uh, um, dammit, we’ve just always done it that way!
Recently, I had a tough time getting rid of perfectly nice file cabinets because they were sized to hold files fourteen inches wide. When I became a lawyer, every pleading had to be filed on fourteen-inch-long “legal size” paper, not the familiar eleven-inch letter paper. Later, courts abolished legal size pleadings and…poof…that venerable unit was history. Now, even the notion of filing paper with courts is a relic. Things changed because it was cheaper and more efficient to change. Standards do change and units do change, even in the staunchly stodgy corridors of Law.
Why did the letter-sized page become a worldwide standard unit for documents? It didn’t. In most places outside of the U.S. and Canada, standard paper sizes are based on surface area not dimensions. The “standard” European page size is called “A4” and measures 8.3” × 11.7” (not that most of the world measures in inches, but let’s not go there). Anyone old enough to remember a time when word-processed documents were printed on mechanical printers will remember that a document’s layout and pagination varied from printer-to-printer. There’s nothing magical or iconic about the letter size page as a unit of printed information; and less so as fewer information items flow from or onto paper manifestations. Spreadsheets aren’t paginated. Neither are e-mails, websites, PowerPoints, message threads, voicemail or video.
Once, nearly all written evidence was stored as paper documents. Now, nearly all written evidence items are blocks of data: files on disks and records in databases. The printed page is not an efficient or economical way to unitize electronically stored information (ESI). As well, enumeration of ESI by page numbers based upon conversion to a static image format is like measuring and delivering water as ice cubes or steam. You can do it, but you really shouldn’t.
I want ice in my drink and steam in my iron, but my principal consumption of water will be as liquid, its “native” form at room temperature. Accordingly, unitization should be based on the native form (e.g., gallons), not the occasional altered form (cubes or cubic feet) until and unless the change of form is necessitated by the usage.
The same logic holds true for ESI.
For items produced in discovery, the unitization that makes most sense is the native unitization, files. Word processed documents, presentations, spreadsheets, photos, videos and sound recordings all manifest as files in the ordinary course. We store them as files, collect them as files, process and enumerate them as files and hash them as files for deduplication and authentication. It only stands to reason that we produce and Bates number items as files.
We “affix” Bates numbers to files in the same way that we identify files in the ordinary course. That is, we name each file produced or withheld to reflect its Bates number. It’s a flexible method that comports with the longstanding practice of naming images of printed pages to mirror the Bates numbers embossed on those pages. Bates numbers can be prepended to file names, appended to them or simply replace the filename (as the original filename is always produced in an accompanying load file). Nothing is lost and, because filenames aren’t stored inside files, changing a file’s name in this way doesn’t alter the file’s content or hash value.
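For the skeptical, the point about hash values is easy to check. What follows is a minimal sketch, not any vendor’s actual workflow: the `ABC` prefix, the seven-digit width, the sample filename and the helper names `sha256_of`, `bates_rename` and `demo` are all assumptions made for illustration. It hashes a file, renames it to a Bates number, hashes it again, and keeps the original name as it might be recorded in a load file.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file's content only; the filename is not part of the input."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def bates_rename(path: Path, prefix: str, number: int, width: int = 7) -> Path:
    """Rename a file to its (hypothetical) Bates number, keeping the extension."""
    bates = f"{prefix}{number:0{width}d}"
    return path.rename(path.with_name(bates + path.suffix))


def demo() -> dict:
    """Rename a stand-in file and report its name and hash before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "Q3 forecast.xlsx"
        original.write_bytes(b"stand-in for spreadsheet content")
        hash_before = sha256_of(original)

        produced = bates_rename(original, prefix="ABC", number=42)
        hash_after = sha256_of(produced)

        # The original filename travels in the load file, not in the new name.
        return {
            "original_name": "Q3 forecast.xlsx",
            "produced_name": produced.name,
            "hash_before": hash_before,
            "hash_after": hash_after,
        }


if __name__ == "__main__":
    row = demo()
    print(row["produced_name"], row["hash_before"] == row["hash_after"])
```

The two hashes match because the hash is computed over the bytes inside the file, and the rename touches only the directory entry that points to those bytes.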
Native production won’t end the use of Bates numbers; it just adapts the numbering to the appropriate unitization.
Oddly, naming files to reflect Bates numbers is tough for some to grasp. Perhaps they imagine it’s done manually, though of course it occurs simply and automatically, adding no cost to the process. Most lawyers wonder how they will use Bates numbered files. The answer is you use Bates numbered files in the same manner as you use any ESI in electronic discovery; that is, you employ an application to view the contents of the file and the application displays the Bates number. Not the native Word or PowerPoint program, but one of the many tools purpose-built to allow lawyers to review and search ESI.
This isn’t manifestly clear to lawyers who have trouble distinguishing between how they will review ESI versus how they will present it as exhibits. That’s a costly confusion.
It’s unquestionably convenient to print ESI used as exhibits to paginated formats on those occasions when a clear record is facilitated by doing so. I’ve taken hundreds of depositions, argued tons of motions and tried loads of cases. Depositions, trials and hearings haven’t changed much over my 37 years at the Bar; so, I’m no stranger to the value of embossed Bates numbers when data is printed for presentation to a witness or tribunal.
The question isn’t whether there’s a need and place for Bates numbered static forms (i.e., paper and electronic printouts), but when should conversion occur, applied to which parts of a production, and importantly, who gets to decide and at what cost (measured in money, utility and completeness)?
Native production splits the process of Bates numbering. The producing party retains the right to assign the Bates number to the file produced. The right to add page numbers belongs to the party who prints the electronic evidence for use in a proceeding. The Bates number assigned by the producing party must be embossed on every page of the printout along with the page numbers. That way, the producing party can always relate a printed item to its source file. In turn, all parties can reference the printout by Bates number and page number in the conventional way lawyers cite to exhibits in proceedings. Yes, Virginia, there really is a Bates number and pagination method for native files.
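The split can be pictured in a few lines of code. This is only a sketch under stated assumptions: the label format (Bates number, a hyphen, then a four-digit page number) and the function names `page_labels` and `source_file_of` are hypothetical, and parties are free to agree on any convention that keeps the producing party’s Bates number on every page.

```python
def page_labels(bates_number: str, page_count: int) -> list[str]:
    """Build one label per printed page: the producing party's Bates number
    plus the page number added by whoever printed the file."""
    if page_count < 1:
        raise ValueError("a printout must have at least one page")
    return [f"{bates_number}-{page:04d}" for page in range(1, page_count + 1)]


def source_file_of(label: str) -> str:
    """Recover the Bates number (and so the source file) from a page label."""
    bates_number, _, _page = label.rpartition("-")
    return bates_number


if __name__ == "__main__":
    labels = page_labels("ABC0000042", 3)
    print(labels)
    print(source_file_of(labels[-1]))
```

Because every label begins with the Bates number assigned at production, any printed page can be traced back to its source file regardless of who printed it or where the page breaks fell.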
You may ask, “Won’t that mean that different printouts could have different pagination? Won’t that be confusing?” It’s possible that slight variations in page breaks could occur if the same file is printed on different systems and printers. In theory, that could prompt confusion; but in practice, it’s not a problem. The record is perfectly clear with respect to any version used by a witness or presented to the Court. You can concoct a situation where it’s chaos, but the reality is that it works quite well.
We never faced this presumptive confusion before e-discovery because, if you used a document I’d produced to you in discovery, that document bore the Bates number I’d stamped on it. You were forced to use the pagination I’d assigned. You couldn’t print a version with different pagination because I hadn’t produced the electronic evidence to you; I’d produced a printout. That was convenient and acceptable back when the evidence and a printout were useful and complete in the same ways. However, ESI and printouts are not the same anymore. They aren’t useful in the same ways. They aren’t complete in the same ways. They don’t cost the same to use. Notwithstanding these differences, producing parties still claim the exclusive authority to assign pagination at the time of production. That is, they demand the power to impose the wrong form of unitization at the wrong point in the discovery process.
Let’s focus on cost. When I seek production of ESI in its native electronic forms, it’s because that’s the form in which the evidence is used in the ordinary course of business and the most complete, utile and economical form. It’s the form the witnesses used. It’s the authentic evidence.
Producing parties resist native production for reasons I’ve addressed and refuted in other posts and publications. They once fought native production of spreadsheets and presentations; but that was always a lost cause. Yet, we still skirmish over e-mail and word-processed documents. Producing parties assert that electronic printouts (so-called “TIFF Plus” productions) are “reasonably usable” alternatives to native forms. My experience is that they can be usable, but frequently are an inadequate substitute for the complete, utile native forms.
The debate over forms of production might be written off as so much navel gazing if there weren’t a massive economic penalty imposed on requesting parties forced to accept TIFF Plus productions. A TIFF Plus production is many times larger byte-wise than the same production made natively. For most, the cost of loading and hosting electronically stored information is determined by the amount of data loaded, processed and hosted. Ten times more data byte-wise costs ten times more to ingest and ten times more to access online, month after month. Ten times more is at the low end of the differential.
This is hard for lawyers to accept. Requesting parties seem oblivious to the huge TIFF Plus penalty they bear. When I teach e-discovery at the University of Texas School of Law or Georgetown Law Center, I task my students to independently explore the cost difference. They generate a native production set and a TIFF Plus set from the same collection, then apply market rate ingestion and hosting prices to each. The difference? Native production and hosting cost about $30,000.00 less for 150MB of data than the same data produced as TIFF images.
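The arithmetic behind that kind of comparison is simple enough to sketch. Every number below is a placeholder chosen for illustration, not a market rate and not my students’ data: the per-gigabyte ingestion and hosting rates, the twelve-month hosting period, the 10 GB native volume and the function name `load_and_host_cost` are all assumptions. The only figure taken from the discussion above is the tenfold size multiplier.

```python
def load_and_host_cost(
    size_gb: float,
    months: int,
    ingest_rate_per_gb: float,
    hosting_rate_per_gb_month: float,
) -> float:
    """One-time ingestion charge plus a monthly hosting charge, both by volume."""
    ingestion = size_gb * ingest_rate_per_gb
    hosting = size_gb * hosting_rate_per_gb_month * months
    return ingestion + hosting


if __name__ == "__main__":
    # Hypothetical inputs; substitute your own vendor's pricing.
    native_gb = 10.0
    tiff_plus_gb = native_gb * 10  # tenfold growth, the low end cited above
    rates = dict(months=12, ingest_rate_per_gb=25.0, hosting_rate_per_gb_month=10.0)

    native_cost = load_and_host_cost(native_gb, **rates)
    tiff_cost = load_and_host_cost(tiff_plus_gb, **rates)
    print(f"native: ${native_cost:,.2f}  TIFF Plus: ${tiff_cost:,.2f}")
    print(f"penalty: ${tiff_cost - native_cost:,.2f}")
```

When both charges are priced by volume, the cost scales in lockstep with the size of the production, so whatever multiple the TIFF Plus set is of the native set becomes the multiple on the bill.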
So, what do you do? Let me preface by telling you what producing parties don’t have to do. They don’t need to buy software, change workflows or study computer science to make this work. All the leading e-discovery software tools support the ability to name files to mirror Bates numbers. Their e-discovery service providers can do it with ease. Of course, service providers might not be thrilled by their reduced billings for ingestion and hosting, but those savings directly benefit your clients and, unlike the savings from, say, predictive coding, they don’t come out of lawyers’ pockets.
There are many easy ways to add Bates numbers and page numbers to files when you print them out. But here’s the most important takeaway: Lawyers customarily use just a fraction of items produced in discovery. All files are Bates numbered when produced; only that tiny fraction printed for use as exhibits must be paginated for making a record.
The bottom line: You needn’t give up Bates numbers to reap the savings and utility of native productions. It’s wrong to suggest, “You can’t Bates number native productions;” you absolutely can and, importantly, you don’t have to depart from the familiar ways you use evidence as exhibits.
The graphic below incorporates the Electronic Discovery Reference Model and overlays pointers to the EDRM stages where Bates numbers and page numbers should be added. Producing parties add Bates numbers to filenames during processing; but by deferring conversion of ESI to paginated static images (e.g., printouts) until needed for presentation, only a small complement of production must be converted and degraded. Keeping the rest of the production in native forms ensures that the evidence retains its completeness, utility and economy (“just, speedy and inexpensive” being the goal). Application metadata and other content aren’t stripped away, and the size of the production won’t mushroom ten- or fifteen-fold, dramatically increasing the cost to load and host the production.