Who Says You Can’t Bates Number Native Productions?

A writer’s hubris is the conviction that when you’ve covered a topic, you’ve had your say.  But new readers rarely have time or desire to plumb earlier work and, were they to try, much of what I wrote on the underpinnings of e-discovery and forensics was long ago stolen away like Persephone to a paywall-protected underworld, leaving this Demeter to mourn.  So, I briefly return to a point that has never gained traction in the minds of the bar, viz. why producing in native file formats doesn’t require we give up cherished Bates numbering.  Doug Austin, the Zeus of e-discovery bloggers, recently re-addressed the same topic in his estimable E-Discovery Daily.  Call me a copycat, but I was here first.

As many times as I’ve written and spoken on the Native DeBates, I’ve never felt I nailed the topic.  I’ve not succeeded in conveying the logic, ease and advantage of a bifurcated approach to Bates numbering and pagination.  So, one more shot.

Start by imagining a world where, instead of just numbering pages, runaway enumeration demanded everyone number lines of text in each item produced in discovery.  That’s not far-fetched considering that pleadings in California and deposition transcripts everywhere have long numbered lines.  If I demanded that of you in discovery, wouldn’t you sensibly respond that it’s overkill and lawyers have managed just fine by numbering by page breaks instead?

Now that you’re thinking about the balance between enumeration and overkill, let’s set aside tradition and come at Bates numbering by design.  Mark a fancy word: unitization.  Everything is unitized: time in days and hours, buildings in square feet or meters, television in seasons and episodes, books in chapters and pages.  Humans love to unitize stuff, and our units ofttimes grow from quaint and antiquated origins that we cling to because, well, uh, um, dammit, we’ve just always done it that way!

Recently, I had a tough time getting rid of perfectly nice file cabinets because they were sized to hold files fourteen inches wide.  When I became a lawyer, every pleading had to be filed on fourteen-inch-long “legal size” paper, not the familiar eleven-inch letter paper.  Later, courts abolished legal size pleadings and…poof…that venerable unit was history. Now, even the notion of filing paper with courts is a relic.  Things changed because it was cheaper and more efficient to change.  Standards do change and units do change, even in the staunchly stodgy corridors of Law.

Have We Lost the War on E-Discovery?

Is there a war on e-discovery?  Sounds like a paranoid notion, but the evidence is everywhere.  The purpose of discovery is to exchange information bearing on matters in litigation, particularly material tending to prove or disprove the parties’ claims and defenses.  The soul of discovery is disclosure of relevant records and communication, limited by privilege and proportionality. So, you’d think the focus of e-discovery would be on where information resides and the forms it takes, on how to preserve it, collect it and produce it.  That was what we talked about a decade ago, but, no more.

Now, when I look at the composition of e-discovery education, I’m flummoxed by how the tide has turned to anti-discovery topics.  Instructing lawyers how to surface information has been steadily supplanted by how to keep information at bay and defend failures to disclose. There is no balance between supporting the right to obtain information and the right to withhold it.

Proportionality is about limiting the scope of discovery.  Privacy and GDPR seek to limit access to information.  Cost control is code for circumscribed discovery.  Even cybersecurity tends to be positioned to confound discovery.  I see discussions of “streamlining” privilege logs that advocate giving as little information as possible about items withheld on claims of privilege.  Considering the regularity with which privilege claims are abused, shouldn’t we require greater specificity be brought to logging so that privilege stops being the black hole in which we hide everything we don’t want to hand over?  Privilege is anathema to evidence and must be narrowly construed.  No one talks about that.

Don’t get me wrong.  These are important topics.  Discovery needs to be just, speedy and inexpensive.  But why do we keep forgetting that there’s a comma in there?  Will we ever balance our self-interest in advancing our client’s wishes against our common interest in a justice system that serves everyone?

Electronic Storage in a Nutshell

I’ve just completed the E-Discovery Workbook for the 2019 Georgetown E-Discovery Training Academy. The Workbook readings and exercises plot the path that evidence follows from the documents lawyers use in court back to the featureless stream of binary electrical impulses common to all information stored electronically. The Workbook runs nearly 500 pages, with the technology of e-discovery as its centerpiece, and I’ve lately added a 21-point synopsis of the storage concepts, technical takeaways and vocabulary covered. Here is that in-a-nutshell synopsis:

  1. Common law imposes a duty to preserve potentially-relevant information in anticipation of litigation
  2. Most information is electronically-stored information (ESI)
  3. Understanding ESI entails knowledge of information storage media, encodings and formats
  4. There are many types of e-storage media of differing capacities, form factors and formats:

    a) analog (phonograph record) or digital (hard drive, thumb drive, optical media)

    b) mechanical (electromagnetic hard drive, tape, etc.) or solid-state (thumb drive, SIM card, etc.)

  5. Computers don’t store “text,” “documents,” “pictures,” “sounds.” They only store bits (ones or zeroes)
  6. Digital information is encoded as numbers by applying various encoding schemes:

    a) ASCII or Unicode for alphanumeric characters;

    b) JPG for photos, DOCX for Word files, MP3 for sound files, etc.

  7. We express these numbers in a base or radix (base 2 binary, 10 decimal, 16 hexadecimal, 60 sexagesimal). E-mail messages encode attachments in base 64.
  8. The bigger the base, the smaller the space required to notate and convey the information
  9. Digitally encoded information is stored (written):

    a) physically as bytes (8-bit blocks) in sectors and partitions

    b) logically as clusters, files, folders and volumes

  10. Files use binary header signatures to identify file formats (type and structure) of data
  11. Operating systems use file systems to group information as files and manage filenames and metadata
  12. File systems employ filename extensions (e.g., .txt, .jpg, .exe) to flag formats
  13. All ESI includes a component of metadata (data about data) even if no more than needed to locate it
  14. A file’s metadata may be greater in volume or utility than the contents of the file it describes
  15. File tables hold system metadata about the file (e.g., name, locations on disk, MAC dates): it’s CONTEXT
  16. Files hold application metadata (e.g., EXIF geolocation data in photos, comments in docs): it’s CONTENT
  17. File systems allocate clusters for file storage; deleting files releases cluster allocations for reuse
  18. If unallocated clusters aren’t reused, deleted files may be recovered (“carved”) via computer forensics
  19. Forensic (“bitstream”) imaging is a method to preserve both allocated and unallocated clusters
  20. Because data are numbers, data can be digitally “fingerprinted” using one-way hash algorithms (MD5, SHA1)
  21. Hashing facilitates identification, deduplication and de-NISTing of ESI in e-discovery
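
Several of these points (6 through 8, 20 and 21) are easier to believe once you watch them happen. What follows is a minimal Python sketch, not anything drawn from the Workbook: the helper names (`notate`, `fingerprint`, `deduplicate`) and the sample file names and contents are my own inventions for illustration. It encodes three characters as numbers, writes those same numbers in four bases, then hashes a handful of made-up files to group duplicates.

```python
import base64
import hashlib


def notate(data: bytes) -> dict[str, str]:
    """Write the same bytes in several bases (points 7 and 8)."""
    return {
        "binary": " ".join(f"{b:08b}" for b in data),
        "decimal": " ".join(str(b) for b in data),
        "hex": data.hex(),
        "base64": base64.b64encode(data).decode("ascii"),
    }


def fingerprint(data: bytes) -> str:
    """One-way hash of the content alone; the file name plays no part (point 20)."""
    return hashlib.md5(data).hexdigest()


def deduplicate(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by hash: identical content yields an identical hash (point 21)."""
    groups: dict[str, list[str]] = {}
    for name, content in files.items():
        groups.setdefault(fingerprint(content), []).append(name)
    return groups


# ASCII encoding turns the characters E, S, I into the numbers 69, 83, 73.
encoded = "ESI".encode("ascii")
forms = notate(encoded)
for base, notation in forms.items():
    print(f"{base:8} {notation}")

# Hypothetical files, made up for this example.
sample = {
    "memo.txt": b"Preserve everything.",
    "copy of memo.txt": b"Preserve everything.",
    "memo v2.txt": b"Preserve everything!",
}
dupes = deduplicate(sample)
for digest, names in dupes.items():
    print(digest, names)
```

Note how the notation shrinks as the base grows: 24 binary digits, six hex digits, four base 64 characters. Note, too, that changing a single character of content produces an entirely different hash, while renaming a file changes nothing.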

All of these topics and more are covered in depth at the Academy, punctuated by substantive and substantial hands-on exercises. We ask more of the students than most seasoned e-discovery professionals can deliver. It’s hours of effort before you arrive and a full week of day and night endeavor once you’re here. Over a thousand pages of written material covered in toto.  Really, no picnic.  A true boot camp.  It exhausts and overwhelms those anticipating conventional professional education; but those who do the work emerge transformed.  They leave competent, confident and equipped with new eyes for ESI. Think you can hack it? We can help. Hope to see you there June 2-7.

P.S. No member of the Academy faculty is compensated.  We are all volunteers, there because we believe the more you know about e-discovery, the more you can contribute to the just, speedy and inexpensive administration of justice.

Mueller? Mueller? More E-Discovery Lessons from Bill and Bob


I read a couple of good articles on the e-discovery implications of the Mueller report and tweeted,

The Mueller report underscores why image+ productions are ridiculous. Compare the OCR to the true text. It’s a mess, so search is off. Image files many times larger than the native, ergo much more costly to load, store, host, transmit. BTW: YES, you CAN redact a Word file. It’s XML!

This bears fleshing out, and I want to do it by sharing a simple trick enabling you to peer inside the raw guts of a Microsoft Word file and understand why native redaction isn’t the pipe dream some try to make it.  But first, let’s unpack the jargon.

“Image+” or “TIFF+” productions refer to the common practice of fixing the content of a document by printing the file to a static image format like TIFF or PDF.  I use “fixing” in the sense of making something permanent, but it’s also accurate to use it the way we speak of “fixing” a cat; that is, cutting its balls off.

The “plus” in TIFF+ refers to the need to supply the native file’s searchable text and application metadata in ancillary load files to accompany the page images.  That is, rather than supply the evidence, producing parties degrade it to a deconstructed “kit” version of the evidence that requesting parties must load into review platforms to restore a crude level of searchability. This enables producing parties to suppress content (like embedded comments, speaker notes and changes in text documents) and much of the application metadata of the original.  It also neuters the evidence.  It’s no longer functional in the programs that created it, like Word, PowerPoint or Excel.

I’ve written extensively about this elsewhere (e.g., Lawyers’ Guide to Forms of Production ), and I try to present the pros and cons of TIFF+, notwithstanding my belief that the cons decidedly outweigh the pros.  It largely comes down to Bates numbers and disagreement about how and when those fetishistic Bates identifiers should be added to evidence and at what absurd cost.

TIFF+ enables producing parties to sidestep their obligation to review unprinted information for responsiveness and privilege.  Instead, they silently make that content disappear like a “fixed” cat’s testicles.  To be fair, most lawyers know so little about ESI processing that they are blissfully unaware it’s happening, so they deny it with genuine equanimity.  When you force them to acknowledge the spoliation, they fall back on claiming that, whatever they excised and didn’t review wouldn’t have been worth the trouble of reviewing or producing.  Genius, right?

Apart from what’s missing from the dumbed-down data, the big objection I offer to TIFF over native productions is the huge size difference between them.  TIFF productions are much, much fatter.  Though information and utility have been stripped from the images, the degraded set is nonetheless many times larger (measured in bytes) than the native originals.

Because most e-discovery service providers price their wares by the gigabyte volume going into, onto and out of their systems, bigger files mean bigger bills.  Much bigger files mean…well, you get it.

Perhaps you’re thinking, “Craig, you sad, sad Cassandra; how much bigger can these image sets be than their native counterparts?”  Would ten times bigger surprise you? Well, then surprise!  But, they’re usually more than ten times larger.  It’s not a one-off rip-off either.  Most hosted platforms charge you for the fatter file volume every month.  Over, and over, and over again.

Sucker.
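
The tweet’s claim that a Word file is XML is easy to check for yourself, because a DOCX file is a ZIP archive whose parts are XML. Here is a minimal Python sketch under stated assumptions: so that it runs anywhere, it builds a tiny stand-in archive holding only `word/document.xml` rather than opening a real document, and the function names and sample sentence are hypothetical. The redaction shown is deliberately naive. In a real Word file, a phrase may be split across several text runs, and comments and other content live in separate parts of the archive that would need the same care.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace WordprocessingML uses for body elements such as <w:t> (a run of text).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
BODY_PART = "word/document.xml"


def build_stand_in_docx(paragraph: str) -> bytes:
    """Build a ZIP holding only the body part. A stand-in, NOT a complete Word file."""
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f"<w:t>{escape(paragraph)}</w:t>"
        "</w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(BODY_PART, xml)
    return buffer.getvalue()


def list_parts(docx_bytes: bytes) -> list[str]:
    """List the parts inside the archive, as if the file were renamed to .zip."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.namelist()


def text_runs(docx_bytes: bytes) -> list[str]:
    """Return the text held in each <w:t> element of the body part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read(BODY_PART))
    return [t.text or "" for t in root.iter(f"{{{W_NS}}}t")]


def redact_docx(docx_bytes: bytes, secret: str, mask: str = "[REDACTED]") -> bytes:
    """Naive redaction: copy every part, replacing the secret in the body XML."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == BODY_PART:
                data = data.decode("utf-8").replace(secret, mask).encode("utf-8")
            dst.writestr(name, data)
    return out.getvalue()


original = build_stand_in_docx("Pay Acme $5 million")
print(original[:2])          # ZIP archives begin with the signature bytes "PK"
print(list_parts(original))
print(text_runs(original))

redacted = redact_docx(original, "$5 million")
print(text_runs(redacted))
```

The result of the redaction is still a ZIP of XML; nothing had to be printed to an image to withhold the figure.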

Storage Media: Long Past Herman Hollerith

It’s that semiannual time when I revise my E-Discovery Workbook in advance of the Georgetown Law Center eDiscovery Training Academy.  That means forgoing sunny Spring days in The Big Easy to pore over 500 pages of content and exercises to make them as durable and endurable as I can.  More and more, I find I’m adding historical perspectives.  It’s a fair criticism that, with so much to cover, I should restrict my focus to contemporary technologies and leave the trips down memory lane to my dotage.

I can’t help myself.  Though we’ve come far and fast, the information technologies of my youth are lurking just beneath the slick surfaces of the latest big thing.  The punch card storage and tabulation technologies Herman Hollerith (1860-1929) used to revolutionize the 1890 U.S. census are just a hair’s breadth behind the IBM card technologies that dominated data processing for much of the 20th century and cousin to the oily, yellow perforated paper tape that Bill Gates and I used on opposite coasts to learn to program mainframe computers via a teletype terminal in the 1970s.  The encoding schemes of that obsolete media differ from those we use today principally in speed and scale.  The binary fundamentals are still…fundamental, and connect our toil in e-discovery and computer forensics to the likes of Charles Babbage, Alan Turing, Ada Lovelace, John von Neumann, Robert Noyce and both Steves (Wozniak and Jobs).

In the space of one generation, we have come very far indeed.

The Computer Book: A Pleasant Stroll through the History of Computing

I returned from frigid New York City last night, modestly triumphant that I hadn’t botched my interview with Watergate journalist and Fear author Bob Woodward.  Woodward turned out to be just the nicest guy and we got on swimmingly.  I shouldn’t be surprised as many of the highly successful people I’ve known have proved courteous and generous of spirit.  I guess nice guys finish first because we are happy to help them succeed.

In New York, heading to the Whitney to take in the excellent Andy Warhol retrospective, I happened on an architectural antiques store in the Meatpacking District called Olde Good Things.  I love such places and was delighted to find they were selling vintage Jacquard loom cards.  I collect (NERD ALERT!) examples of milestone computing technologies, especially antecedent digital storage devices like piano rolls, magnetic core memories and, now, Jacquard loom cards!  I use these for “show-and-tell” in my digital evidence classes.  In a touching twist, the cards I bought were salvaged from an abandoned lace factory in Scranton, Pennsylvania, the old coal town a/k/a Electric City where my father grew up and is laid to rest.  Here’s my acquisition:

This digression has a purpose.  Waiting for me on my return to New Orleans was a book I’d ordered called “The Computer Book” by Simson Garfinkel and Rachel Grunspan.  It’s subtitled, “From the Abacus to Artificial Intelligence, 250 Milestones in the History of Computer Science;” but, don’t be put off by that mouthful; it’s a delightful read and a visual feast.  Each of the 250 well-curated, chronological milestones is flanked by gorgeous full-page photography.  Among them, Milestone 13, The Jacquard Loom:

The punched cards used in the Jacquard loom circa 1801 were later adapted by inventor Herman Hollerith to tabulate the U.S. Census in 1890 and were forerunner to the punched IBM cards that were a common medium to enter and store digital data from the 1930s through 1970s.  Another descendant: the punched paper tape I used to store BASIC computer programs in high school circa 1972.  Our modern computing feats are often smaller, speedier reimaginings of age-old technologies.  The Computer Book ably underscores that evolution.

I bought the book because I’ve followed Simson Garfinkel’s extraordinary career since he was a graduate student buying second hand hard drives and scaring the snot out of people by revealing how much sensitive “deleted” data could be resurrected via forensic file carving.  That’s common knowledge now, but largely because pioneers like Simson made it so.  Simson is Professor Garfinkel today as well as the Senior Computer Scientist for Confidentiality and Disclosure Avoidance at the US Census Bureau.  Shades of Herman Hollerith! Simson holds seven patents and has published dozens of articles on computer security and digital forensics.

I’m considering making the book required reading for my law classes–something I’ve not done before as I prefer my students not go out-of-pocket.  The Computer Book succeeds in being accessible to the lay reader in a way few books about computing match. To really understand technologies, laws or people, it pays to delve into their origins.  If I ran the world, The Computer Book would be required reading for anyone in the e-discovery space.

Meet Bob Woodward, Living Legend

It’s almost that time, two weeks from LegalTech New York (okay, LegalWeek for those who hang on for more wintry weather) and the fine folks at Zapproved have again asked me to interview a gifted interviewer–on Broadway, no less–for the annual Corporate E-Discovery Heroes Awards.  I’ve had past fireside chats with Nina Totenberg, Doris Kearns Goodwin and Eugene Robinson.  My subject this year is Bob Woodward.


I’ll use the same two words Woodward himself uttered on June 17, 1972 as a cub reporter for the Washington Post covering an arraignment of five well-dressed Watergate burglars.  On hearing perpetrator James McCord whisper “CIA” when asked his employer, Woodward exclaimed:


I mean, BOB WOODWARD!  Author of nineteen books, thirteen #1 national bestsellers.  The dean of investigative journalism.  The 2019 PEN America Literary Service Award winner (per this morning’s New York Times).  The man who helped earn two Pulitzer Prizes for the Post.  The man who brought down a President.  Robert Redford played him in All the President’s Men.  Not pruney 2019 Redford, either.  We’re talking 1976 sex symbol Robert Redford!


I better get this right. Will you help me?  In the comments below, I invite you to suggest questions I might pose to the living legend onstage.  Don’t worry.  Woodward wrote “Fear.”  We will talk Trump.

It’s a very special night in another way.  My dear, dear friend, the Honorable John Michael (yada, yada, yada) Facciola, will receive the 2019 Hon. Shira Scheindlin Lifetime Achievement Award in recognition of a career that has advanced the practice of electronic discovery.  The “yada, yada, yada” denotes that Fatch has more middle names than a British nobleman.  Fitting, as John Facciola is truly a noble man and richly deserving of this award.  I’m excited about who the presenters will be; but, I’m not spoiling that surprise.  You’ll just have to attend.

Honored as well will be four “Corporate E-Discovery Heroes,” nominated by their peers and selected by an esteemed panel of judges.  What? FINE! Esteemed and me.  Who won?  Like I said, you’ll just have to attend.  Please do.

Though seating is limited, tickets are still available, and dinner and drinks are included.  It’s going to be a hell of a party!  Don’t miss it.

Where: Edison Ballroom, 240 W 47th St, New York, NY 10036
When: January 28, 2019 at 6:00pm
Register by: January 25, 2019 11:59 PM Eastern Time

Bring your copy of Fear, All the President’s Men, The Final Days, The Brethren, Wired or one of the others.  No promises, but I bet you can get it signed by the man who inspired a generation of journalists.

Loving Location Histories

I give dozens of talks each year on electronic evidence where I discuss geolocation data and its transformative potential as evidence in criminal prosecutions and civil litigation.  Smart phones constantly track our movements using gyroscopes, accelerometers, global positioning features, geolocation apps, cell tower triangulation and three independent radio systems. Our steps are tallied, altitudes logged, and, for many, vital signs are monitored, too.  We are earthbound astronauts, instrumented and coupled to sensors and telemetry as thoroughly as any who journey into space.

This doesn’t fully resonate with audiences until I guide them through their own phones, showing the level of detail with which movements are tracked.  Some listeners boast that they’ve set their privacy settings to block geolocation.  They’re the ones most surprised to learn that, although they can disable their ability to see their own geolocation history and stop geolocation data from being shared with apps, they can’t disable geolocation broadcasting and still have a functioning phone.  Here’s the bottom line: if a phone can operate as a phone, it must broadcast its geolocation coordinates with a precision of ten meters (~30 feet) or better.  U.S. law requires it.

When I broach geolocation data and see that look of “we already know this” creep across faces, that’s when I ask for a show of hands of how many in the audience use iPhones.  Nearly every hand shoots up.  I then invite them to drill down in their phone’s Settings with me to the Significant Locations logs.  Surprisingly, most have never done this before and are shocked, even frightened, by the richness of detail in the data.

To try it on your iPhone, navigate through Settings > Privacy > Location Services > System Services > Significant Locations.  Unless you’ve disabled your ability to see geolocation data, you’ll arrive at the phone’s History list setting out locales visited, and the number of sites gone to within those locales.

But, wait!  There’s more!

Cloud Takeouts: Can I Get That to Go?

Two-and-a-half years ago, I concluded a post with this bluster:

“Listen, Amazon, Apple, Microsoft and all the other companies collecting vast volumes of our data through intelligent agents, apps and social networking sites, you must afford us a ready means to see and repatriate our data.  It’s not enough to let us grab snatches via an unwieldy item-by-item interface.  We have legal duties to meet, and if you wish to be partners in our digital lives, you must afford us reasonable means by which we can comply with the law when we anticipate litigation or respond to discovery. You owe us that.  Alexa, are you listening?”

Amazon hasn’t listened; but, Apple lately gave users the ability to download our data.  Credit for this awakening goes to the European Union’s General Data Protection Regulation (GDPR) that went into effect on May 25.

Data takeout capabilities are essential to protecting civil liberties and meeting legal duties.  Google’s given users a simple, effective means to repatriate data (including Gmail and calendar data) for five years, although search histories have only been supplied for two.  Twitter’s supported robust data takeout for five years; and eight years ago, Facebook became the first big social media site to offer its users the ability to download contributed content.

Apple is late to the party but it didn’t come empty-handed.  The Apple takeout is extensive and can be huge.  My download comprised 63GB in 26 compressed Zip archive files.  It took Apple five days to assemble the data and make it available for download; then, I had to download each file, one-by-one.  There’s no way to download them all at once, leaving the distinct impression that Apple doesn’t want takeout to be too easy.  In fairness, had I opted to have Apple deliver my data in 25GB chunks (the largest chunk option) instead of the 5GB file limit I specified, it would have been easier.

In my case, almost all the volume was photos replicated in iCloud.  Notably absent was my messaging, which Apple can’t archive and thus can only be obtained from the iPhone or a backup of same (see my post Mobile to the Mainstream).

Mad About Metadata

It’s the month for giving thanks, and I’m ever-grateful for the daily e-discovery blog penned by my friend, Doug Austin, for CloudNine.  It’s tough to get out a post every business day, and Doug’s done it splendidly for, what, nine years now?  Kudos!  Doug’s EDiscovery Daily blog is often my first heads-up for new e-discovery cases, true again for the decision he featured this morning, Metlife Inv’rs. USA Ins. Co. v. Lindsey, No. 2:16-CV-97 (N.D. Ind. Oct. 25, 2018).

It’s a familiar scenario.  The requesting party expressly demands native file production.  The responding party, a big insurance company, produces static image formats as non-searchable PDFs.  When the requesting party objects, the carrier argues that the metadata it strips from the evidence isn’t relevant and that the request for native forms is disproportionate, again challenging relevance, and also claiming that producing in the native forms sought would be cumulative because (chutzpah!) they’d already produced in PDF over their opponent’s timely objection.

To its credit, the Court makes short work of MetLife’s high-handedness and orders native production but stumbles a bit on the relevance and scope issues.  The Court addresses the relevance objection by noting that native production may shed light on who accessed information and that this may inform whether the insurer had a duty to investigate the policy application.  Maybe.  More likely, it won’t.  But, the Court shouldn’t have let itself be drawn in by a specious relevance challenge.

There are two varieties of file metadata: application metadata and system metadata.  Relevance should never matter for application metadata or dog tag system metadata.  If a file is sufficiently relevant to be responsive, no requesting party should be required to further demonstrate that metadata within the file is independently relevant.  The burden to prove a right to excise parts of relevant files should rest with the party altering the evidence.  Moreover, a file’s name, path and last modified date (“dog tag” metadata) are so patently useful that their utility more than relevance should serve as sufficient basis for the production of essential system metadata.
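
To show how little effort “dog tag” metadata demands, here is a minimal Python sketch that reads a file’s name, path and last modified date from the file system. It’s an illustration only, assuming nothing about how any review platform gathers these values: the `dog_tag` function, the temporary file and its name are hypothetical, and the timestamp is set by hand so the example is repeatable.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def dog_tag(path: Path) -> dict[str, str]:
    """Collect the three 'dog tag' values from the file system, not from the file's content."""
    resolved = path.resolve()
    info = resolved.stat()  # system metadata kept by the file system
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": resolved.name,
        "path": str(resolved.parent),
        "last_modified": modified.isoformat(),
    }


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file, created only so there is something to examine.
    sample = Path(folder) / "policy_application.txt"
    sample.write_text("stand-in content")

    # Set the modified time by hand to 2018-10-25 00:00:00 UTC for a repeatable result.
    os.utime(sample, (1540425600, 1540425600))

    tag = dog_tag(sample)

for field, value in tag.items():
    print(f"{field:14} {value}")
```

None of these values lives inside the file. Print the file to a static image and all three are gone unless someone takes the trouble to carry them along in a load file.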