Who Says You Can’t Bates Number Native Productions?

A writer’s hubris is the conviction that when you’ve covered a topic, you’ve had your say.  But new readers rarely have time or desire to plumb earlier work and, were they to try, much of what I wrote on the underpinnings of e-discovery and forensics was long ago stolen away like Persephone to a paywall-protected underworld, leaving this Demeter to mourn.  So, I briefly return to a point that has never gained traction in the minds of the bar, viz. why producing in native file formats doesn’t require we give up cherished Bates numbering.  Doug Austin, the Zeus of e-discovery bloggers, recently re-addressed the same topic in his estimable E-Discovery Daily.  Call me a copycat, but I was here first.

As many times as I’ve written and spoken on the Native DeBates, I’ve never felt I nailed the topic.  I’ve not succeeded in conveying the logic, ease and advantage of a bifurcated approach to Bates numbering and pagination.  So, one more shot.

Start by imagining a world where, instead of just numbering pages, runaway enumeration demanded everyone number lines of text in each item produced in discovery.  That’s not far-fetched considering that pleadings in California and deposition transcripts everywhere have long numbered lines.  If I demanded that of you in discovery, wouldn’t you sensibly respond that it’s overkill and lawyers have managed just fine by numbering by page breaks instead?

Now that you’re thinking about the balance between enumeration and overkill, let’s set aside tradition and come at Bates numbering by design.  Mark a fancy word: unitization.  Everything is unitized: time in days and hours, buildings in square feet or meters, television in seasons and episodes, books in chapters and pages.  Humans love to unitize stuff, and our units ofttimes grow from quaint and antiquated origins that we cling to because, well, uh, um, dammit, we’ve just always done it that way!

Recently, I had a tough time getting rid of perfectly nice file cabinets because they were sized to hold files fourteen inches wide.  When I became a lawyer, every pleading had to be filed on fourteen-inch-long “legal size” paper, not the familiar eleven-inch letter paper.  Later, courts abolished legal size pleadings and…poof…that venerable unit was history. Now, even the notion of filing paper with courts is a relic.  Things changed because it was cheaper and more efficient to change.  Standards do change and units do change, even in the staunchly stodgy corridors of Law.

Have We Lost the War on E-Discovery?

Is there a war on e-discovery?  Sounds like a paranoid notion, but the evidence is everywhere.  The purpose of discovery is to exchange information bearing on matters in litigation, particularly material tending to prove or disprove the parties’ claims and defenses.  The soul of discovery is disclosure of relevant records and communication, limited by privilege and proportionality. So, you’d think the focus of e-discovery would be on where information resides and the forms it takes, on how to preserve it, collect it and produce it.  That was what we talked about a decade ago, but, no more.

Now, when I look at the composition of e-discovery education, I’m flummoxed by how the tide has turned to anti-discovery topics.  Instructing lawyers how to surface information has been steadily supplanted by how to keep information at bay and defend failures to disclose. There is no balance between supporting the right to obtain information and the right to withhold it.

Proportionality is about limiting the scope of discovery.  Privacy and GDPR seek to limit access to information.  Cost control is code for circumscribed discovery.  Even cybersecurity tends to be positioned to confound discovery.  I see discussions of “streamlining” privilege logs that advocate giving as little information as possible about items withheld on claims of privilege.  Considering the regularity with which privilege claims are abused, shouldn’t we require greater specificity be brought to logging so that privilege stops being the black hole in which we hide everything we don’t want to hand over?  Privilege is anathema to evidence and must be narrowly construed.  No one talks about that.

Don’t get me wrong.  These are important topics.  Discovery needs to be just, speedy and inexpensive.  But why do we keep forgetting that there’s a comma in there?  Will we ever balance our self-interest in advancing our client’s wishes against our common interest in a justice system that serves everyone?

Electronic Storage in a Nutshell

I’ve just completed the E-Discovery Workbook for the 2019 Georgetown E-Discovery Training Academy. The Workbook readings and exercises plot the path that evidence follows from the documents lawyers use in court back to the featureless stream of binary electrical impulses common to all information stored electronically. The Workbook runs nearly 500 pages with the technology of e-discovery as its centerpiece, and I’ve lately added a 21-point synopsis of the storage concepts, technical takeaways and vocabulary covered. Here is that in-a-nutshell synopsis:

  1. Common law imposes a duty to preserve potentially-relevant information in anticipation of litigation
  2. Most information is electronically-stored information (ESI)
  3. Understanding ESI entails knowledge of information storage media, encodings and formats
  4. There are many types of e-storage media of differing capacities, form factors and formats:

    a) analog (phonograph record) or digital (hard drive, thumb drive, optical media)

    b) mechanical (electromagnetic hard drive, tape, etc.) or solid-state (thumb drive, SIM card, etc.)

  5. Computers don’t store “text,” “documents,” “pictures,” “sounds.” They only store bits (ones or zeroes)
  6. Digital information is encoded as numbers by applying various encoding schemes:

    a) ASCII or Unicode for alphanumeric characters;

    b) JPG for photos, DOCX for Word files, MP3 for sound files, etc.

  7. We express these numbers in a base or radix (base 2 binary, 10 decimal, 16 hexadecimal, 60 sexagesimal). E-mail messages encode attachments in base 64.
  8. The bigger the base, the smaller the space required to notate and convey the information
  9. Digitally encoded information is stored (written):

    a) physically as bytes (8-bit blocks) in sectors and partitions

    b) logically as clusters, files, folders and volumes

  10. Files use binary header signatures to identify file formats (type and structure) of data
  11. Operating systems use file systems to group information as files and manage filenames and metadata
  12. File systems employ filename extensions (e.g., .txt, .jpg, .exe) to flag formats
  13. All ESI includes a component of metadata (data about data) even if no more than needed to locate it
  14. A file’s metadata may be greater in volume or utility than the contents of the file it describes
  15. File tables hold system metadata about the file (e.g., name, locations on disk, MAC dates): it’s CONTEXT
  16. Files hold application metadata (e.g., EXIF geolocation data in photos, comments in docs): it’s CONTENT
  17. File systems allocate clusters for file storage; deleting files releases cluster allocations for reuse
  18. If unallocated clusters aren’t reused, deleted files may be recovered (“carved”) via computer forensics
  19. Forensic (“bitstream”) imaging is a method to preserve both allocated and unallocated clusters
  20. Because data are numbers, data can be digitally “fingerprinted” using one-way hash algorithms (MD5, SHA1)
  21. Hashing facilitates identification, deduplication and de-NISTing of ESI in e-discovery
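Several points in the synopsis can be tried at a keyboard. What follows is a minimal sketch in Python, not an exercise from the Workbook: it touches points 7 and 8 (bases), point 10 (header signatures) and points 20 and 21 (hashing and deduplication). The function names, the sample value and the short signature table are assumptions made for illustration; real tools recognize far more signatures than the three listed.

```python
import base64
import hashlib

# Points 7-8: one number, several bases. The bigger the base, the fewer digits.
value = 2019
as_binary = format(value, "b")   # base 2
as_decimal = str(value)          # base 10
as_hex = format(value, "x")      # base 16

# Point 7: base 64 turns arbitrary bytes into plain text, as e-mail does for attachments.
payload = b"%PDF-"
as_base64 = base64.b64encode(payload).decode("ascii")

# Point 10: a few well-known header signatures (illustrative, far from complete).
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\xff\xd8\xff": "JPEG",
    b"PK\x03\x04": "ZIP container (also DOCX, XLSX, PPTX)",
}


def identify(data: bytes) -> str:
    """Guess a format from the first bytes of a file, ignoring its extension."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def fingerprint(data: bytes) -> str:
    """Point 20: a one-way hash of the content (SHA-1 here; MD5 works the same way)."""
    return hashlib.sha1(data).hexdigest()


def deduplicate(items: dict) -> dict:
    """Point 21: group item names by content hash; identical content shares a hash."""
    groups = {}
    for name, data in items.items():
        groups.setdefault(fingerprint(data), []).append(name)
    return groups


if __name__ == "__main__":
    print(as_binary, as_decimal, as_hex, as_base64)
    print(identify(b"PK\x03\x04..."))
    print(deduplicate({"a.txt": b"same", "copy of a.txt": b"same", "b.txt": b"other"}))
```

Note that the hash depends only on the bytes inside the file, so renaming or re-dating a file leaves its fingerprint unchanged. De-NISTing rests on the same comparison: hashes of collected files are checked against a list of hashes of known system and application files.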

All of these topics and more are covered in depth at the Academy, punctuated by substantive and substantial hands-on exercises. We ask more of the students than most seasoned e-discovery professionals can deliver. It’s hours of effort before you arrive and a full week of day and night endeavor once you’re here. Over a thousand pages of written material covered in toto.  Really, no picnic.  A true boot camp.  It exhausts and overwhelms those anticipating conventional professional education; but those who do the work emerge transformed.  They leave competent, confident and equipped with new eyes for ESI. Think you can hack it? We can help. Hope to see you there June 2-7.

P.S. No member of the Academy faculty is compensated.  We are all volunteers, there because we believe the more you know about e-discovery, the more you can contribute to the just, speedy and inexpensive administration of justice.

Mueller? Mueller? More E-Discovery Lessons from Bill and Bob

 

I read a couple of good articles on the e-discovery implications of the Mueller report and tweeted,

The Mueller report underscores why image+ productions are ridiculous. Compare the OCR to the true text. It’s a mess, so search is off. Image files many times larger than the native, ergo much more costly to load, store, host, transmit. BTW: YES, you CAN redact a Word file. It’s XML!

This bears fleshing out, and I want to do it by sharing a simple trick enabling you to peer inside the raw guts of a Microsoft Word file and understand why native redaction isn’t the pipe dream some try to make it.  But first, let’s unpack the jargon.

“Image+” or “TIFF+” productions refer to the common practice of fixing the content of a document by printing the file to a static image format like TIFF or PDF.  I use “fixing” in the sense of making something permanent, but it’s also accurate to use it the way we speak of “fixing” a cat; that is, cutting its balls off.

The “plus” in TIFF+ refers to the need to supply the native file’s searchable text and application metadata in ancillary load files to accompany the page images.  That is, rather than supply the evidence, producing parties degrade it to a deconstructed “kit” version of the evidence that requesting parties must load into review platforms to restore a crude level of searchability. This enables producing parties to suppress content (like embedded comments, speaker notes and changes in text documents) and much of the application metadata of the original.  It also neuters the evidence.  It’s no longer functional in the programs that created it, like Word, PowerPoint or Excel.
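By contrast, a native Word file is a ZIP container of XML parts, with the body text in a part named `word/document.xml`. The sketch below is an illustration under stated assumptions, not the trick described in this post: it builds a tiny stand-in package (one XML part only, so not a complete Word document), reads the text out, and swaps a term for a marker. The helper names and sample sentence are hypothetical, and the redaction step assumes the term sits inside a single text run of the main body; comments, tracked changes, headers and metadata live in other parts that a real redaction would also have to address.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_TEXT = "{" + W_NS + "}t"   # the <w:t> element that holds run text


def build_stand_in_docx() -> bytes:
    """Hypothetical stand-in: a ZIP holding only word/document.xml."""
    document_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<w:document xmlns:w="' + W_NS + '"><w:body><w:p>'
        "<w:r><w:t>The witness is </w:t></w:r>"
        "<w:r><w:t>John Doe</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def visible_text(docx_bytes: bytes) -> str:
    """Concatenate the text of every <w:t> element in the main body part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return "".join(node.text or "" for node in root.iter(W_TEXT))


def redact(docx_bytes: bytes, term: str, marker: str = "[REDACTED]") -> bytes:
    """Rewrite the package, replacing `term` inside main-body text runs."""
    ET.register_namespace("w", W_NS)   # keep the familiar w: prefix on output
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "word/document.xml":
                root = ET.fromstring(data)
                for node in root.iter(W_TEXT):
                    if node.text and term in node.text:
                        node.text = node.text.replace(term, marker)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(info.filename, data)
    return out.getvalue()


if __name__ == "__main__":
    original = build_stand_in_docx()
    print(visible_text(original))
    print(visible_text(redact(original, "John Doe")))
```

Because the redacted term is replaced in the XML itself rather than painted over, it is gone from the rewritten part, while the rest of the text stays searchable without OCR.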

I’ve written extensively about this elsewhere (e.g., Lawyers’ Guide to Forms of Production ), and I try to present the pros and cons of TIFF+, notwithstanding my belief that the cons decidedly outweigh the pros.  It largely comes down to Bates numbers and disagreement about how and when those fetishistic Bates identifiers should be added to evidence and at what absurd cost.

TIFF+ enables producing parties to sidestep their obligation to review unprinted information for responsiveness and privilege.  Instead, they silently make that content disappear like a “fixed” cat’s testicles.  To be fair, most lawyers know so little about ESI processing that they are blissfully unaware it’s happening, so they deny it with genuine equanimity.  When you force them to acknowledge the spoliation, they fall back on claiming that, whatever they excised and didn’t review wouldn’t have been worth the trouble of reviewing or producing.  Genius, right?

Apart from what’s missing from the dumbed-down data, the big objection I offer to TIFF over native productions is the huge size difference between them.  TIFF productions are much, much fatter.  Though information and utility have been stripped from the images, the degraded set is nonetheless many times larger (measured in bytes) than the native originals.

Because most e-discovery service providers price their wares by the gigabyte volume going into, onto and out of their systems, bigger files mean bigger bills.  Much bigger files mean…well, you get it.

Perhaps you’re thinking, “Craig, you sad, sad Cassandra; how much bigger can these image sets be than their native counterparts?”  Would ten times bigger surprise you? Well, then surprise!  But, they’re usually more than ten times larger.  It’s not a one-off rip-off either.  Most hosted platforms charge you for the fatter file volume every month.  Over, and over, and over again.

Sucker.

Storage Media: Long Past Herman Hollerith

It’s that semiannual time when I revise my E-Discovery Workbook in advance of the Georgetown Law Center eDiscovery Training Academy.  That means forgoing sunny Spring days in The Big Easy to pore over 500 pages of content and exercises to make them as durable and endurable as I can.  More and more, I find I’m adding historical perspectives.  It’s a fair criticism that, with so much to cover, I should restrict my focus to contemporary technologies and leave the trips down memory lane to my dotage.

I can’t help myself.  Though we’ve come far and fast, the information technologies of my youth are lurking just beneath the slick surfaces of the latest big thing.  The punch card storage and tabulation technologies Herman Hollerith (1860-1929) used to revolutionize the 1890 U.S. census are just a hair’s breadth behind the IBM card technologies that dominated data processing for much of the 20th century and cousin to the oily, yellow perforated paper tape that Bill Gates and I used on opposite coasts to learn to program mainframe computers via a teletype terminal in the 1970s.  The encoding schemes of that obsolete media differ from those we use today principally in speed and scale.  The binary fundamentals are still…fundamental, and connect our toil in e-discovery and computer forensics to the likes of Charles Babbage, Alan Turing, Ada Lovelace, John von Neumann, Robert Noyce and both Steves (Wozniak and Jobs).

In the space of one generation, we have come very far indeed.

The Computer Book: A Pleasant Stroll through the History of Computing

I returned from frigid New York City last night, modestly triumphant that I hadn’t botched my interview with Watergate journalist and Fear author Bob Woodward.  Woodward turned out to be just the nicest guy and we got on swimmingly.  I shouldn’t be surprised as many of the highly successful people I’ve known have proved courteous and generous of spirit.  I guess nice guys finish first because we are happy to help them succeed.

In New York, heading to the Whitney to take in the excellent Andy Warhol retrospective, I happened on an architectural antiques store in the Meatpacking District called Olde Good Things.  I love such places and was delighted to find they were selling vintage Jacquard loom cards.  I collect (NERD ALERT!) examples of milestone computing technologies, especially antecedent digital storage devices like piano rolls, magnetic core memories and, now, Jacquard loom cards!  I use these for “show-and-tell” in my digital evidence classes.  In a touching twist, the cards I bought were salvaged from an abandoned lace factory in Scranton, Pennsylvania, the old coal town a/k/a Electric City where my father grew up and is laid to rest.  Here’s my acquisition:

This digression has a purpose.  Waiting for me on my return to New Orleans was a book I’d ordered called “The Computer Book” by Simson Garfinkel and Rachel Grunspan.  It’s subtitled, “From the Abacus to Artificial Intelligence, 250 Milestones in the History of Computer Science;” but, don’t be put off by that mouthful; it’s a delightful read and a visual feast.  Each of the 250 well-curated, chronological milestones is flanked by gorgeous full-page photography.  Among them, Milestone 13, The Jacquard Loom:

The punched cards used in the Jacquard loom circa 1801 were later adapted by inventor Herman Hollerith to tabulate the U.S. Census in 1890 and were forerunners of the punched IBM cards that were a common medium to enter and store digital data from the 1930s through 1970s.  Another descendant: the punched paper tape I used to store BASIC computer programs in high school circa 1972.  Our modern computing feats are often smaller, speedier reimaginings of age-old technologies.  The Computer Book ably underscores that evolution.

I bought the book because I’ve followed Simson Garfinkel’s extraordinary career since he was a graduate student buying second hand hard drives and scaring the snot out of people by revealing how much sensitive “deleted” data could be resurrected via forensic file carving.  That’s common knowledge now, but largely because pioneers like Simson made it so.  Simson is Professor Garfinkel today as well as the Senior Computer Scientist for Confidentiality and Disclosure Avoidance at the US Census Bureau.  Shades of Herman Hollerith! Simson holds seven patents and has published dozens of articles on computer security and digital forensics.

I’m considering making the book required reading for my law classes–something I’ve not done before as I prefer my students not go out-of-pocket.  The Computer Book succeeds in being accessible to the lay reader in a way few books about computing match. To really understand technologies, laws or people, it pays to delve into their origins.  If I ran the world, The Computer Book would be required reading for anyone in the e-discovery space.

Preserving MAC Times Collecting Files in E-Discovery

Checking the mailbag, I received a great question from a recent Georgetown E-Discovery Training Academy attendee.  I’m posting it here in hopes my response may be useful to you.

My student wrote: I have a question in regard to zipping eDiscovery data. We’ve always used 7zip to zip our collections. The filenames are too long for Microsoft to be happy with them in their original state. One of our consultants is now telling me that I’m changing metadata. Can you clear this up for me? Am I changing metadata just by zipping a file? If I am, are there other simple tools that I can use? 

Metadata is always changed in the copying of files within a Windows environment.  Anytime you copy data to new media, Windows changes some of its metadata.  Some e-discovery collection tools change the values back to the originating values as part of the collection process.  Thus, the metadata changes, then changes back to undo the change.  If you want to use such tools, they are out there.

I think the more important concern is whether the tools and methods you employ reconstruct the metadata that matters and preserve the integrity of the evidence files.  There is a simple way for you to assess that: check the MAC (modified/accessed/created) dates and hash the files in and out!  You did some exercises of this nature in my Georgetown Academy workbook.
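The "dates and hashes in and out" check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Academy exercise: the helper names, the sample file and the two-second tolerance are hypothetical, and the two standard-library copy functions stand in for whatever collection tool is being tested. Creation time is reported only where the platform exposes it.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def sha1_of(path) -> str:
    """Hash a file's content in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(path) -> dict:
    """Record dates first, then the hash, since reading may touch the access time."""
    info = os.stat(path)
    return {
        "modified": info.st_mtime,
        "accessed": info.st_atime,
        # Creation time is platform dependent; None where it is not exposed.
        "created": getattr(info, "st_birthtime", None),
        "sha1": sha1_of(path),
    }


def compare(before: dict, after: dict) -> dict:
    """Content is intact if hashes match; allow 2 seconds of timestamp rounding."""
    return {
        "content_intact": before["sha1"] == after["sha1"],
        "modified_preserved": abs(before["modified"] - after["modified"]) < 2,
    }


def demo():
    """Copy one file two ways and compare each copy against the original."""
    with tempfile.TemporaryDirectory() as folder:
        source = Path(folder) / "evidence.txt"
        source.write_text("exhibit A")
        os.utime(source, (1_500_000_000, 1_500_000_000))  # pretend it dates from 2017
        before = snapshot(source)
        plain = shutil.copy(source, Path(folder) / "plain_copy.txt")      # content only
        careful = shutil.copy2(source, Path(folder) / "careful_copy.txt")  # tries to keep dates
        return compare(before, snapshot(plain)), compare(before, snapshot(careful))


if __name__ == "__main__":
    for label, result in zip(("copy", "copy2"), demo()):
        print(label, result)
```

Both copies should hash the same as the source, which speaks to integrity; only the date-preserving copy should report the original modified date. Running the same before-and-after comparison around a zip and unzip cycle is one way to see what a given tool actually changes.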

Tech Tip: Get your iPhone Back

“Will the person who left their cell phone at the security checkpoint please retrieve it?”  People constantly leave their phones behind at security checkpoints, washrooms, checkout counters and charge stations.  Too, the little buggers slip out of pockets and purses.  More than three million phones are lost in the U.S. every year, and fewer than one in ten lost phones finds its way home.  Saturday night, I found an iPhone on the floor at a big party in the Faubourg Marigny in New Orleans.  I located the owner by asking everyone in sight if they’d lost a phone, and when I found her, the owner didn’t know she’d dropped it.

There are high tech tools to find lost phones like the Find My iPhone app or Tile locators; but, these only work for owners and require a second connected device.  What do the persons who find your phone or the Lost & Found staff do to quickly locate you, often before you realize your phone’s gone?  You don’t have an ID tag with contact data on your phone, right?

I do something that’s so darn simple, it’s a wonder it’s not already an option on every iPhone: I embed my name and email address in the lock screen photo (i.e., the wallpaper image that appears when you press the sleep/wake button, even when the phone is locked).  Now, any announcement over the P.A. includes my name, and I’ve furnished a secure way for good samaritans to contact me to arrange return.  It’s also an easy means to supply emergency contact information, should the good samaritan find you dropped alongside your phone.

There are plenty of ways to add text to your lock screen image–I’ve used the drawing tools in PowerPoint–but the simplest is to use the image editing tools right on your iPhone.  Here’s how (in iOS 10.1.1):

  1. Select an image to serve as your lockscreen wallpaper.  Use one with not-too-busy space for text (like the clouds in mine).  The text location shouldn’t conflict with the date and time text.  You may prefer to use a picture of yourself to make it easier to find you and prove it’s your phone.
  2. Duplicate the image so as not to alter your original.  Do this by selecting Share (box with the up arrow) and Duplicate.
  3. Working with the duplicate image, choose Edit from the toolbar (abacus-like slider), then choose More (circle with three dots).  Select Markup (toolbox icon) and finally choose the Text option (uppercase “T” in a box).
  4. A text box will appear in the center of your image.  You can resize it by dragging the blue dots or reposition it by dragging the box.  You can change the font face, font size, text color and alignment from the menu bar.
  5. Type your information.  Be sensible, e.g., don’t include your home address, and don’t use your mobile number (duh). Click Done (upper right corner).
  6. To make the edited image your lockscreen wallpaper, go to Settings>Wallpaper>Choose a New Wallpaper.  In All Photos, navigate to the annotated image you just created and select it (tap). Move and scale the image as suits you, then select Set from the menu and choose Set Lock Screen.  You’re done!

Happy E-Discovery Day!

As I stow the turkey platter and box up the pilgrim décor, I’m reminded that it’s time once more to celebrate E-Discovery Day, TODAY, Thursday, December 1.  No doubt, you’re saying, “So SOON?!?!  I still haven’t retrieved those E-Discovery Day 2015 balloons that got loose in the atrium, and who’s going to eat all that E-Discovery Day Kringle taking up space in the office freezer?” (Special-ordered from Racine in the traditional e-discovery flavor, Cinnamon, TIFF and Tears™).

I know.  Already?  We don’t even have new Federal rules this time!  Judges are still exercising discretion when meting out sanctions for spoliation, and proportionality is back on top, though no one knew it was gone!

But, as the E-Discovery industry has thoughtfully fashioned a holiday to fill the tedious weeks between Thanksgiving and Christmas/Chanukah/Kwanzaa, let’s warm the wassail, join hands and lift our voices in celebration for those few cherished hours that are E-Discovery Day.  Remember: there’s still time to shop for the perfect E-Discovery Day gift, and as a tip, Ralph “Gimpy” Losey has a new $100 book of reprinted blog posts, perfect for the e-discoverer on your list still stymied by the web browser.  (Get well soon, Ralph!)

Let me invite you to begin your fun-filled E-Discovery Day at the non-intuitive time of 11:15 am eastern/8:15 am pacific TODAY, Thursday, December 1, 2016, by listening to a panel comprised of Robert Cruz, Tara Jones, Zach Warren and Yours Truly discussing Mainstream News & E-Discovery: What You Should Be Watching Out for in 2017. Per our hosts Actiance and Exterro, we will be recapping “what news events you should be tracking and proactively advising your legal team on to ensure you’re prepared to take on new e-discovery risks in 2017.”

In truth, we will be talking about a plenitude of topics that pop into our heads, including how e-discovery in 2017 will not even slightly resemble e-discovery in 2016.  Thanks to automation, TAR 42.0, automobile telematics, deeply-buried ABA commentary and easy-to-apply proportionality standards, you won’t even have to show up at work anymore.  Instead, you’ll just tell Alexa, Siri, Cortana and Hey Google, “Get me the non-privileged e-stuff,” and it will be done in seconds for a pittance.  But, sadly, if you miss our webcast (and the hours of fine programming that follow), don’t be surprised if e-discovery in 2017 looks to you, the uninitiated, just exactly like e-discovery in 2016.

Later today [4PM EST / 3PM CST / 1PM PST], I’m doing another webcast, this one for Nuix, entitled, The Tipping Point of New Technology in Discovery.  The topic grows out of an essay posted here on October 19, 2016 wherein I addressed proportionality considerations when weighing the cost and accuracy of automated transcription and translation tools in e-discovery.  Put simply, for inexpensive technologies that displace manual processes, how inaccurate can such technologies be before the savings won’t defray failure?  I’ll be speaking from New Orleans, and the discussion will be led from Sydney by Nuix’ Angela Bunting.  I’m joined on the panel by Judge Xavier Rodriguez (USDC WDTX) in San Antonio and Scott Cohen of Winston & Strawn in New York.  This promises to be a lively talk!  Please stop by.

There’s a lot of really good content coming your way for free TODAY. Don’t miss it.

Happy E-Discovery Day to You and Yours!

Introduction to Discovery in U.S. Civil Litigation

I am fortunate to teach electronic discovery and digital evidence in many venues. There’s the semester-long, three-credit course at the University of Texas School of Law each Fall, the weeklong Training Academy offered to all comers each June at Georgetown Law School (as part of a splendid faculty) and the 50-70 speeches a year that keep me idling at airports. Next month, I’m adding a sixteen-week, eight-session online evening program through the District of Columbia Bar, immodestly titled “Prime Time with Craig Ball.”

All of these entail accompanying written material, so there is a lot of research and writing for the various courses and presentations.  Some of my students aren’t lawyers or are law students with the barest theoretical understanding of discovery.  I’ve found it’s never safe to assume that students know the mechanisms of last-century civil discovery, let alone those of modern e-discovery.  Accordingly, I penned the following short introduction to discovery in U.S. civil litigation and offer it here in case you need something like it, especially if you’re also teaching this stuff.  [It’s copyrighted, but feel free to use it with attribution].

Though I have never known a time without discovery, I found it interesting to reflect on the fact that civil discovery is only about 20 years older than I am; discovery is nearly a Baby Boomer!  On a scale of jurisprudential evolution, we’re both young punks.  Need some perspective?  The FRCP are exactly the same age as U.S. Supreme Court Justice Stephen Breyer, former Attorney General Janet Reno and Prof. Alan Dershowitz.