Introducing the EDRM E-Mail Duplicate Identification Specification and Message Identification Hash (MIH)

I’m proud to be the first to announce that the Electronic Discovery Reference Model (EDRM) has developed a specification for cross-platform identification of duplicate email messages, allowing for ready detection of duplicate messages that waste review time and increase cost. Leading e-discovery service and software providers support the new specification, making it possible for lawyers to improve discovery efficiency by a simple addition to requests for production. If that sounds too good to be true, read on and learn why and how it works.

THE PROBLEM

The triumph of information technology is the ease with which anyone can copy, retrieve and disseminate electronically stored information. Yet, for email in litigation and investigations, that blessing comes with the curse of massive replication, obliging document reviewers to assess and re-assess nearly identical messages for relevance and privilege. Duplicate messages waste time and money and carry a risk of inconsistent characterization. Seeing the same thing over and over again makes a tedious task harder.

Electronic discovery service providers and software tools ameliorate these costs, burdens and risks using algorithms to calculate hash values—essentially digital fingerprints—of segments of email messages, comparing those hash values to flag duplicates. Hash deduplication works well, but stumbles when minor variations prompt inconsistent outcomes for messages reviewers regard as being “the same.” Hash deduplication fails altogether when messages are exchanged in forms other than those native to email communications—a common practice in U.S. electronic discovery where efficient electronic forms are often printed to static page images.
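The fragility described above is easy to demonstrate. In the Python sketch below (an illustration of the general technique, not any vendor's actual method), a cryptographic hash serves as the digital fingerprint; note how a single stray character, the kind of minor variation that prompts inconsistent outcomes, yields an entirely different digest:

```python
import hashlib

def fingerprint(segment: str) -> str:
    """Return a hex SHA-256 digest of a message segment."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()

body_a = "Please review the attached contract."
body_b = "Please review the attached contract. "  # one trailing space added in transit

# Identical text always produces an identical hash:
print(fingerprint(body_a) == fingerprint(body_a))  # True

# But a single extra character produces a completely different hash,
# so the two "same" messages are no longer flagged as duplicates:
print(fingerprint(body_a) == fingerprint(body_b))  # False
```

This all-or-nothing property is what makes hashing reliable for exact duplicates and brittle for near-duplicates: the digest reveals nothing about how *close* two segments are.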

Without the capability to hash identical segments of identical formats across different software platforms, reviewers cannot easily identify duplicates or readily determine what’s new versus what’s been seen before. When identical messages are processed by different tools and vendors or produced in different forms (so-called “cross-platform productions”), identification of duplicate messages becomes an error-prone, manual process or requires reprocessing of all documents.

Astonishingly, no cross-platform method of duplicate identification has emerged despite decades spent producing email in discovery and billions of dollars burned by reviewing duplicates.

Wouldn’t it be great if there was a solution to this delay, expense and tedium?

THE SOLUTION

When parties produce email in discovery and investigations, it’s customary to supply information about the messages called “metadata” in accompanying “load files.” Load files convey Bates numbers/Document IDs, message dates, sender, recipients and the like. Ideally, the composition of load files is specified in a well-crafted request for production or production protocol. Producing metadata is a practice that’s evolved over time to prompt little argument. For service providers, producing one more field of metadata is trivial, rarely requiring more effort than simply ticking a box.

The EDRM has crafted a new load file field called the EDRM Message Identification Hash (MIH), described in the EDRM Email Duplicate Identification Specification.

Gaining the benefit of the EDRM Email Duplicate Identification Specification is as simple as requesting that load files contain an EDRM Message Identification Hash (MIH) for each email message produced. The EDRM Email Duplicate Identification Specification is an open specification, so no fees or permissions are required to use it, and leading e-discovery service and software providers already support the new specification. For others, it’s simple to generate the MIH without redesigning software or impeding workflows. Too, the EDRM has made free tools available supporting the specification.

Any party with the MIH of an email message can readily determine if a copy of the message exists in their collection. Armed with MIH values for emails, parties can flag duplicates even when those duplicates take different forms, enabling native message formats to be compared to productions supplied as TIFF or PDF images.
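To make the comparison concrete, here is a minimal Python sketch of matching MIH values across two productions' load files. The field names and hash values are hypothetical, invented for illustration rather than drawn from the Specification:

```python
import csv
import io

# Hypothetical load-file excerpts from two producing parties.
# Bates numbers differ; the MIH column reveals the duplicates.
first_production = """DocID,MIH
ABC-0001,3f8e21
ABC-0002,77d0c4
ABC-0003,b15a9e"""

second_production = """DocID,MIH
XYZ-0101,77d0c4
XYZ-0102,e4d2aa"""

def mih_index(load_file: str) -> dict:
    """Map each MIH value to its DocID for every row of a load file."""
    return {row["MIH"]: row["DocID"] for row in csv.DictReader(io.StringIO(load_file))}

seen = mih_index(first_production)
for row in csv.DictReader(io.StringIO(second_production)):
    status = f"duplicate of {seen[row['MIH']]}" if row["MIH"] in seen else "new"
    print(row["DocID"], status)
# XYZ-0101 duplicate of ABC-0002
# XYZ-0102 new
```

Because the comparison operates only on hash values, the underlying messages could have been produced natively by one party and as TIFF images by another; the match is unaffected.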

The routine production of the MIH supports duplicate identification across platforms and parties. By requesting the EDRM MIH, parties receiving rolling or supplemental productions will know if they’ve received a message before, allowing reviewers to dedicate resources to new and unique evidence. Email messages produced by different parties in different forms using different service providers can be compared to instantly surface or suppress duplicates. Cross-platform email duplicate identification means that email productions can be compared across matters, too. Parties receiving production can easily tell if the same message was or was not produced in other cases. Cross-platform support also permits parties across borders to assess whether a message is a duplicate without the need to share personally identifiable information restricted from dissemination by privacy laws.

IS THIS REALLY NEW?

Yes, and unprecedented. As noted, e-discovery service providers and law firm or corporate e-discovery teams have long employed cryptographic hashing internally to identify duplicate messages; but each does so differently dependent upon the process and software platform employed—sometimes in ways they regard as being proprietary—making it infeasible to compare hash values across providers and platforms. Even if competitors could agree to employ a common method, subtle differences in the way each processes and normalizes messages would defeat cross-platform comparison.

The EDRM Email Duplicate Identification Specification doesn’t require software platforms and service providers to depart from the proprietary ways they deduplicate email. Instead, the Specification contemplates that e-discovery software providers add the ability to produce the EDRM MIH to their platforms and that service providers supply a simple-to-determine Message Identification Hash (MIH) value. The MIH sidesteps the challenges just described by taking advantage of an underutilized feature of email communication standards called the “Message ID” and pairing it with the power of hash deduplication. If it sounds simple, it is, and by design. It’s far less complex than traditional approaches but sacrifices little or no effectiveness or utility. Crucially, it doesn’t require any difficult or expensive departure from the way parties engage in discovery and production of email messages.
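The core idea can be sketched in a few lines of Python. The exact normalization rules and hash algorithm are governed by the Specification itself and are not reproduced here; the trimming and SHA-256 choices below are assumptions for illustration only:

```python
import hashlib
from email import message_from_string

def message_id_hash(raw_email: str) -> str:
    """Sketch of an MIH-style value: hash the message's Message-ID header.

    The normalization shown (trimming surrounding whitespace) is an
    assumption for illustration; consult the EDRM Specification for
    the actual rules and algorithm before implementing.
    """
    msg = message_from_string(raw_email)
    message_id = (msg["Message-ID"] or "").strip()
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()

raw = """Message-ID: <CABc123@example.com>
From: sender@example.com
To: recipient@example.com
Subject: Quarterly report

Please see attached."""

# The same Message-ID yields the same hash on any platform,
# no matter how the rest of the message was processed or produced.
print(message_id_hash(raw))
```

Because every standards-compliant email carries a globally unique Message-ID assigned when it is sent, hashing that one header sidesteps the normalization disputes that arise when tools hash message bodies, dates and recipient lists differently.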

WHAT SHOULD YOU DO TO BENEFIT?

All you need to do to begin reaping the benefits of cross-platform message duplicate identification is amend your Requests for Production to include the EDRM Message Identification Hash (MIH) among the metadata values routinely produced in load files. As a prominently published specification by the leading standards organization in e-discovery, it’s likely the producing party’s service provider or litigation support staff know what’s required. But if not, you can refer them to the EDRM Email Duplicate Identification Specification & Guidelines published at https://edrm.net/active-projects/dupeid/.

HOW DO YOU LEARN MORE?

The EDRM publishes a comprehensive set of resources describing and supporting the Specification & Guidelines that can be found at https://edrm.net/active-projects/dupeid/. All persons and firms deploying the EDRM MIH to identify duplicate messages should familiarize themselves with the considerations for its use.

EDRM WANTS YOUR FEEDBACK

The EDRM welcomes any feedback you may have on this new method of identifying cross-platform email duplicates or on any of the resources provided. We are interested in further ideas you may have and expect the use of the EDRM MIH to evolve over time. You can post any feedback or questions at https://edrm.net/active-projects/dupeid/.

ChatGPT Proves a Mediocre Law Student

I recently spent a morning testing ChatGPT’s abilities by giving it exercises and quizzes designed for my law and computer science graduate students. Overall, I was impressed with its performance, but also noticed that it’s frequently wrong but never in doubt: a mechanical mansplainer!

If you’re asking, “What is ChatGPT,” I’ll let it explain itself: “ChatGPT is a large language model developed by OpenAI. It is a type of machine learning model called a transformer, which is trained to generate text based on a given prompt. It is particularly well-suited to tasks such as natural language processing, text generation, and language translation. It is capable of understanding human language and generating human-like text, which makes it useful for a wide range of applications, such as chatbots, question-answering systems, and text completion.” I’d add that it’s the first artificial intelligence tool that anyone can play with, and it’s generating enormous excitement.

In terms of its legal knowledge, ChatGPT performed like a mediocre law student, performing well on essay questions and drafting exercises, but struggling with multiple choice questions. I couldn’t assign it a definitive score on the 25-question quiz as some questions required image analysis and I didn’t see a way to include illustrations in the chat interface. However, it performed poorly enough on the quiz for me to conclude that a passing score of 70 out of 100 points was out of reach.

One example of a question it struggled with was:

Which of the following is NOT an essential metadata value (a “metadata must”) to preserve and produce for a file?

  1. Custodian
  2. File Name
  3. Last Modified Date and Time
  4. Last Accessed Date and Time

The correct answer is Last Accessed Date and Time.  Class readings underscore the unreliability of Last Accessed dates and times and I supply a table of “Metadata Musts” for preservation and production.  Presumably trained on other sources, ChatGPT did not seem to understand the importance of a proper chain of custody for electronic evidence and replied:

Custodian is not considered an essential metadata value to preserve and produce for a file. The other options (File Name, Last Modified Date and Time, Last Accessed Date and Time) are all considered essential metadata values that should be preserved and produced for a file in order to provide important information about the file and its history.

Since it wasn’t performing well on my quiz, I asked,

Can you suggest some multiple choice quiz questions for students studying the fundamentals of electronic discovery and electronic data storage?

The first suggestion was:

What is the process of collecting, processing, and producing electronically stored information (ESI) in response to a legal request called?

a. Electronic discovery

b. Digital forensics

c. E-discovery

d. Data retrieval

In ChatGPT’s formulation “Electronic discovery” is the right response and “E-discovery” is wrong! 

Another suggestion:

What is the most common format for ESI production?

a. PDF

b. TIFF

c. Native file format

d. ASCII

That’s a challenging question because, as much as I’ve focused on and written about forms of production, I’ve not come across metrics establishing the most common format.  In supplying an answer key, ChatGPT identified Native file format as the correct answer, though I would have guessed “TIFF.”  I certainly wouldn’t have included ASCII as a choice.

Here are a few other quiz questions suggested by ChatGPT.  I’ve bolded what ChatGPT defined as the correct response and, in each instance, I’d conclude it’s either not the best answer or the question is so poorly constructed as to defy characterizing any single answer as the right one:

What is the process of identifying and eliminating irrelevant or privileged ESI called?

a. Culling

b. Filtering

c. Redaction

d. Processing

What is the process of reviewing ESI to identify relevant information called?

a. Review

b. Analysis

c. Coding

d. Searching

What is the process of converting ESI from its original format to a reviewable format called?

a. Conversion

b. Processing

c. Extraction

d. Production

What is the process of creating a backup copy of ESI for preservation purposes called?

a. Backup

b. Imaging

c. Archiving

d. Duplication

What is the process of destroying or disposing of ESI after it is no longer needed called?

a. Deletion

b. Erasure

c. Destruction

d. Purging

Encouraged that the law faculty might be safe for another semester, I turned ChatGPT to a drafting exercise:

You are to develop the best checklist of potential data sources that are candidates for legal preservation in any engagement—good enough that you’ll want to keep it to use in your own work. It does NOT have to be entirely original, and I encourage you to start with the best existing checklist(s) you can find and then make updates and improvements. Certainly, you should seek to add missing sources, but you may also see a better way to organize and present the content. Remember that this checklist is a list of sources and varieties not methods and, crucially, it should serve to enhance the completeness of the personal data map you will create (though it should NOT be limited to the scope of your personal digital footprint alone). Thus, generic checklist items like “Online sources” or “Local storage” won’t be sufficiently granular. Be sure to credit your sources and format it as a checklist.

ChatGPT responded:

Data Preservation Checklist

  1. Social media accounts (Facebook, Twitter, Instagram, LinkedIn, etc.)
  2. Email accounts (Gmail, Outlook, Yahoo, etc.)
  3. Cloud storage accounts (Google Drive, Dropbox, OneDrive, etc.)
  4. Mobile device data (text messages, call logs, contacts, etc.)
  5. Computer files (documents, photos, videos, etc.)
  6. Internet browsing history and cache
  7. Online marketplace accounts (eBay, Amazon, Etsy, etc.)
  8. Online banking and financial accounts (Paypal, bank accounts, credit cards, etc.)
  9. GPS location data
  10. Smart home device data (Amazon Echo, Google Home, etc.)
  11. Wearable device data (Fitbit, Apple Watch, etc.)
  12. Gaming accounts (Xbox Live, PlayStation Network, Steam, etc.)
  13. Voicemail and answering machine messages
  14. Instant messaging and chat logs (WhatsApp, Messenger, etc.)
  15. Social media direct messages and private messages
  16. Online dating profiles and messages
  17. Online forum and message board posts
  18. Online video and audio recordings (YouTube, Vimeo, etc.)
  19. Online photo and image sharing accounts (Flickr, Imgur, etc.)
  20. Online document collaboration and sharing accounts (Google Docs, Microsoft Office 365, etc.)

Source: Adapted from “Digital Preservation Checklist” by the National Archives and Records Administration (NARA) (https://www.archives.gov/digital-preservation/checklist) and “Digital Preservation Checklist for Individuals” by the Digital Preservation Coalition (https://www.dpconline.org/advice/digital-preservation-checklist-for-individuals)

Not bad, and certainly typical of what I get from students.  Interestingly, both citations are dead links, which would be unacceptable from my students, and (arguably) ChatGPT simply ignored the requirement to format its answer as a checklist. 

I had ChatGPT answer the question a second time sans the mandate that it credit sources. I was struck by the differences in the lists supplied and the disclaimer now appended:

  1. Social media accounts (e.g. Facebook, Twitter, Instagram, LinkedIn)
  2. Email accounts (e.g. Gmail, Yahoo, Outlook)
  3. Cloud storage accounts (e.g. Google Drive, Dropbox, iCloud)
  4. Online shopping accounts (e.g. Amazon, eBay, Etsy)
  5. Financial accounts (e.g. bank accounts, credit card accounts, PayPal)
  6. Mobile phone data (e.g. call logs, text messages, contacts)
  7. Computer hard drives
  8. USB drives and external hard drives
  9. Social media messaging apps (e.g. WhatsApp, iMessage, Messenger)
  10. Online forums and discussion boards
  11. Online dating profiles
  12. Online gaming accounts
  13. Web browsing history and cookies
  14. Internet service provider records
  15. GPS data (e.g. location data from mobile devices)
  16. Smart home device data (e.g. Amazon Echo, Google Home)
  17. Vehicle tracking data (e.g. OnStar, Tesla)
  18. Medical records (e.g. electronic health records)
  19. Educational records (e.g. transcripts, grades)
  20. Employee records (e.g. HR files, payroll records)

Note: This checklist is not exhaustive and may vary depending on the specific engagement or case. It is important to consult with legal and technical experts to determine the most relevant and appropriate data sources for preservation. Additionally, it is important to consider any privacy laws and regulations when collecting and preserving personal data.

I asked ChatGPT to identify “the most important legal opinions I should assign my students to read that will help them better understand electronic discovery in the United States” and it responded:

  1. Zubulake v. UBS Warburg LLC, 217 F.R.D. 309 (S.D.N.Y. 2003) – This case established the “Zubulake standards,” which are widely used as a guide for electronic discovery in the United States. It lays out the obligations of parties to preserve and produce electronically stored information (ESI) and the sanctions that can be imposed for failure to comply.
  2. Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008) – This case discusses the duty to preserve ESI and the consequences of spoliation (destruction or alteration of evidence). It also addresses the standard for imposing sanctions for spoliation.
  3. The Sedona Conference® Commentary on Achieving Quality in the E-Discovery Process, The Sedona Conference (2008) – This commentary provides guidance on best practices for e-discovery, including the preservation and production of ESI, the use of technology-assisted review (TAR) and predictive coding, and the handling of privileged information.
  4. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012) – This case addresses the use of TAR and predictive coding in e-discovery, and emphasizes the importance of cooperation and communication between parties when using these technologies.
  5. Morgan Stanley Smith Barney LLC v. Skowron, No. 11-CV-8335 (S.D.N.Y. Mar. 29, 2013) – This case discusses the use of proportionality in e-discovery, and the court’s authority to limit the scope of e-discovery in order to reduce the burden and expense of producing ESI.
  6. In re: Actos (Pioglitazone) Products Liability Litigation, MDL No. 6:11-md-2299 (W.D. La. 2013) – This case addresses the use of advanced analytics and data visualization tools in e-discovery, and highlights the importance of considering the costs and benefits of using these technologies in the e-discovery process.

Except for mischaracterizing a Sedona Conference Commentary as a “legal opinion,” they’re all fine suggestions, yet all are ten to twenty years old. 

I asked ChatGPT a question posed by a student in my last class: “What is the difference between electronic discovery and computer forensics?”  It opined:

Electronic discovery (e-discovery) and computer forensics are both related to the collection and analysis of electronic data, but they serve different purposes.

Electronic discovery is the process of collecting, processing, and producing electronic data in response to a legal request or investigation. It is typically used in civil litigation and is designed to identify and produce relevant electronic evidence that may be used in court. The goal of e-discovery is to find information that is responsive to the legal request and that is not privileged or protected by any other legal exceptions.

Computer forensics, on the other hand, is the process of collecting, preserving, analyzing, and presenting electronic evidence in a manner that is admissible in court. It is typically used in criminal investigations and is designed to identify, recover, and preserve evidence of a crime or other misconduct. The goal of computer forensics is to find and preserve evidence that may be used to prove or disprove a crime or other wrongdoing.

In summary, electronic discovery is focused on finding and producing relevant electronic evidence in response to a legal request, while computer forensics is focused on finding and preserving evidence of a crime or other misconduct. Both processes require the use of specialized tools and techniques to collect and analyze electronic data, but they serve different purposes and are used in different types of legal proceedings.

A good answer.  My response in class focused on the relative accessibility and intelligibility of the ESI we deal with in e-discovery versus digital forensics, and I didn’t tie forensics to criminal investigations because so much of my work in the field has concentrated on civil cases.  But I can’t say I did any better than the AI.  I smiled more.

Finally, I asked, “Who are the most important thought leaders in the field of electronic discovery?”  ChatGPT replied:

  1. Ralph Losey: A well-known attorney and author who has written extensively on the topic of electronic discovery and has been recognized as a leading expert in the field.
  2. Craig Ball: A forensic expert and attorney who has played a key role in shaping the laws and practices related to electronic discovery through his work as a consultant, expert witness, and author.
  3. Tom O’Connor: A consultant, speaker, and author who is considered one of the leading experts on the use of technology in the legal field and has been instrumental in developing best practices for electronic discovery.
  4. George Socha: A consultant, speaker, and author who is considered a leading expert in the field of electronic discovery and has been instrumental in the development of industry standards and best practices.
  5. Maura Grossman: A consultant, speaker, and author who is considered one of the leading experts on the use of artificial intelligence and machine learning in electronic discovery and has been instrumental in advancing the field.
  6. E-Discovery Team: A team of experts who have written extensively on the topic of electronic discovery and have been recognized as thought leaders in the field through their work as consultants, speakers, and authors.
  7. Paul J. Lippe: A consultant, speaker, and author who is considered a leading expert in the field of electronic discovery and has been instrumental in the development of industry standards and best practices.
  8. John Simek: A consultant, speaker, and author who is considered one of the leading experts on the use of technology in the legal field and has been instrumental in developing best practices for electronic discovery.
  9. David Horrigan: A consultant, speaker, and author who is considered one of the leading experts on the use of technology in the legal field and has been instrumental in developing best practices for electronic discovery.
  10. Daniel Garrie: A consultant, speaker, and author who is considered a leading expert in the field of electronic discovery and has been instrumental in the development of industry standards and best practices.

Kudos to Ralph Losey for topping the list and garnering a second mention as “E-Discovery Team.”  I wouldn’t have considered Paul J. Lippe a candidate for an e-discovery-specific list, and I can’t very well commend ChatGPT for wokeness.  One woman?  No persons of color?  No jurists?  I’m flattered to be included among the other old, white men, but know well that there are plenty as or more deserving.

Not So Fine Principle Nine

For the second class meeting of my law school courses on E-Discovery and Digital Evidence, I require my students read the fourteen Sedona Conference Principles from the latest edition of “Best Practices, Recommendations & Principles for Addressing Electronic Document Production.” The Sedona principles are the bedrock of that group’s work on ESI and, notwithstanding my misgivings that the Principles have tilted toward blocking discovery more than guiding it, there’s much to commend in each of the three versions of the Principles released over the last twenty years.  They enjoy a constitutional durability in the eDiscovery community.

When my students read the Principles, I revisit them and each time, something jumps out at me.  This semester, it’s the musty language of Principle 9:

Principle 9: Absent a showing of special need and relevance, a responding party should not be required to preserve, review, or produce deleted, shadowed, fragmented, or residual electronically stored information.

The Sedona Principles, Third Edition: Best Practices, Recommendations & Principles for Addressing Electronic Document Production, 19 SEDONA CONF. J. (2018)

Save for the substitution of “electronically stored information” for the former “data or documents,” Principle 9 hasn’t been touched since its first drafts of 20+ years ago.  One could argue its longevity owes to an abiding wisdom and clarity. Indeed, the goals behind P9 are laudable and sound.  But the language troubles me, particularly the terms, “shadowed” and “fragmented,” which someone must have pulled out of their … I’ll say “hat” … during the Bush administration, and presumably no one said, “Wait, is that really a thing?”  In the ensuing decades, did no one question the wording or endeavor to fix it?

My objection is that both are terms of art used artlessly.  Consider “shadowed” ESI.  Run a search for shadowed ESI or data, and you’ll not hit anything on point but the Principle itself.  Examine the comments to Principle 9 and discover there’s no effort to explain or define shadowed ESI.  Head over to The Sedona Conference Glossary: eDiscovery and Digital Information Management, and you’ll find nary a mention of “shadowed” anything. 

That is not to say there wasn’t a behind-the-scenes service in Microsoft Windows XP and Windows Server facilitating access to locked files during backup, one that came to be called the “Volume Shadow Copy Service” or “VSS”; but it wasn’t being used for forensics when the language of Principle 9 was floated.  I was a forensic examiner at the time and can assure you that my colleagues and I didn’t speak of “shadowed” data or documents.

But whether an argument can be made that it was a “thing” or not twenty years ago, it’s never been a term in common use, nor one broadly understood by lawyers and judges.  It’s not defined in the Principles or glossaries.  You’ll get no useful guidance from Google. 

What harm has it done?  None I can point to.  What good has it done?  None.  Yet, it might be time to consign “shadowed” to the dustbin of history and find something less vague.  It’s not gospel, it’s gobbledygook.

“Fragmented” is a term that’s long been used in reference to data storage, but not as a synonym for “residual” or “artifact.”  A fragmented file is one whose contents are stored in non-contiguous clusters on a storage medium.  Many of the files we access and know to be readily accessible are fragmented in this fashion, and no one who understands the term in the context of ESI would confuse “fragmented” data or documents with something burdensome to retrieve.  But don’t take my word for that; Sedona’s own glossary backs me up.  Sedona’s Principle 9 doesn’t use “fragmented” as Sedona defines it.

If the drafters meant “fragments of data,” intending to convey “artifacts recoverable through computer forensics but not readily accessible to or comprehended by users,” then perhaps other words are needed, though I can’t imagine what those words would add that “deleted” or “residual” doesn’t cover.

This is small potatoes. No one need lose a wink of sleep over the sloppy wording, and I’m not the William Safire of e-discovery or digital forensics; but words matter.  When you are writing to guide persons without deep knowledge of the subject matter, your words matter very much.  If you use a term of art, make sure it’s a correct usage, a genuine one; and be certain you’ve either used it as experts do or defined the anomalous usage in context.

When I fail to do that, Dear Reader, I hope you’ll call me on it, too.

The Annotated ESI Protocol

Tags

Periodically, I strive to pen something practical and compendious on electronic evidence and eDiscovery, drilling into a topic that hasn’t seen prior comprehensive treatment.  I’ve done primers on metadata, forms of production, backup systems, databases, computer forensics, preservation letters, ESI processing, email, digital storage and more, all geared to a Luddite lawyer audience.  I’ve long wanted to write, “The Annotated ESI Protocol.” Finally, it’s done.

The notion behind The Annotated ESI Protocol goes back 40 years to when, as a fledgling personal injury lawyer, I found a book of annotated insurance policies.  What a prize!  Any plaintiff’s lawyer will tell you that success is about more than liability, causation and damages; you’ve got to establish coverage to get paid.  Those annotated insurance policies were worth their weight in gold.

As an homage to that treasured resource, I’ve sought to boil down decades of ESI protocols to a representative iteration and annotate the clauses, explaining the “why” and “how” of each.  I’ve yet to come across a perfect ESI protocol, and I don’t kid myself that I’ve crafted one.  My goal is to offer lawyers who are neither tech-savvy nor e-discovery aficionados a practical, contextual breakdown of a basic ESI protocol–more than simply a form to deploy blindly or an abstract discussion.  I’ve seen thirty-thousand-foot discussions of protocols by other commentators, yet none tied to the document or served up with an ESI protocol anyone can understand and accept. 

It pains me to supply the option of a static image (“TIFF+”) production, but battleships turn slowly, and persuading lawyers long wedded to wasteful ways that they should embrace native production is a tough row to hoe. My intent is that the TIFF+ option in the example sands off the roughest edges of those execrable images; so, if parties aren’t ready to do things the best way, at least we can help them do better.

Fingers crossed you’ll like The Annotated ESI Protocol and put it to work. Your comments here are always valued.

Seven Stages of Snakebitten Search

I’ve long been fascinated by electronic search.  I especially love delving into the arcane limitations of lexical search because, awful Grinch that I am, I get a kick out of explaining to lawyers why their hard-fought search queries and protocols are doomed to fail. But, once we work through the Seven Stages of Attorney E-Discovery Grief: Umbrage, Denial, Anger, Angry Denial, Fear, Finger Pointing, Threats and Acceptance, there’s almost always a workaround to get the job done with minimal wailing and gnashing of teeth.

Three consults today afforded three chances to chew over problematic search strategies: 

  • First, the ask was to search for old CAD/CAM drawings in situ on an opponent’s file servers based on words appearing on drawings. 
  • Another lawyer sought to run queries in M365 seeking responsive text in huge attachments.
  • The last lawyer wanted me to search the contents of a third-party’s laptop for subpoenaed documents but without the machine being imaged or its contents processed before search.

Most of my readers are e-discovery professionals so they’ll immediately snap to the reasons why each request is unlikely to work as planned. Before I delve into my concerns, let’s observe that all these requests seemed perfectly reasonable in the minds of the lawyers involved, and why not?  Isn’t that how keyword and Boolean search is supposed to work?  Sadly, our search reach often exceeds our grasp.

Have you got your answers to why they may fail?  Let’s compare notes.

  • When it comes to lexical search, CAD/CAM drawings differ markedly from Word documents and spreadsheets.  Word processed documents and spreadsheets contain text encoded as ASCII or Unicode characters.  That is, text is stored as, um, text.  In contrast, CAD/CAM drawings tend to be vector graphics.  They store instructions describing how to draw the contents of the plans geometrically; essentially how the annotations look rather than what they say. So, the text is an illustration of text, much like a JPG photograph of a road sign or a static TIFF image of a document—both inherently unsearchable for text unless paired with extracted or OCR text in ancillary load files.  Bottom line: Unless the CAD/CAM drawings are subjected to effective optical character recognition before being indexed for search, lexical searches won’t “see” any text on the face of the drawings and will fail.
  • M365 has a host of limits when it comes to indexing Cloud content for search, and of course, if it’s not in the index, it won’t turn up in response to search.  For example, M365 won’t parse and index an email attachment larger than 150 MB.  Mind you, few attachments will run afoul of that capacious limit, but some will.  Similarly, M365 will only parse and index the first 2 million characters of any document.  That means only the first 600-1,000 pages of a document will be indexed and searchable.  Here again, that will suffice for the ordinary, but may prove untenable in matters involving long documents and data compilations.  There are other limits on, e.g., how deeply a search will recurse through nested and embedded content and the body text size of a message that will index.  You can find a list of limits here (https://learn.microsoft.com/en-us/microsoft-365/compliance/limits-for-content-search?view=o365-worldwide#indexing-limits-for-email-messages) and a discussion of so-called “partially indexed” files here (https://learn.microsoft.com/en-us/microsoft-365/compliance/partially-indexed-items-in-content-search?view=o365-worldwide).  Remember, all sorts of file types aren’t parsed or indexed at all in M365.  You must tailor lexical search to the data under scrutiny.  It’s part of counsel’s duty of competence to know what their search tools can and cannot do when negotiating search protocols and responding to discovery using lexical search.
  • In their native environments, many documents sought in discovery live inside various container files, ranging from e-mail and attachments in PST and OST mail containers to compressed Zip containers.  Encrypted files may be thought of as sealed inside an impenetrable container that won’t be searched.  The upshot is that much data on a laptop or desktop machine cannot be thoroughly searched by keywords and queries simply by running searches within an operating system environment (e.g., in Windows or MacOS).  Accordingly, forensic examiners and e-discovery service providers collect and “process” data to make it amenable to search.  Moreover, serial search of a computer’s hard drive (versus search of an index) is painfully slow, and thus unreasonably expensive when charged by the hour.  For more about processing ESI in discovery, here’s my 2019 primer (http://www.craigball.com/Ball_Processing_2019.pdf).
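The first bullet’s point—that lexical search can only “see” text stored as character codes—can be probed directly.  Here’s a rough triage sketch (my own illustration, not any vendor’s tool; the function name is hypothetical): it checks whether a keyword exists as encoded character data in a file’s raw bytes.  A vector drawing that merely draws the word as geometry will fail the probe even though the word is plainly visible on its face.

```python
from pathlib import Path

def keyword_in_raw_bytes(path: str, keyword: str) -> bool:
    """Return True only if the keyword is stored as character data.

    A crude stand-in for what an index builder can "see": text encoded
    as ASCII/UTF-8 or UTF-16 bytes. Vector annotations drawn as geometry
    contain no such bytes and will (correctly) return False.
    """
    data = Path(path).read_bytes().lower()  # bytes.lower() folds ASCII only
    kw = keyword.lower()
    needles = [kw.encode("utf-8"), kw.encode("utf-16-le")]
    return any(n in data for n in needles)
```

Real indexing pipelines use format-aware text extractors rather than raw byte scans, but the failure mode is the same: no encoded text, no hit, and OCR becomes the only remedy.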
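The M365 limits in the second bullet lend themselves to pre-search triage.  A sketch, assuming a hypothetical helper name and using the published figures cited above (which Microsoft may change—check the linked docs):

```python
# Two of the documented M365 indexing limits discussed above.
MAX_PARSED_ATTACHMENT_BYTES = 150 * 1024 * 1024   # 150 MB attachment parse limit
MAX_INDEXED_CHARS = 2_000_000                     # roughly 600-1,000 pages

def partial_index_risks(attachment_bytes: int, char_count: int) -> list:
    """Return human-readable warnings for items apt to be partially indexed."""
    risks = []
    if attachment_bytes > MAX_PARSED_ATTACHMENT_BYTES:
        risks.append("attachment too large to parse; its text won't be indexed")
    if char_count > MAX_INDEXED_CHARS:
        risks.append("only the first 2 million characters will be searchable")
    return risks
```

Running items through a check like this before negotiating a search protocol tells you which documents will be “partially indexed” and need separate handling.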
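The container problem in the third bullet is easy to demonstrate: a naive OS-level search sees one opaque .zip file, while “processing” means recursing into it (and any nested containers) so each item can be extracted, text-stripped and indexed.  A minimal sketch of that recursion, using Zip as the example container (PST/OST require specialized tools):

```python
import io
import zipfile

def list_container_items(path_or_buf) -> list:
    """Recursively enumerate entries in a ZIP, descending into nested ZIPs."""
    found = []
    with zipfile.ZipFile(path_or_buf) as zf:
        for name in zf.namelist():
            found.append(name)
            if name.lower().endswith(".zip"):
                # Read the nested container into memory and recurse.
                nested = io.BytesIO(zf.read(name))
                found += [f"{name}/{n}" for n in list_container_items(nested)]
    return found
```

Only after enumeration like this can each item be hashed, de-NISTed, text-extracted and indexed—the heart of what processing tools do.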

In case I don’t post before Chanukah, Christmas and the New Year, have a safe and joyous holiday!

Don’t Seek Direct Access to Opponents’ Devices

In a case where I’d identified evidence of a departing employee’s data theft, plaintiff’s counsel sought an affidavit in support of a motion to gain direct access to the new employer’s data storage to see how the stolen data was distributed and used.  I replied that I could supply the testimony but offered that the wiser strategy was not to move for direct access but instead seek an agreement or order that the other side’s forensic expert hew to an agreed-upon examination protocol.  That would afford opposing counsel a proper opportunity to withhold and log content deemed privileged or otherwise outside the scope of discovery.

I’ve worked both sides–and the middle–of countless so-called “bad leaver” cases, where employees are accused of taking data from one employer to another to secure a better job or a competitive advantage.  When I’m in the middle, I’m a court-appointed neutral examiner and, in that trusted role, it’s appropriate that I see the whole picture by looking at all implicated devices and accounts: the sources from which data was taken, the transfer media and the target devices and accounts where stolen data and its progeny reside. As a neutral examiner and attorney, I’m well-situated to balance the need to know what happened against the need to guard against improper or abusive discovery.

But when I’m acting for the party suing a competitor alleging data theft, viz., as a partisan expert, the party suspected of benefitting from the theft must be protected from unduly intrusive access to their devices.  Both sides have trade secrets requiring protection and both engage in privileged communications with counsel.  As well, material encountered on examination often has no relevance yet hurts and shames just by being divulged.

As a partisan working for one side or the other, I’d like to be able to say, “trust me, I’ll be a bulwark against revealing your privileged communications and irrelevant stuff,” but that’s not a role I covet absent considerable trust and an express agreement.  Trying to be partisan and neutral fosters divided loyalties, and as an attorney, I’m obliged to avoid conflict of interest or anything that looks like it.  Lay examiners should, too.

Trust is devoutly to be wished in these situations…but it’s hard to trust those you believe to be thieves. Justified or not, that mistrust frequently extends to those protecting the thieves, i.e., their counsel.  So, in the absence of trust in people, the law trusts in sound and transparent processes.  In these matters, a well-crafted forensic examination protocol ensures that the right evidence is scrutinized in the right ways, and material legitimately withheld is protected. 

By setting out what devices and sources need to be examined, what artifacts must be assessed and reported upon and how much oversight and transparency is allowed, the opposing expert serves as proxy for my hands and eyes.  Keyword searches and hash matching alone don’t cut it; a good examination protocol encompasses the singular signs of data theft and makes it difficult to suppress indicia of bad behavior.  A good protocol makes production of evidence and artifacts the rule unless there’s legal justification to withhold relevant and responsive evidence—and then there must be disclosure via a privilege log or other means.  I discuss drafting forensic examination protocols further in this article.  The points made in the article are just a start: proper protocols are tailored to the issues and evidence in the case, and constructed to promote integrity of process.

Can all this be abused?  Sure.  But, effective e-discovery is a marathon, not a sprint.  Certainly, this is true of efforts to seek sanctions.  If the process just described can be proven to have been gamed or corrupted, judges respond aggressively to protect the integrity of their courts, among those responses the turnover of an opponent’s devices, that is, the dread direct examination.

So, the takeaway: though sometimes direct access can be secured by agreement, don’t jump straight to a motion to compel access to an opponent’s computers and data storage when the other side says “no.”  Instead, pursue alternatives that fairly balance legitimate needs for disclosure against legitimate needs to protect trade secrets, privacy and privilege.  Do this using a well-crafted forensic examination protocol that obliges the other side to engage competent people, deploy them competently and afford reasonable transparency. The other side remains responsible for discovery until it’s clear they can’t be trusted.  Then, follow up by asking the Court to appoint a properly trained and certified neutral examiner.  Seeking to compel access to an opponent’s digital media is a last resort and should be treated as an extraordinary remedy—a punishment for discovery abuse more than a tool of discovery.

Labels: Not Just for People Anymore!

I’m a tool guy.  I pride myself on having the right tool for the task at hand. Digital forensics demands a broad range of specialized software, adapters, cabling, screwdrivers and spudgers.  Yes, “spudger” is a real word.  The forensic examiners I know enjoy swapping tool recommendations because often having the right tool to collect or parse electronic evidence means the difference between a quick victory or an agonizing series of defeats.

Delving deeper into the Alex Jones Discovery Debacle in a Law.com article bylined by Emily Cousins, I saw mention of Jones’ counsel passing around a “white hard drive” bearing no label or markings, nothing whatsoever advising the drive held privileged attorney-client records and court-protected documents.

After twenty-odd years as a certified digital forensic examiner and ESI Special Master, I’m custodian for countless hard drives and storage media holding sensitive data—media in all colors, shapes and sizes.  One common thread across all is the adhesive paper label affixed to each.  Arguably, the most valuable tool in my forensics lab is an 18-year-old Brother QL-500 label printer purchased secondhand from a clearance bin at Office Depot.  The 1-1/7″ x 3-1/2″ labels it spits out aren’t as slick as the Mylar bar codes used in other shops, but they’re a quick, cheap and nearly idiot-proof alternative.  Leastwise, this idiot would be lost without them.

You’d be amazed how much information fits on a little paper label; but, even if it’s no more than a matter name and “PRIVILEGED AND CONFIDENTIAL – SUBJECT TO PROTECTIVE ORDER,” that might have proved sufficient to forestall the ugly spectacle of Jones’ Connecticut counsel invoking the Fifth Amendment at a disciplinary hearing.

In a similar vein, a five-cent paper label affixed to the back of a laptop or phone is an effective, low-tech way to remind custodians and IT personnel that the device is subject to legal hold before it gets wiped, discarded, sold or traded in.  Is it ALL you need to do for a defensible hold?  Clearly not, but shouldn’t labeling physical media be as much a part of your routine, prudent processes when handling sensitive media as it is part of mine?

I don’t do paper, but my paperless practice demands I keep track of and protect physical media.  Encryption of contents plays a key role, as does a sound chain-of-custody; yet it’s the old-school paper labels that still save the day.

Brother no longer makes the QL-500, but sells suitable alternatives.  Dymo markets a LabelWriter 550 that looks like my Brother’s twin brother.

Clarify Requests for Native ESI

Poring over Requests for Production this morning, I was gratified to see the client sought native forms of electronically stored information; but the request said only, “All documents shall be Bates stamped and provided in native format.”  Is that sufficient? To me, specifying forms of production is best done via an agreed ESI production protocol, but failing that, requesting parties should supply more detail than simply asking for “native format.”  I believe requests need to lay out the forms sought for particularized types of ESI and specify the essential ancillary metadata to be produced in load files.

Requesting native forms in discovery demands a few adaptations versus the way hard copy documents were sought in years past.  Take that request, “All documents shall be Bates stamped and provided in native format.”  If a document is supplied natively and not printed out or “flattened” to a static TIFF, where do you “stamp” the Bates number?  The solution is simple (in the file name and load file), but not obvious to lawyers unschooled in e-discovery.

Specifying more than “native format” in the request is sensible because much ESI doesn’t lend itself to production in its “true” native forms.  The “true” native form of email is typically a database of multiple user accounts holding messages, calendars, contacts, to-do lists, etc.  An opponent need not (and won’t) produce such a massive, undifferentiated blob of data.  So the better practice is to specify that preferred near-native forms be produced; that is, forms that preserve the integrity and utility of the evidence and support the granularity needed for discovery of only relevant, non-privileged material.  As well, providing a load file specification ensures you obtain metadata values that only the producing party can supply (like Bates numbers, originating hash values, source paths and custodians). Too, you want that metadata in a structure suited to your needs and tools.

Native productions are more utile and cost-effective, but only to requesting parties prepared to reap their superior utility and savings.  One reason why producing parties have gotten away with producing inefficient and unsearchable static image formats (TIFFs) for so long is because TIFF images can be viewed in a browser; hence, recipients of TIFF productions can read documents page-by-page without review software.  Yet, that easy access comes at a perilous cost.  TIFF productions are many times larger in byte volume than native production of the same material, making it significantly more costly for requesting parties to ingest and host the evidence.  Moreover, TIFF images tend not to work well for common formats like spreadsheets and PowerPoint presentations, and don’t work at all for, e.g., video and sound files.  Finally, evidence produced as TIFF images gets shorn of metadata and searchable electronic content, requiring that the stripped metadata and searchable content be produced separately and reconstructed using software to comprise, at best, a degraded “TIFF Plus” facsimile of the evidence. 

For these reasons and more, requests for production must either succeed the entry of an agreed- or court-ordered production protocol or requesting parties must include useful and practical instructions about the forms of production right in the body of the Request.

To simplify my client’s task, I drafted an Appendix to be grafted onto the Requests for Production and suggested my client take out “All documents shall be Bates stamped and provided in native format” and substitute the phrase: “All production should be produced in accordance with the instructions contained in Appendix A to this Request.” It’s not perfect, but it should get the job done.

The Appendix I supplied reads as follows, and I don’t offer it as a paragon of legal draftsmanship.  Each time I create something like this, it’s a struggle deciding what details to omit versus supplying all features of a full-fledged production protocol.  I’ve kept it to about 1,000 words, and a tad verbose at that.  It’s for you to decide if it adds substantial value over simply asking for “native format.”  Tell me what you think in the comments. If you’d like a Microsoft Word version of Appendix A to play with, you can download it from this link: http://craigball.com/Request_for_Native_Production-Appendix_A.docx

Appendix A: Forms of Production

I. Definitions

“Electronically Stored Information” or “ESI” includes communications, presentations, writings, drawings, graphs, charts, photographs, posts, video and sound recordings, images, and other data or data compilations existing in electronic form on any medium including, but not limited to: (i) e-mail, texting, social media or other means of electronic communications; (ii) word processing files (e.g., Microsoft Word); (iii) computer presentations (e.g., Microsoft PowerPoint); (iv) spreadsheets (e.g., Microsoft Excel); (v) database content and (vi) media files (e.g., jpg, wav).

“Metadata” means and refers to (i) structured (fielded) information embedded in a native file which describes the characteristics, origins, usage, and/or validity of the electronic file; (ii) information generated automatically by operation of a computer or other information technology system when a native file is created, modified, transmitted, deleted, or otherwise manipulated by a user of such system; (iii) information, such as Bates numbers, created during the course of processing documents or ESI for production; and (iv) information collected during the course of collecting documents or ESI, such as the name of the media device, or the custodian or non-custodial data source from which it was collected.

“Native Format” means and refers to the format of ESI in which it was generated and/or as used by the producing party in the usual course of its business and in its regularly conducted activities. For example, the native format of an Excel workbook is a .xls or .xlsx file and the native format of a Microsoft Word document is a .doc or .docx file.

“Near-Native Format” means and refers to a form of ESI production that preserves the functionality, searchability and integrity of a Native Format item when it is infeasible or unduly burdensome to produce the item in Native Format.  For example, an MBOX is a suitable near-native format for production of Gmail, an Excel spreadsheet is a suitable near-native format for production of Google Sheets, and EML and MSG files are suitable near-native formats for production of e-mail messages.  Static images are not near-native formats for production of any form except Hard Copy Documents.

II. Production

1. Responsive electronically stored information (ESI) shall be produced in its Native Format with Metadata.

2. If it is infeasible to produce an item of responsive ESI in its Native Format, it may be produced in a Near-Native Format with options for same set out in the table below:

Source ESI | Native or Near-Native Form or Forms Sought
Microsoft Word documents | .DOC, .DOCX
Microsoft Excel Spreadsheets | .XLS, .XLSX
Microsoft PowerPoint Presentations | .PPT, .PPTX
Microsoft Access Databases | .MDB, .ACCDB
WordPerfect documents | .WPD
Adobe Acrobat Documents | .PDF
Photographs | .JPG, .PDF
E-mail | Messages should be produced in a form or forms that readily support import into standard e-mail client programs; that is, the form of production should adhere to the conventions set out in RFC 5322 (the internet e-mail standard).  For Microsoft Exchange or Outlook messaging, .PST format will suffice.  Single-message production formats like .MSG or .EML may be furnished, if source foldering data is preserved and produced.  If your workflow requires that attachments be extracted and produced separately from transmitting messages, attachments should be produced in their native forms with parent/child relationships to the message and container(s) preserved and produced in a delimited text file.
Social Media | Social media content should be collected using industry-standard practices incorporating reasonable methods of authentication, including but not limited to MD5 hash values.  Social media and webpages should be produced as HTML faithful to the content and appearance of the native source, or as JPG images with searchable, document-level files containing textual content and delimited metadata (including “likes” and comments).
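The Social Media row above calls for MD5 hash values as one reasonable method of authentication.  As a minimal sketch (the helper name is my own), here is how such a value might be computed at collection time—streamed in chunks so large captures don’t exhaust memory:

```python
import hashlib
from pathlib import Path

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the hex digest alongside each collected item (and re-computing it at production) lets any party verify the content hasn’t changed since collection.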

3. Paper (Hard-Copy) documents or items requiring redaction shall be produced in static image formats scanned at 300 dpi, e.g., single-page Group IV .TIFF or multipage PDF images. If an item uses color to convey information and not merely for aesthetic reasons, the producing party shall not produce the item in a form that does not display color. The full content of each document will be extracted directly from the native source where feasible or, where infeasible, by optical character recognition (OCR) or other suitable method to a searchable text file produced with the corresponding page image(s) or embedded within the image file.  Redactions shall be logged along with other information items withheld on claims of privilege.

4. Each item produced shall be identified by naming the item to correspond to a Bates number according to the following protocol:

i. The first three (3) characters of the filename will reflect a unique alphanumeric designation identifying the party making production.

ii. The next eight (8) characters will be a unique, consecutive numeric value assigned to the item by the producing party. This value shall be padded with leading zeroes as needed to preserve its length.

iii. The final six (6) characters are reserved to a sequence consistently beginning with a dash (-) or underscore (_) followed by a five-digit number reflecting pagination of the item when printed to paper or converted to an image format for use in proceedings or when attached as exhibits to pleadings.

iv. This format of the Bates identifier must remain consistent across all productions. The number of digits in the numeric portion and characters in the alphanumeric portion of the identifier should not change in subsequent productions, nor should spaces, hyphens, or other separators be added or deleted except as set out above.
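The four-part naming protocol above can be sketched in a few lines of Python (a hypothetical helper of my own devising, not part of the Appendix): a three-character party designation, an eight-digit zero-padded item number and, when paginated, a dash or underscore followed by a five-digit page number.

```python
from typing import Optional

def bates_name(party: str, item_no: int,
               page_no: Optional[int] = None, sep: str = "-") -> str:
    """Build a Bates identifier per the four-part protocol set out above."""
    if len(party) != 3:
        raise ValueError("party designation must be exactly 3 characters")
    if sep not in ("-", "_"):
        raise ValueError("separator must be a dash or an underscore")
    name = f"{party.upper()}{item_no:08d}"       # parts i and ii
    if page_no is not None:
        name += f"{sep}{page_no:05d}"            # part iii, when paginated
    return name
```

For example, the 42nd item produced by party “ABC” becomes ABC00000042, and its third page, when imaged for use as an exhibit, ABC00000042-00003.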

5. If a response to discovery requires production of discoverable electronic information contained in a database, you may produce standard reports; that is, reports that can be generated in the ordinary course of business and without specialized programming.  All such reports shall be produced in a delimited electronic format preserving field and record structures and names.  If the request cannot be fully answered by production of standard reports, Producing Party should advise the Requesting Party of same so the parties may meet and confer regarding further programmatic database productions.

III. Load Files

Producing party shall furnish a delimited load file in industry-standard Opticon and Concordance formats supplying the metadata field values listed below for each item produced (to the extent the values exist and as applicable):

FIELD | DEFINITION
CUSTODIAN | Name of person or source from which data was collected.  Where redundant names occur, individuals should be distinguished by an initial which is kept constant throughout productions (e.g., Smith, John A. and Smith, John B.)
ALL_CUSTODIANS | If deduplication is employed, name(s) of any person(s) from whom the identical item was collected and deduplicated.
BEGBATES | Beginning Bates Number (production number)
ENDBATES | End Bates Number (production number)
BEGATTACH | First Bates number of first attachment in family range
ENDATTACH | Last Bates number of last attachment in family range (i.e., Bates number of the last page of the last attachment)
ATTACHCOUNT | Number of attachments to an e-mail
ATTACHNAMES | Name of each individual attachment, separated by semicolons
PARENTBATES | BEGBATES number for the parent email of a family (will not be populated for documents that are not part of a family)
ATTACHBATES | Bates number from the first page of each attachment
PGCOUNT | Number of pages in the document
FILENAME | Original filename of the native file at the point of collection, without extension
FILEEXTENSION | File extension of native file
FILESIZE | File Size
FILEPATH | File source path for all electronically collected documents and emails, which includes location, folder name, file name, and file source extension
NATIVEFILELINK | For documents provided in native format only
TEXTPATH | File path for OCR or Extracted Text files
FROM | Sender
TO | Recipient
CC | Additional Recipients
BCC | Blind Additional Recipients
SUBJECT | Subject line of e-mail
DATESENT (mm/dd/yyyy hh:mm:ss AM) | Date Sent
EMAILDATSORT (mm/dd/yyyy hh:mm:ss AM) | Sent Date of the parent email (physically top email in a chain, i.e., immediate/direct parent email)
MSGID | Email system identifier assigned by the host email system
IRTID | E-mail In-Reply-To ID assigned by the host e-mail system
CONVERSATIONID | E-mail thread identifier
HASHVALUE | MD5 Hash Value of production item
TITLE | Title provided by user within the document
AUTHOR | Creator of a document
DATECRTD (mm/dd/yyyy hh:mm:ss AM) | Creation date
LASTMODD (mm/dd/yyyy hh:mm:ss AM) | Last Modified Date

The chart above describes the metadata fields to be produced in generic, commonly used terms.  You should adapt these to the specific types of electronic files you are producing, to the extent such metadata fields exist in the original ESI and can be extracted as part of the electronic data discovery process. Any ambiguity about a metadata field should be discussed with the Requesting Party prior to processing and production.
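As a concrete sketch of what “a delimited load file” looks like, here is one way to emit a Concordance-style DAT file carrying a few of the fields from the chart.  The delimiters shown (ASCII 020 as field separator, ASCII 254 “þ” as text qualifier) are the conventional Concordance defaults, but confirm what the parties’ tools expect; the helper names are my own and the field list should track the chart, not this toy.

```python
FIELD_SEP = chr(20)    # ASCII 020, the conventional Concordance field separator
QUALIFIER = chr(254)   # ASCII 254 "þ", the conventional text qualifier

def dat_line(values) -> str:
    """Wrap each value in text qualifiers and join with the field separator."""
    return FIELD_SEP.join(f"{QUALIFIER}{v}{QUALIFIER}" for v in values)

def write_dat(path: str, rows, fields) -> None:
    """Write a header row plus one delimited row per produced item."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(dat_line(fields) + "\n")
        for row in rows:
            f.write(dat_line(str(row.get(name, "")) for name in fields) + "\n")
```

Any review platform that ingests Concordance-format load files should be able to map a file written this way back to the fielded metadata in the chart.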

More Questions re: Alex Jones Defamation Case

Yesterday, I posted on the inadvertent production of privileged texts and other matter in the Alex Jones defamation trial.  In the day that’s passed, the Austin, Texas jury returned a compensatory damages verdict of $4.1 million, and minutes ago, assessed punitive damages of $45.2 million.  More has come to light overnight respecting the lawyers’ errors, confirming what I’d only been able to speculate about yesterday.  Now, I want to add a point about features of Texas law that very well could determine if there will be a mistrial or a new trial on appeal.

In yesterday’s post, I explored the Texas rules of procedure and evidence permitting a party who unwittingly produces privileged data to “snap back” that evidence by belatedly asserting its privileged character and demanding return (Tex. R. Civ. P. Rule 193.3 and Tex. R. Evid. Rule 511).  I also touched on counsel’s ethical duty to notify an opponent who has mistakenly supplied material relating to a lawyer’s representation of a client (ABA Model Rule of Professional Conduct 4.4(b)).

Concluding yesterday’s post, I posed questions:

  • Was a new link to a collection scrubbed of privileged content ever supplied?
  • Why didn’t defense counsel promptly object at trial and protect the record?  
  • Will we next need to discuss the crime/fraud exception to attorney-client privilege?

One of these now has an answer: Based on remarks of counsel at a hearing on Defendants’ Emergency Motion for Enforcement of Protective Order dated December 2, 2021 (filed August 4, 2022), the updated link promised was never supplied.  Further, no subsequent assertions of privilege or other action pursuant to Tex. R. Civ. P. Rule 193.3 were made by Defendants.  It appears the ball was dropped, prompting one to wonder whether defense counsel hoped he would wake up and find it was a bad dream (like–DATED REFERENCE WARNING–Pam seeing Bobby Ewing in the shower).  In the run-up to a high-profile trial, it’s more likely it just slipped through the cracks.

Since defense counsel Andino Reynal clearly didn’t expect the text messages to emerge at trial, it remains a conundrum why defense counsel failed to object when they came up and were shown to the jury.  Take it from someone who spent decades trying cases in Texas courts and another decade teaching electronic evidence at Texas’ premier law school (whaddya mean “which one?“), when improper evidence that hurts your client is mentioned to a jury panel, let alone proffered for them to see, counsel must leap to his feet and assert a prompt, clear objection.  There are exceptions to this, but any trial lawyer worth his salt understands the duty to protect the record by timely objection.  Even if you’re loath to appear obstructionist by objecting, you nevertheless rise and say, “Objection.  May we approach, Your Honor?”  Then, quietly make your record at the bench.

This is rookie stuff: You protect the record by timely objection or waive grounds for that objection.

While I’m pointing fingers, I’m wondering why on Earth plaintiffs’ counsel Mark Bankston thought he could broach allegations of discovery misconduct in front of a Texas jury?!? 

This point hasn’t come up in any reporting I’ve seen, but the Texas Supreme Court takes a dim view of litigants airing the dirty linen of discovery abuse to jury panels.  Texas’ approach is in marked contrast to Federal practice, where judges let juries hear questions of spoliation. In Texas, as a matter of law, the trial judge determines whether spoliation has occurred and what sanctions to impose.  Evidence of spoliating conduct is inadmissible in Texas. Brookshire Bros., Ltd. v. Aldridge, 438 S.W.3d 9 (Tex. 2014).

An argument can be made that spoliation is one thing and telling the jury that opposing counsel “messed up” and supplied documents meant to stay hidden is something else.  Perhaps; but, I’ll bet my boots there was a pretrial order on a Motion in Limine barring mention of discovery disputes without first seeking leave of court.  Too, the Texas Supreme Court’s concern that juries may be prejudiced by discussion of matters reserved exclusively to the Court’s determination would seem to hold true for telling juries that opposing counsel “messed up.” 

There’s been a lot of digital ink devoted to lambasting defense counsel for his mistakes, but in his fervent (and understandable) eagerness to tag Alex Jones, it remains to be seen if Plaintiffs’ counsel overstepped and the whole damages phase must be retried. If so, UGH, just UGH!

P.S. I just noticed that plaintiffs’ counsel practices with a Houston firm with “Ball” in its name. That’s not me nor anyone in my family. No connection whatsoever.

Ripped from the Headlines: Alex Jones and Inadvertent Waiver

I reserve this space for topics I’ve mulled over carefully in hopes that, even if I’m late to the party, at least I’ll be properly dressed. But yesterday, the media covering the Alex Jones defamation damages trial in Austin lit up with the news that Jones’ counsel inadvertently produced privileged mobile text messages and failed to seek their return in time to prevent waiver of privilege. The “inadvertently” produced messages reveal that (SPOILER ALERT!) Jones is a discovery-obstructing, lying scumbag.

That’s not political commentary; that’s the key fact finding of the Court during the liability phase of the case (“scumbag” is synopsis, implicit but unstated by Her Honor).  Jones’ discovery misconduct was so egregious, it compelled the judge to enter a default judgment on liability.  Hence, the ongoing case determines only compensatory and punitive damages.

Wow! Seamy Texas headlines in my wheelhouse of e-discovery and digital evidence! Got to love that! I write (hastily) to explore the applicable Texas rules as well as to parse—admittedly on skeletal information–what seems to have transpired and what it signifies.

To begin, Alex Jones is a 48-year-old, right-wing conspiracy theorist broadcasting rants and raves to millions of listeners who get off on the garbage he spews. Jones and his InfoWars entities were sued for defamation arising from such vile acts as claiming that the murder of 20 six- and seven-year-old children at Sandy Hook Elementary School was a hoax and “false flag.” So, he’s a horrible person, but the legal and factual issues don’t change because a party is a horrible person; horrible people are why we need courts and lawyers.

On August 3, 2022, Jones was on the stand under cross-examination when plaintiffs’ lawyer Mark Bankston asked:

“Mr. Jones, did you know that 12 days ago, your attorneys messed up and sent me an entire digital copy of your entire cellphone with every text message you’ve sent for the past two years?  And when informed, did not take any steps to identify it as privileged or protect it any way, and as of two days ago, it fell free and clear into my possession and that is how I know you lied to me when you said you didn’t have text messages about Sandy Hook.  Did you know that?”

Jones previously testified he’d searched his phone for texts about Sandy Hook and found none.

Consider five elements of the question, because all bear on whether privileged communications produced in error may be used by counsel:

  1. “12 days ago”
  2. “your attorneys messed up”
  3. “sent me…every text message you’ve sent for the past two years.”
  4. “when informed, did not take any steps to identify it as privileged or protect it any way”
  5. “as of two days ago, it fell free and clear into my possession….”

Hearing this, any lawyer’s ears will perk up, certainly any e-discovery lawyer’s. Diligent, competent lawyers don’t “mess up” by producing every text message irrespective of relevance, responsiveness and privilege! That could prompt a waiver of privilege–every lawyer’s nightmare!

Unless the lawyer practices in the Great State of Texas, a jurisdiction with strong safeguards against unwitting waiver of privilege by inadvertent production. Texas Rule of Civil Procedure 193.3(d) offers a get-out-of-jail-free card that’s easy to play:

Tex. R. Civ. P. Rule 193.3 Asserting a Privilege

d) Privilege not waived by production. A party who produces material or information without intending to waive a claim of privilege does not waive that claim under these rules or the Rules of Evidence if – within ten days or a shorter time ordered by the court, after the producing party actually discovers that such production was made – the producing party amends the response, identifying the material or information produced and stating the privilege asserted. If the producing party thus amends the response to assert a privilege, any party who has obtained the specific material or information must promptly return the specified material or information and any copies pending any ruling by the court denying the privilege.

Aha! So THAT’S what all the twelve days/ten days/two days stuff is about!

Jones’ counsel had ten days from his discovery that privileged information had been unintentionally produced to amend the response to assert a privilege and demand its return. In Texas, we call that “snap-back,” and in federal court, it’s the problem Rule 502 of the Federal Rules of Evidence was intended to address.

Texas has its own evidence rule on point, Rule 511(b)(2):

Tex. R. Evid. Rule 511: Waiver by Voluntary Disclosure

(b) Lawyer-Client Privilege and Work Product; Limitations on Waiver

(2) Inadvertent Disclosure in State Civil Proceedings. When made in a Texas state proceeding, an inadvertent disclosure does not operate as a waiver if the holder followed the procedures of Rule of Civil Procedure 193.3(d).

Pulling it together:

At some point on or before Friday, July 22, 2022, defense counsel supplemented discovery responses in such a way that two years of Jones’ cell phone messages were produced. How? No clue! Perhaps by placing them in a production “drop box” repository hosted online? The method of production doesn’t matter, and the form of production is unhelpfully characterized as “digital;” even the time of production isn’t critical. What matters most is when defense counsel discovered that privileged information had been produced and what he did about it.

I expect the July 22 date refers to a communication from plaintiff’s counsel informing defense counsel that privileged or confidential material may have been inadvertently produced. 

I expect it because ABA Model Rule of Professional Conduct 4.4(b) provides:

“A lawyer who receives a document or electronically stored information relating to the representation of the lawyer’s client and knows or reasonably should know that the document or electronically stored information was inadvertently sent shall promptly notify the sender.”

Texas’ disciplinary rules don’t mirror ABA Model Rule 4.4(b), but smart, ethical counsel will supply the notice if for no other reason than failing to do so puts the lawyer receiving the information at risk of unpleasant outcomes, including being booted from the case by disqualification.

Once notified, the snap-back provision kicked in and, tick…tick…tick, defense counsel had ten days to “[amend] the response, identifying the material or information produced and stating the privilege asserted.” Again, Texas liberally protects against unwitting privilege waiver, so all defense counsel had to do was write something, anything, akin to “the texts we supplied contain privileged attorney-client communications and non-responsive confidential information. Return them now, destroy any copies and do not use or share any information they hold.” It wouldn’t have required much specificity since plaintiffs’ counsel knew what he had and believed opposing counsel “messed up.” All it would have taken was a peep of timely objection. Yet, as plaintiffs’ counsel put it in court, defense counsel “did not take any steps to identify it as privileged or protect it any way.”

But, literally just as I’ve written the preceding, my dear friend Mary Mack, the Empress of E-Discovery, sent a text indicating plaintiffs’ counsel Bankston e-mailed defense counsel Reynal shortly before midnight on Friday, 7/22 and discussed the production, stating: “My assumption is now that you did not intend to send us this? Let me know if I am correct.”

The following day, July 23, 2022, defense counsel F. Andino Reynal replied:

Thank you Mark.  There appears to have been a mistake in the file transfer…. Please disregard the link and I will work on resending.  Andino

So, the plot thickens! It’s not clear that defense counsel “did not take any steps to identify it as privileged or protect it any way.” Still, if what I’ve related here is all there was in terms of exchanges (and it shouldn’t be), plaintiffs’ counsel is betting heavily that specific assertions of privilege are required for snap-back to apply and that citing a “mistake” and asking plaintiffs to “disregard the link” is insufficient to forestall waiver in a state that bends over backwards to protect attorneys from the consequences of inadvertent disclosure.

Gutsy or unprofessional? Your call. Was a new link to a collection scrubbed of privileged content ever supplied? Why didn’t defense counsel promptly object at trial and protect the record? Will we next need to discuss the crime/fraud exception to attorney-client privilege? Any way you cut it, this mess promises to be a case study in discovery abuse and lawyer misconduct. Stay tuned!

UPDATE: Here’s a motion filed by Jones on August 4, 2022: