AI Prompt to Improve Keyword Search

Twenty years ago, I dreamed up a website where you would submit a list of eDiscovery keywords and queries and the site would critique the searches and suggest improvements to make them more efficient and effective. It would flag stop words, propose alternate spellings, and alert the user to pitfalls making searches less effective or noisy. I even envisioned it testing queries against a benign dataset to identify overly broad terms and false hits.

I believed this tool would be invaluable for helping lawyers enhance their search skills and achieve greater efficiency. Over the years, I tried to bring this idea to life, seeking proposals from offshore developers and pitching it to e-discovery software publishers as a value-add. In the end, a pipe dream. Even now, nothing like it exists.

The emergence of AI-powered Large Language Models like ChatGPT made me think what I’d hoped to bring to life years ago might finally be feasible. I wondered if I could create a prompt for ChatGPT that would achieve much of what I envisioned. So, I dedicated a sunny Sunday morning to playing “prompt engineer,” the newly minted term for those who craft AI prompts to achieve desired outcomes.

The result was promising, a significant step forward for lawyers who struggle with search queries without understanding why some fail. Most search errors I encounter aren’t subtle. I’ve written about ways to improve lexical search, and the techniques aren’t rocket science, though they require some familiarity with how electronically stored information is indexed and how search syntaxes differ across platforms. Okay, maybe a little rocket science. But if you’re using a tool for critical tasks, shouldn’t you know what it can and cannot do?

Some believe refining keywords and queries is a waste of time, casting keyword search as obsolete. Perhaps on your planet, Klaatu, but here on Earth, lawyers continue using keywords with reckless abandon. I’m not defending that but neither will I ignore lawyers’ penchant for lexical search. Until the cost, reliability, and replicability of AI-enabled discovery improve, keywords will remain a tool for sifting through large datasets. However, we can use AI LLMs right now to enhance the performance and efficiency of shopworn approaches.

Yes, AI is Here. No, You’re Not Gone.

Yesterday, I sought to defend the value of my law school course on E-Discovery & Digital Evidence to a law Dean who readily conceded that she didn’t know what e-discovery was or why it would be an important thing for lawyers to understand.  It was a bracing experience.

My métier has always been litigation, to the point that everyone I work with sits in and around trial practice.  My close colleagues recognize that 90% of what trial lawyers do is geared to discovery and motion practice, and much of that motion practice is prompted by discovery disputes. So, hearing how a tax lawyer and academic viewed litigation was eye-opening, and troubling to the extent it impacts what’s taught to new lawyers.

Do you agree about the centrality of discovery to litigation, Dear Reader?

The Dean shared her sense that discovery is being replaced by AI and that “soon AI will handle the production of relevant information instead of lawyers.”  I replied that I expected the review phase to be abetted or supplanted by AI in the near term—that’s here—but it would be some time before all the tasks that come before review would be fully AI-enabled.

The idea that there are crucial tasks requiring lawyer intervention before review was surprising to her.  For those who don’t manage electronic discovery day-to-day, electronically stored information seems to magically appear in review tools.  But for e-discovery folks, the march through identification, preservation, collection and processing is our path, and we know that no one, and no AI, can undertake an assessment of the evidence without facing the data.

You’ve got to face the evidence to assess the evidence.

That’s axiomatic; but it’s downplayed by those shouting “AI! AI!”  As they say in these parts, “you’ve got to put the hay down where the goats can get it.”  Until AI is embedded in everything, until AI faces the data in every phone, cloud repository, storage medium and database in ways that support discovery, the goats can’t get to the hay.

The evidence in our cases is not a “collection” until it’s collected.  That doesn’t necessarily mean a copy must be made to isolate data of interest, but that remains the prevailing way that a discrete assemblage of potentially responsive ESI is marshaled before it is processed for search and review.  Not until that occurs does the evidence face human or AI review.

Garden Variety: Byte Fed. v. Lux Vending

My esteemed colleagues, Kelly Twigger and Doug Austin, each posted about a recent discovery decision from the Middle District of Florida, case no. 8:23-cv-102-MSS-SPF, styled Byte Fed., Inc. v. Lux Vending LLC, decided by United States Magistrate Judge Sean Flynn on May 1, 2024.

Kelly and Doug share their customarily first-rate analyses of the ruling’s holding that asserting boilerplate objections operates as a waiver.  The Court spanked defendant, The Cardamone Consulting Group, LLC, for its conduct.  That’s been picked apart elsewhere, and I have nothing to add.  I write here to address a feature of the dispute that no one has discussed (and sadly, neither did the Court): the nature of the request for production that prompted the boilerplate objection of “vague and incomprehensible.”  We can learn much more from the case than just boilerplate = waiver.

Let’s look at the underlying request:

DOCUMENT REQUEST NO. 7:

All documents and electronically stored information that are generated in applying the search terms below to Your corporate email accounts (including but not limited to the email accounts for Nicholas Cardamone, Daniel Cardamone, and Patrick McCloskey):

Byte
Bitcoin w/s Florida
Stanton
ByteFederal
Bitcoin w/s trademark
Branden w/3 Tawil
Byte Federal
lawsuit
Brandon w/3 Mintz
most w/5 trusted
Scott w/3 Buchanan
DKI
Google w/s trademark
confusion or confused
Dynamic w/5 keyword

In its Motion to Compel, Plaintiff calls this request “clear on its face, and … a garden-variety type of request for production in connection with narrowly tailored search terms.”  The Plaintiff adds, “[y]et during the parties’ meet-and-confer, and although Cardamone’s counsel claimed that she was familiar with electronic discovery, the assertion was that her client – a company that has purportedly generated hundreds of millions of dollars in connection with online advertising and electronic data – ‘did not understand what to do.’”

So, Dear Reader, would you understand what to do? You’re steeped in electronic discovery—that’s why you’ve stopped by—but is the request clear, narrowly tailored and “garden-variety” such that we can apply it to a proper production workflow?  A few points to ponder:

1. There’s nothing in the Federal Rules of Civil Procedure that prohibits a request to run specific queries against databases, and email accounts are databases.  Rule 34 requires only that the request “describe with reasonable particularity each item or category of items to be inspected.” 

Conventional requests are couched in language geared to relevance; that is, the requests seek documents and ESI about a topic.  Counsel must then apply the law and the facts to guide clients in identifying responsive information.  Counsel reviews the information gathered and decides whether it’s responsive or should be withheld as a matter of right or privilege.

Over time, the notion took hold that sifting through electronically stored information was unduly burdensome, so opposing parties were expected to work together to fashion queries (“search terms”) to narrow the scope of review.  These keyword negotiations run the gamut from laughable to laudable. They’re duels between counsel frequently unarmed with knowledge of the search tools and processes or of the data under scrutiny.  In short, they use their ginormous lawyer brains to guess what might work if the digital world were as they imagine it to be.

Here, the plaintiff cuts to the chase, eschewing a request couched in relevance in favor of asking that specific searches be run: half of them Boolean constructs employing two types of proximity connectors. 
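For readers who haven’t peeked under the hood, here’s a minimal Python sketch of how a proximity connector like “Brandon w/3 Mintz” might be evaluated against tokenized text.  It’s an illustration only: the sample message is invented, and no production search tool necessarily works this way (real indexes differ in tokenization, stemming and how they count word distance).

    import re

    def tokenize(text):
        # Lowercase and split on anything that isn't a letter or digit: a crude
        # stand-in for how an indexing engine might normalize text.
        return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

    def within(tokens, term_a, term_b, distance):
        """True if term_a occurs within `distance` words of term_b."""
        positions_a = [i for i, t in enumerate(tokens) if t == term_a]
        positions_b = [i for i, t in enumerate(tokens) if t == term_b]
        return any(abs(a - b) <= distance for a in positions_a for b in positions_b)

    # An invented message, for illustration only:
    tokens = tokenize("Please have Brandon call Adam Mintz about the lawsuit.")
    print(within(tokens, "brandon", "mintz", 3))    # True: three words apart
    print(within(tokens, "brandon", "lawsuit", 3))  # False: six words apart

Every platform answers questions like “do connectors count intervening noise words?” differently, which is one reason a bare list of queries handed to an opponent invites confusion.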

Was this smart?   You decide.

Surviving a Registration Bomb Attack

It started just after 7:00 last night.  My mailbox swelled with messages confirming I’d subscribed to websites and newsletters around the world.  Within an hour, I’d received over 2,000 such messages, and they kept pouring in until I’d gotten 4,000 registration confirmations by 11:00pm. After that, the flood slowed to a trickle.

I was the victim of a registration bomb attack, a scary experience if you don’t grasp what’s happening or know how to protect yourself.  Fortunately, it wasn’t my first rodeo. 

During a similar attack a couple of years ago, I was like a dog on the Fourth of July–I didn’t know what was happening or how to deal with it.  But this time, my nerves weren’t wracked: I knew what was afoot and where the peril lay.

Cybersecurity is not my principal field of practice, but it’s a forensics-adjacent discipline and one where I try to keep abreast of developments.  So, much like a trial lawyer enjoying the rare chance to serve on a jury, I find being the target of a cyberattack as instructive as it is inconvenient.

While a registration bomb attack could be the work of a disgruntled reader (Hey! You can’t please everybody), more often it serves to mask attacks on legitimate accounts by burying notices of password resets, funds transfers or fraudulent credit card charges beneath a mountain of messages.  So, yes, you should treat a registration bomb attack as requiring immediate vigilance in terms of your finances.  Keep a weather eye out for small transfers, especially deposits into a bank account, as these signal efforts to link your account to another as a prelude to theft.  Likewise, look at your credit card transactions to ensure that recent charges are legitimate.  Finally—and the hardest to do amidst a deluge of registration notices—look for efforts to change credentials for e-commerce websites you use like Walmart.com or Amazon.com.

A registration bomb attack is a powerful reminder of the value of always deploying multifactor authentication (MFA) to protect your banking, brokerage and credit card accounts.  Those extra seconds expended on secure logins will spare you hours and days lost to a breach.  With MFA in place, an attacker who succeeds in changing your credentials won’t have the access codes texted to your phone, thwarting efforts to rob you.

The good news is that, if you’re vigilant in the hours a registration bomb is exploding in your email account and you have MFA protecting your accounts, you’re in good shape.

Now for the bad news: a registration bomb is a distributed attack, meaning that it uses a botnet to enlist a legion of unwitting, innocent participants—genuine websites—to do the dirty work of clogging your email account with registration confirmation requests.  Because the websites emailing you are legitimate, there’s nothing about their email to trigger a spam filter until YOU label the message as spam. Unfortunately, that’s what you must do: select the attack messages and label each one as spam.  Don’t bother to unsubscribe from the registrations; just label the messages as spam as quickly as you can. 

This is a pain. And you must be attuned to the potential to mistakenly blacklist senders whose messages you want at the same time you’re squashing the spam messages you don’t want and scanning for password change notices from your banks, brokers and e-commerce vendors.  It’s easier when you know how to select multiple messages before hitting the “spam” button (in Gmail, holding down the Shift key enables you to select a range of messages by selecting the first and last message in the range).  Happily, the onslaught of registration spam will stop; thousands become hundreds and hundreds become dozens in just hours (though you’ll likely get stragglers for days).
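If you’d rather script the cleanup than click, the Gmail API can apply the SPAM label in bulk.  Below is a rough sketch, not a turnkey tool: it assumes you’ve created a Google Cloud OAuth client (saved here as a hypothetical credentials.json), authorized the gmail.modify scope, and tuned the search query to whatever the flood in your inbox actually looks like.

    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    # Hypothetical OAuth client file downloaded from the Google Cloud console.
    flow = InstalledAppFlow.from_client_secrets_file(
        "credentials.json", ["https://www.googleapis.com/auth/gmail.modify"])
    creds = flow.run_local_server(port=0)
    service = build("gmail", "v1", credentials=creds)

    # Placeholder query; match it to the subject lines you're actually seeing.
    query = 'newer_than:1d subject:("confirm your" OR "verify your" OR "welcome to")'

    page_token = None
    while True:
        resp = service.users().messages().list(
            userId="me", q=query, maxResults=500, pageToken=page_token).execute()
        ids = [m["id"] for m in resp.get("messages", [])]
        if ids:
            # Move the whole batch to spam and out of the inbox in one call.
            service.users().messages().batchModify(
                userId="me",
                body={"ids": ids, "addLabelIds": ["SPAM"], "removeLabelIds": ["INBOX"]},
            ).execute()
        page_token = resp.get("nextPageToken")
        if not page_token:
            break

Manual triage works fine, too; the point is simply that the SPAM label, however applied, is what teaches the filter to swallow the rest of the flood.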

Registration bombing attacks will continue so long as the web is built around websites sending registration confirmation messages—a process ironically designed to protect you from spam.   If you’ve deployed the essential mechanisms to protect yourself online, particularly strong, unique passwords, multifactor authentication and diligent review of accounts for fraudulent transactions, don’t panic; the registration bomb will be no more than a short-lived inconvenience.  This, too, shall pass.

ChatGPT Proves a Mediocre Law Student

I recently spent a morning testing ChatGPT’s abilities by giving it exercises and quizzes designed for my law and computer science graduate students. Overall, I was impressed with its performance, but also noticed that it’s frequently wrong but never in doubt: a mechanical mansplainer!

If you’re asking, “What is ChatGPT,” I’ll let it explain itself: “ChatGPT is a large language model developed by OpenAI. It is a type of machine learning model called a transformer, which is trained to generate text based on a given prompt. It is particularly well-suited to tasks such as natural language processing, text generation, and language translation. It is capable of understanding human language and generating human-like text, which makes it useful for a wide range of applications, such as chatbots, question-answering systems, and text completion.” I’d add that it’s the first artificial intelligence tool that anyone can play with, and it’s generating enormous excitement.

In terms of its legal knowledge, ChatGPT performed like a mediocre law student, performing well on essay questions and drafting exercises, but struggling with multiple choice questions. I couldn’t assign it a definitive score on the 25-question quiz as some questions required image analysis and I didn’t see a way to include illustrations in the chat interface. However, it performed poorly enough on the quiz for me to conclude that a passing score of 70 out of 100 points was out of reach.

One example of a question it struggled with was:

Which of the following is NOT an essential metadata value (a “metadata must”) to preserve and produce for a file?

  1. Custodian
  2. File Name
  3. Last Modified Date and Time
  4. Last Accessed Date and Time

The correct answer is Last Accessed Date and Time.  Class readings underscore the unreliability of Last Accessed dates and times and I supply a table of “Metadata Musts” for preservation and production.  Presumably trained on other sources, ChatGPT did not seem to understand the importance of a proper chain of custody for electronic evidence and replied:

Custodian is not considered an essential metadata value to preserve and produce for a file. The other options (File Name, Last Modified Date and Time, Last Accessed Date and Time) are all considered essential metadata values that should be preserved and produced for a file in order to provide important information about the file and its history.

Since it wasn’t performing well on my quiz, I asked,

Can you suggest some multiple choice quiz questions for students studying the fundamentals of electronic discovery and electronic data storage?

The first suggestion was:

What is the process of collecting, processing, and producing electronically stored information (ESI) in response to a legal request called?

a. Electronic discovery

b. Digital forensics

c. E-discovery

d. Data retrieval

In ChatGPT’s formulation “Electronic discovery” is the right response and “E-discovery” is wrong! 

Another suggestion:

What is the most common format for ESI production?

a. PDF

b. TIFF

c. Native file format

d. ASCII

That’s a challenging question because, as much as I’ve focused on and written about forms of production, I’ve not come across metrics establishing the most common format.  In supplying an answer key, ChatGPT identified Native file format as the correct answer, though I would have guessed “TIFF.”  I certainly wouldn’t have included ASCII as a choice.

Here are a few other quiz questions suggested by ChatGPT.  I’ve bolded what ChatGPT defined as the correct response and, in each instance, I’d conclude it’s either not the best answer or the question is so poorly constructed as to defy characterizing any single answer as the right one:

What is the process of identifying and eliminating irrelevant or privileged ESI called?

a. Culling

b. Filtering

c. Redaction

d. Processing

What is the process of reviewing ESI to identify relevant information called?

a. Review

b. Analysis

c. Coding

d. Searching

What is the process of converting ESI from its original format to a reviewable format called?

a. Conversion

b. Processing

c. Extraction

d. Production

What is the process of creating a backup copy of ESI for preservation purposes called?

a. Backup

b. Imaging

c. Archiving

d. Duplication

What is the process of destroying or disposing of ESI after it is no longer needed called?

a. Deletion

b. Erasure

c. Destruction

d. Purging

Encouraged that the law faculty might be safe for another semester, I turned ChatGPT loose on a drafting exercise:

You are to develop the best checklist of potential data sources that are candidates for legal preservation in any engagement—good enough that you’ll want to keep it to use in your own work. It does NOT have to be entirely original, and I encourage you to start with the best existing checklist(s) you can find and then make updates and improvements. Certainly, you should seek to add missing sources, but you may also see a better way to organize and present the content. Remember that this checklist is a list of sources and varieties not methods and, crucially, it should serve to enhance the completeness of the personal data map you will create (though it should NOT be limited to the scope of your personal digital footprint alone). Thus, generic checklist items like “Online sources” or “Local storage” won’t be sufficiently granular. Be sure to credit your sources and format it as a checklist.

ChatGPT responded:

Data Preservation Checklist

  1. Social media accounts (Facebook, Twitter, Instagram, LinkedIn, etc.)
  2. Email accounts (Gmail, Outlook, Yahoo, etc.)
  3. Cloud storage accounts (Google Drive, Dropbox, OneDrive, etc.)
  4. Mobile device data (text messages, call logs, contacts, etc.)
  5. Computer files (documents, photos, videos, etc.)
  6. Internet browsing history and cache
  7. Online marketplace accounts (eBay, Amazon, Etsy, etc.)
  8. Online banking and financial accounts (Paypal, bank accounts, credit cards, etc.)
  9. GPS location data
  10. Smart home device data (Amazon Echo, Google Home, etc.)
  11. Wearable device data (Fitbit, Apple Watch, etc.)
  12. Gaming accounts (Xbox Live, PlayStation Network, Steam, etc.)
  13. Voicemail and answering machine messages
  14. Instant messaging and chat logs (WhatsApp, Messenger, etc.)
  15. Social media direct messages and private messages
  16. Online dating profiles and messages
  17. Online forum and message board posts
  18. Online video and audio recordings (YouTube, Vimeo, etc.)
  19. Online photo and image sharing accounts (Flickr, Imgur, etc.)
  20. Online document collaboration and sharing accounts (Google Docs, Microsoft Office 365, etc.)

Source: Adapted from “Digital Preservation Checklist” by the National Archives and Records Administration (NARA) (https://www.archives.gov/digital-preservation/checklist) and “Digital Preservation Checklist for Individuals” by the Digital Preservation Coalition (https://www.dpconline.org/advice/digital-preservation-checklist-for-individuals)

Not bad, and certainly typical of what I get from students.  Interestingly, both citations are dead links, which would be unacceptable coming from my students, and (arguably) ChatGPT simply ignored the requirement to format its answer as a checklist. 

I had ChatGPT answer the question a second time sans the mandate that it credit sources. I was struck by the differences in the lists supplied and the disclaimer now appended:

  1. Social media accounts (e.g. Facebook, Twitter, Instagram, LinkedIn)
  2. Email accounts (e.g. Gmail, Yahoo, Outlook)
  3. Cloud storage accounts (e.g. Google Drive, Dropbox, iCloud)
  4. Online shopping accounts (e.g. Amazon, eBay, Etsy)
  5. Financial accounts (e.g. bank accounts, credit card accounts, PayPal)
  6. Mobile phone data (e.g. call logs, text messages, contacts)
  7. Computer hard drives
  8. USB drives and external hard drives
  9. Social media messaging apps (e.g. WhatsApp, iMessage, Messenger)
  10. Online forums and discussion boards
  11. Online dating profiles
  12. Online gaming accounts
  13. Web browsing history and cookies
  14. Internet service provider records
  15. GPS data (e.g. location data from mobile devices)
  16. Smart home device data (e.g. Amazon Echo, Google Home)
  17. Vehicle tracking data (e.g. OnStar, Tesla)
  18. Medical records (e.g. electronic health records)
  19. Educational records (e.g. transcripts, grades)
  20. Employee records (e.g. HR files, payroll records)

Note: This checklist is not exhaustive and may vary depending on the specific engagement or case. It is important to consult with legal and technical experts to determine the most relevant and appropriate data sources for preservation. Additionally, it is important to consider any privacy laws and regulations when collecting and preserving personal data.

I asked ChatGPT to identify “the most important legal opinions I should assign my students to read that will help them better understand electronic discovery in the United States” and it responded:

  1. Zubulake v. UBS Warburg LLC, 217 F.R.D. 309 (S.D.N.Y. 2003) – This case established the “Zubulake standards,” which are widely used as a guide for electronic discovery in the United States. It lays out the obligations of parties to preserve and produce electronically stored information (ESI) and the sanctions that can be imposed for failure to comply.
  2. Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008) – This case discusses the duty to preserve ESI and the consequences of spoliation (destruction or alteration of evidence). It also addresses the standard for imposing sanctions for spoliation.
  3. The Sedona Conference® Commentary on Achieving Quality in the E-Discovery Process, The Sedona Conference (2008) – This commentary provides guidance on best practices for e-discovery, including the preservation and production of ESI, the use of technology-assisted review (TAR) and predictive coding, and the handling of privileged information.
  4. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012) – This case addresses the use of TAR and predictive coding in e-discovery, and emphasizes the importance of cooperation and communication between parties when using these technologies.
  5. Morgan Stanley Smith Barney LLC v. Skowron, No. 11-CV-8335 (S.D.N.Y. Mar. 29, 2013) – This case discusses the use of proportionality in e-discovery, and the court’s authority to limit the scope of e-discovery in order to reduce the burden and expense of producing ESI.
  6. In re: Actos (Pioglitazone) Products Liability Litigation, MDL No. 6:11-md-2299 (W.D. La. 2013) – This case addresses the use of advanced analytics and data visualization tools in e-discovery, and highlights the importance of considering the costs and benefits of using these technologies in the e-discovery process.

Except for mischaracterizing a Sedona Conference Commentary as a “legal opinion,” they’re all fine suggestions, yet all are ten to twenty years old. 

I asked ChatGPT a question posed by a student in my last class: “What is the difference between electronic discovery and computer forensics?”  It opined:

Electronic discovery (e-discovery) and computer forensics are both related to the collection and analysis of electronic data, but they serve different purposes.

Electronic discovery is the process of collecting, processing, and producing electronic data in response to a legal request or investigation. It is typically used in civil litigation and is designed to identify and produce relevant electronic evidence that may be used in court. The goal of e-discovery is to find information that is responsive to the legal request and that is not privileged or protected by any other legal exceptions.

Computer forensics, on the other hand, is the process of collecting, preserving, analyzing, and presenting electronic evidence in a manner that is admissible in court. It is typically used in criminal investigations and is designed to identify, recover, and preserve evidence of a crime or other misconduct. The goal of computer forensics is to find and preserve evidence that may be used to prove or disprove a crime or other wrongdoing.

In summary, electronic discovery is focused on finding and producing relevant electronic evidence in response to a legal request, while computer forensics is focused on finding and preserving evidence of a crime or other misconduct. Both processes require the use of specialized tools and techniques to collect and analyze electronic data, but they serve different purposes and are used in different types of legal proceedings.

A good answer.  My response in class focused on the relative accessibility and intelligibility of the ESI we deal with in e-discovery versus digital forensics, and I didn’t tie forensics to criminal investigations because so much of my work in the field has concentrated on civil cases.  But I can’t say I did any better than the AI.  I smiled more.

Finally, I asked, “Who are the most important thought leaders in the field of electronic discovery?”  ChatGPT replied:

  1. Ralph Losey: A well-known attorney and author who has written extensively on the topic of electronic discovery and has been recognized as a leading expert in the field.
  2. Craig Ball: A forensic expert and attorney who has played a key role in shaping the laws and practices related to electronic discovery through his work as a consultant, expert witness, and author.
  3. Tom O’Connor: A consultant, speaker, and author who is considered one of the leading experts on the use of technology in the legal field and has been instrumental in developing best practices for electronic discovery.
  4. George Socha: A consultant, speaker, and author who is considered a leading expert in the field of electronic discovery and has been instrumental in the development of industry standards and best practices.
  5. Maura Grossman: A consultant, speaker, and author who is considered one of the leading experts on the use of artificial intelligence and machine learning in electronic discovery and has been instrumental in advancing the field.
  6. E-Discovery Team: A team of experts who have written extensively on the topic of electronic discovery and have been recognized as thought leaders in the field through their work as consultants, speakers, and authors.
  7. Paul J. Lippe: A consultant, speaker, and author who is considered a leading expert in the field of electronic discovery and has been instrumental in the development of industry standards and best practices.
  8. John Simek: A consultant, speaker, and author who is considered one of the leading experts on the use of technology in the legal field and has been instrumental in developing best practices for electronic discovery.
  9. David Horrigan: A consultant, speaker, and author who is considered one of the leading experts on the use of technology in the legal field and has been instrumental in developing best practices for electronic discovery.
  10. Daniel Garrie: A consultant, speaker, and author who is considered a leading expert in the field of electronic discovery and has been instrumental in the development of industry standards and best practices.

Kudos to Ralph Losey for topping the list and garnering a second mention as “E-Discovery Team.”  I wouldn’t have considered Paul J. Lippe a candidate for an e-discovery-specific list, and I can’t very well commend ChatGPT for wokeness.  One woman?  No persons of color?  No jurists?  I’m flattered to be included among the other old, white men, but know well that there are plenty as or more deserving.

Ten Tips for Better ESI Expert Reports

A lawyer I admire asked me to talk to her colleague about expert reports.  I haven’t had that conversation yet, but the request got me thinking about the elements of a competent expert report, especially reports in my areas of computer forensics and digital evidence.  I dashed off ten things I thought contribute to the quality of the best expert reports.  If these were rules, I’d have to concede I’ve learned their value by breaking a few of them.  I’ve left out basic writing tips like “use conversational language and simple declarative sentences.” There are lists of rules for good writing elsewhere and you should seek them out.  Instead, here’s my impromptu list of ten tips for crafting better expert reports on technical issues in electronic discovery and computer forensics:

  1. Answer the questions you were engaged to resolve.
  2. Don’t overreach your expertise.
  3. Define jargon, and share supporting data in useful, accessible ways.
  4. Distinguish factual findings from opinions.
  5. Include language addressing the applicable evidentiary standard.
  6. Eschew advocacy; let your expertise advocate for you.
  7. Challenge yourself and be fair.
  8. Proofread.  Edit.  Proofread again. Sleep on it. Edit again.
  9. Avoid assuming the fact finder’s role in terms of ultimate issues.
  10. Listen to your inner voice.

Most of these are self-explanatory but please permit me a few clarifying comments.

Answer the questions you were engaged to resolve.

My pet peeve with expert reports is that they don’t always address the questions important to the court and counsel.  I’ve seen reports spew hundreds of pages of tables and screenshots without conveying what any of it means to the issues in the case.  Sometimes you can’t answer the questions.  Fine.  Say so.  Other times you must break down or reframe the questions to conform to the evidence.  That’s okay, too, IF it’s not an abdication of the task you were brought in to accomplish.  But, the best, most useful and intelligible expert reports pose and answer specific questions.

Don’t overreach your expertise.

The standard to qualify as an expert witness is undemanding: do you possess specialized knowledge that would assist the trier of fact in understanding the evidence or resolving issues of fact? See, e.g., Federal Rule of Evidence 702.  With the bar so low, it can be tempting to overreach your expertise, particularly when pushed by a client to opine on something you aren’t fully qualified to address.  For example, I’m a certified computer forensic examiner and I studied accounting in college, but I’m not a forensic accountant.  I know a lot about digital forgery, but I’m not a trained questioned document examiner.  These are specialties.  I try to stay in my own lane and commend it to other experts.

Define jargon, and share supporting data in useful, accessible ways.

Can someone with an eighth-grade education and no technical expertise beyond that of the average computer user understand your report?  If not, you’re writing for the wrong audience.  We should write to express, not impress.  I love two-dollar words and the bon mot phrase, but they don’t serve me well when writing reports.  Never assume that a technical term will be universally understood.  If your grandparents wouldn’t know what it means, define it.

Computer forensic tools are prone to generate lengthy “reports” rife with incomprehensible data.  It’s tempting to tack them on as appendices to add heft and underscore how smart one must be to understand it all.  But it’s the expert’s responsibility to act as a guide to the data and ensure its import is clear.  I rarely testify, even by affidavit, without developing annotated demonstrative examples of the supporting data.  Don’t wait for the deposition or hearing to use demonstrative evidence; make points clear in the report.

Too, I’m fond of executive summaries; that is, an up-front, cut-to-the-chase paragraph relating the upshot of the report.

Distinguish factual findings from opinions.

The key distinction between expert and fact witnesses is that expert witnesses are permitted to express opinions that go beyond their personal observation.  A lay witness to a crash may testify to speeds based only upon what they saw with their own eyes.  An accident reconstructionist can express an opinion of how fast the cars were going based upon evidence that customarily informs expert opinions like skid marks and vehicle deformation.  Each type of testimony must satisfy different standards of proof in court; so, to make a clear and defensible record, it’s good practice to distinguish factual findings (“things you saw”) from opinions (“things you’ve concluded based upon what you saw AND your specialized knowledge, training and experience”).  This  naturally begets the next tip:

Include language addressing the applicable evidentiary standard.

Modern jurisprudence deploys safeguards like the Daubert standard to combat so-called “junk science.”  Technical expert opinions must be based upon a sound scientific methodology, viz., sufficient facts or data and the product of reliable principles and methods.  While a court acting as gatekeeper can infer the necessary underpinnings from an expert’s report and C.V., expressly stating that opinions are based upon proper and accepted standards makes for a better record.

Eschew advocacy; let your expertise advocate for you.

Mea culpa here.  Because I was a trial lawyer for three+ decades, I labor to restrain myself in my reporting to ensure that I’m not intruding into the lawyer’s realm of advocacy.  I don’t always succeed.  Even if you’re working for a side, be as scrupulously neutral as possible in your reporting.  Strive to act and sound like you don’t care who prevails even if you’re rooting for the home team.  If you do your job well, the facts will advocate the right outcome.

Challenge yourself and be fair.

My worst nightmare as an expert witness is that I will mistakenly opine that someone committed a bad act when they didn’t.  So, I’m always trying to punch holes in my own theories and asking myself, “how would I approach this if I were working for the other side?”  Nowhere is this more important than when working as a court-appointed neutral expert.  Even if you’d enjoy seeing a terrible person fry, be fair.  You stand in the shoes of the Court.

Proofread.  Edit.  Proofread again. Sleep on it. Edit again.

Who has that kind of time, right?  Still, try to find the time.  Few things undermine the credibility of an expert report like a bunch of spelling and grammatical errors.  Stress and fatigue make for poor first drafts.  It often takes a good night’s sleep (or at least a few hours away from the work) to catch the inartful phrase, typo or other careless error.

Avoid assuming the fact finder’s role in terms of ultimate issues.

Serving as a court Special Master a few years back, I opined that the evidence of a certain act was so overwhelming that the Court should reach only one result.  Accordingly, I ceased investigating the loss of certain data that I regarded as out-of-scope.  I was right…but I was also wrong.  The Court has a job to do and, because I elided an issue the Court was obliged to address, the Court had to rule without the benefit of what further inquiry into the missing evidence would have revealed. The outcome was the same, but by assuming the factfinder’s role on an ultimate issue, I made the Court’s job harder.  Don’t do that.

Listen to your inner voice.

In expressing expert opinions, too much certainty (a/k/a arrogance) is as perilous as too much doubt.  Perfect is not the standard, but you should be reasonably confident of your opinion based on a careful and competent review of the evidence.  If something “feels” off, it may be your inner voice telling you to look again. 

Final Exam Review: How Would You Fare?

It’s nearly finals time for the students in my E-Discovery and Digital Evidence course at the University of Texas School of Law. I just completed the Final Exam Study Guide for the class and thought readers who wonder what a tech-centric law school e-discovery curriculum looks like might enjoy seeing what’s asked of the students in a demanding 3-credit law school course. Whether you’re ACEDS certified, head of your e-discovery practice group or just an e-discovery groupie like me, consider how you’d fare preparing for an exam with this scope and depth. I’m proud of my bright students. You’d be really lucky to hire one of my stars.

E-Discovery – Spring 2021 Final Exam Study Guide

The final exam will cover all readings, lectures, exercises and discussions on the syllabus.
(Syllabus ver. 21.0224 in conjunction with Workbook ver. 21.0214 and Announcements).

  1. We spent a month on meeting the preservation duty and proportionality.  You undertook a two-part legal hold drafting exercise.  Be prepared to bring skills acquired from that effort to bear on a hypothetical scenario.  Be prepared to demonstrate your understanding of the requisites of fashioning a defensible legal hold and sensibly targeting a preservation demand to an opponent.  As well, your data mapping skills should prove helpful in addressing the varied sources of potentially relevant ESI that exist, starting at the enterprise level with The Big Six (e-mail, network shares, mobile devices, local storage, social networking and databases).  Of course, we must also consider Cloud repositories and scanned paper documents as potential sources.
  2. An essential capability of an e-discovery lawyer is to assess a case for potentially relevant ESI, fashion and implement a plan to identify accessible and inaccessible sources, determine their fragility and persistence, scope and deploy a litigation hold and take other appropriate first steps to counsel clients and be prepared to propound and respond to e-discovery, especially those steps needed to make effective use of the FRCP Rule 26(f) meet-and-confer process.  Often, you must act without having all the facts you’d like and rely upon your general understanding of ESI and information systems to put forward a plan to acquire the facts and do so with sensitivity to the cost and disruption your actions may engender.  Everything we’ve studied was geared to instilling those capabilities in you.
  3. CASES: You are responsible for all cases covered during the semester.  When you read each case, you should ask yourself, “What proposition might I cite this case to support in the context of e-discovery?”  That’s likely to be the way I will have you distinguish the cases and use them in the exam.  I refer to cases by their style (plaintiff versus defendant), so you should be prepared to employ a mnemonic to remember the most salient principles of each, e.g., Columbia Pictures is the ephemeral data/RAM case; Rambus is the Shred Day case; In re NTL is the right of control case; In re: Weekley Homes is the Texas case about accessing the other side’s hard drives; Wms v. Sprint is the spreadsheet metadata case (you get the idea).  I won’t test your memory of jurists, but it’s helpful-not-crucial to recall the authors of the decisions (especially when they spoke to our class like Judges Peck and Grimm). 

Case Review Hints:

  • Green v. Blitz: (Judge Ward, Texas) This case speaks to the need for competence in those responsible for preservation and collection and what constitutes a defensible eDiscovery strategy. What went wrong here? What should have been done differently?
  • In re: Weekley Homes: (Texas Supreme Court) This is one of the three most important Texas cases on ESI. You should understand the elements of proof which the Court imposes for access to an opponent’s storage devices and know the terms of TRCP Rule 196.4, especially the key areas where the state and Federal ESI rules diverge.
  • Zubulake: (Judge Scheindlin, New York) The Zubulake series of decisions is seminal to the study of e-discovery in the U.S.  Zubulake remains the most cited of all EDD cases, so it is still a potent weapon even after the Rules amendments codified many of its lessons. Know what the case is about, how the plaintiff persuaded the court that documents were missing and what the defendant did or didn’t do in failing to meet its discovery obligations. Know what an adverse inference instruction is and how it was applied in Zubulake versus what must be established under FRCP Rule 37(e) after 2015. Know what Judge Scheindlin found to be a litigant’s and counsel’s duties with respect to preservation. Seven-point analytical frameworks (as for cost-shifting) make good test fodder.
  • Williams v. Sprint: (Judge Waxse, Kansas). Williams is a seminal decision respecting metadata. In Williams v. Sprint, the matter concerned purging of metadata and the locking of cells in spreadsheets in the context of an age discrimination action after a reduction-in-force. Judge Waxse applied Sedona Principle 12 in its earliest (and now twice revised) form. What should Sprint have done?  Did the Court sanction any party? Why or why not?
  • Rodman v. Safeway: (Judge Tigar, ND California) This case, like Zubulake IV, looks at the duties and responsibilities of counsel in monitoring a client’s search for and production of potentially responsive ESI. What is Rule 26(g), and what does it require? What constitutes a reasonable search? To what extent and under what circumstances may counsel rely upon a client’s actions and representations in preserving or collecting responsive ESI?
  • Columbia Pictures v. Bunnell: (Judge Chooljian, California) What prompted the Court to require the preservation of such fleeting, ephemeral information? Why were the defendants deemed to have control of the ephemeral data? Unique to its facts?
  • In re NTL, Inc. Securities Litigation: (Judge Peck, New York) Be prepared to discuss what constitutes control for purposes of imposing a duty to preserve and produce ESI in discovery and how it played out in this case. I want you to appreciate that, while a party may not be obliged to succeed in compelling the preservation or production of relevant information beyond its care, custody or control, a party is obliged to exercise all such control as the party actually possesses, whether as a matter of right or by course of dealing. What does The Sedona Conference think about that?
  • William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co.: (Judge Peck, New York) What was the “wake up call,” who were expected to awaken and on what topics?
  • Adams v. Dell: (Judge Nuffer, Utah) What data was claimed to have been lost? What was supposed to have triggered the duty to preserve? What did the Court say about a responding party’s duty, particularly in designing its information systems? Outlier?
  • RAMBUS: (Judge Whyte, California) I expect you to know what happened and to appreciate that the mere reasonable anticipation of litigation–especially by the party who brings the action–triggers the common law duty to preserve. Be prepared to address the sorts of situations that might or might not trigger a duty to initiate a legal hold.
  • United States v. O’Keefe (Judge Facciola, DC): I like this case for its artful language (Where do angels fear to tread?) and consideration of the limits and challenges of keyword search.  The last being a topic that bears scrutiny wherever it has been addressed in the material.  That is, does keyword search work as well as lawyers think, and how can we improve upon it and compensate for its shortcomings? 
  • Victor Stanley v. Creative Pipe I & II (Judge Grimm, Maryland):  Read VS I with an eye toward understanding the circumstances when inadvertent production triggers waiver (pre-FRE 502).  What are the three standards applied to claims of waiver?  What needs to be in the record to secure relief?

    Don’t get caught up in the prolonged factual minutiae of VS II.  Read VS II to appreciate the varying standards that once existed across the Circuits for imposition of spoliation sanctions and that pre-date the latest FRCP Rules amendments, i.e., Rule 37(e).
  • Anderson Living Trust v. WPX Energy Production, LLC (Judge Browning, New Mexico): This case looks at the application and intricacies of FRCP Rule 34 when it comes to ESI versus documents.  My views about the case were set out in the article you read called “Breaking Badly.”
  • In re: State Farm Lloyds (Texas Supreme Court):  Proportionality is the buzzword here; but does the Court elevate proportionality to the point of being a costly hurdle that complicates a simple issue?  What does this case portend for Texas litigants in terms of new hoops to jump through on issues as straightforward as forms of production?  What role did the Court’s confusion about forms (and a scanty record) play in the outcome?
  • Monique Da Silva Moore, et al. v. Publicis Groupe & MSL Group and Rio Tinto Plc v. Vale S.A., (Judge Peck, New York): DaSilva Moore is the first federal decision to approve the use of the form of Technology Assisted Review (TAR) called Predictive Coding as an alternative to linear, manual review of potentially responsive ESI.  Rio Tinto is Judge Peck’s follow up, re-affirming the viability of the technology without establishing an “approved” methodology.
  • Brookshire Bros. v. Aldridge (Texas Supreme Court): This case sets out the Texas law respecting spoliation of ESI…or does it?  Is the outcome and “analysis” here consistent with the other preservation and sanctions cases we’ve covered?
  • VanZant v. Pyle (Judge Sweet, New York): Issues of control and spoliation drive this decision.  Does the Court correctly apply Rule 37(e)?
  • CAT3 v. Black Lineage (Judge Francis, New York): This trademark infringement dispute concerned an apparently altered email.  Judge Francis found the alteration sufficient to support sanctions under Rule 37(e).  How did he get there?  Judge Francis also addressed the continuing viability of discretionary sanctions despite 37(e).  What did he say about that?
  • EPAC v. Thos. Nelson, Inc.: Read this report closely to appreciate how the amended Rules, case law and good practice serve to guide the court in fashioning remedial measures and punitive sanctions.  Consider the matter from the standpoint of the preservation obligation (triggers and measures) and from the standpoint of proportionate remedial measures and sanctions.  What did the Special Master do wrong here?
  • Mancia v. Mayflower (Judge Grimm, Maryland): Don’t overlook this little gem in terms of its emphasis on counsel’s duties under FRCP Rule 26(g).  What are those duties?  What do they signify for e-discovery? What is the role of cooperation in an adversarial system?
  • Race Tires America, Inc. v. Hoosier Racing Tire Corp. (Judge Vanaskie, Pennsylvania): This opinion cogently defines the language and limits of 28 U.S.C. §1920 as it relates to the assessment of e-discovery expenses as “taxable costs.”  What common e-discovery expenses might you seek to characterize as costs recoverable under §1920, and how would you make your case?
  • Zoch v. Daimler (Judge Mazzant, Texas):  Did the Court correctly resolve the cross-border and blocking statute issues?  Would the Court’s analysis withstand appellate scrutiny post-GDPR? 

Remember: bits, bytes, sectors, clusters (allocated and unallocated), tracks, slack space, file systems and file tables, why deleted doesn’t mean gone, forensic imaging, forensic recovery techniques like file carving, EXIF data, geolocation, file headers/binary signatures, hashing, normalization, de-NISTing, deduplication and file shares.  For example: you should know that an old 3.5” floppy disk typically held no more than 1.44MB of data, whereas the capacity of a new hard drive or modern backup tape would be measured in terabytes. You should also know the relative capacities indicated by kilobytes, megabytes, gigabytes, terabytes and petabytes of data (i.e., their order of ascendancy, and the fact that each is 1,000 times more or less than the next or previous tier).  Naturally, I don’t expect you to know the tape chronology/capacities, ASCII/hex equivalencies or other ridiculous-to-remember stuff.
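Since hashing and deduplication appear on that list, here’s a bare-bones Python sketch of the core idea: identical content yields identical digests, so duplicates can be set aside.  It’s a simplification with a hypothetical folder path; real processing tools refine this considerably (for example, hashing selected e-mail fields to deduplicate messages horizontally across custodians or vertically within one).

    import hashlib
    from pathlib import Path

    def sha1_of(path):
        """Hash a file's bytes; identical content always yields an identical digest."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def dedupe(paths):
        """Keep the first file seen for each digest; report the rest as duplicates."""
        seen, duplicates = {}, []
        for p in paths:
            digest = sha1_of(p)
            if digest in seen:
                duplicates.append((p, seen[digest]))
            else:
                seen[digest] = p
        return list(seen.values()), duplicates

    # Hypothetical collection folder:
    files = [p for p in Path("collection").rglob("*") if p.is_file()]
    originals, duplicates = dedupe(files)
    print(f"{len(originals)} unique files; {len(duplicates)} duplicates suppressed")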

4. TERMINOLOGY: Lawyers, more than most, should appreciate the power of precise language.  When dealing with professionals in technical disciplines, it’s important to call things by their right name and recognize that terms of art in one context don’t necessarily mean the same thing in another.  When terms have been defined in the readings or lectures, I expect you to know what those terms mean.  For example, you should know what ESI, EDRM, RAID, system and application metadata (definitely get your arms firmly around application vs. system metadata), retention, purge and rotation mean (e.g., grandfather-father-son rotation); as well as Exchange, O365, 26(f), 502(d), normalization, recursion, native, near-native, TIFF+, load file, horizontal, global and vertical deduplication, IP addressing, data biopsy, forensically sound, productivity files, binary signatures and file carving, double deletion, load files, delimiters, slack space, unallocated clusters, UTC offset, proportionality, taxable costs, sampling, testing, iteration, TAR, predictive coding, recall, precision, UTC, VTL, SQL, etc.

5. ELECTRONIC DISCOVERY REFERENCE MODEL:  We’ve returned to the EDRM many times as we’ve moved from left to right across the iconic schematic.  Know its stages, their order and what those stages and triangles signify.

6. ENCODING: You should have a firm grasp of the concept of encoded information, appreciating that all digital data is stored as numbers notated as an unbroken sequence of 1s and 0s. How is that miracle possible? You should be comfortable with the concepts described in pp. 132-148 of the Workbook (and our class discussions of the fact that the various bases are just ways to express numbers of identical values in different notations). You should be old friends with the nature and purpose of, e.g., base 2 (binary), base 10 (decimal) base 16 (hexadecimal), base 64 (attachment encoding), ASCII and UNICODE.
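A quick illustration of the point that the bases are just different notations for identical values (the numbers below are arbitrary examples, not anything drawn from the Workbook):

    import base64

    n = 2024
    print(bin(n))  # 0b11111101000  -> base 2
    print(n)       # 2024           -> base 10
    print(hex(n))  # 0x7e8          -> base 16

    # Text is numbers, too: "A" is code point 65 in ASCII/Unicode,
    # stored as the byte 01000001.
    print(ord("A"), bin(ord("A")))  # 65 0b1000001

    # And e-mail attachments ride along as base 64 text.
    print(base64.b64encode("Hi!".encode("ascii")))  # b'SGkh'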

7. STORAGE: You should have a working knowledge of the principal types and capacities of common electromagnetic and solid-state storage devices and media (because data volume has a direct relationship to cost of processing and time to review in e-discovery). You should be able to recognize and differentiate between, e.g., floppy disks, thumb drives, optical media, hard drives, solid state storage devices, RAID arrays and backup tape, including a general awareness of how much data they hold. Much of this is in pp. 22-48 of the Workbook (Introduction to Data Storage Media).  For ready reference and review, I’ve added an appendix to this study guide called, “Twenty-One Key Concepts for Electronically Stored Information.”

8. E-MAIL: E-mail remains the epicenter of corporate e-discovery; so, understanding e-mail systems, forms and the underlying structure of a message is important.  The e-mail chapter should be reviewed carefully.  I wouldn’t expect you to know file paths to messages or e-mail forensics, but the anatomy of an e-mail is something we’ve covered in detail through readings and exercises.  Likewise, the messaging protocols (POP, MAPI, IMAP, WEB, MIME, etc.), mail single message and container formats (PST, OST, EDB, NSF, EML, MSG, DBX, MHTML, MBOX) and leading enterprise mail client-server pairings (Exchange/Outlook, Domino/Notes, O365/browser) are worth remembering.  Don’t worry, you won’t be expected to extract epoch times from boundaries again. 😉
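To make the “anatomy of an e-mail” point concrete, here’s a small Python sketch using the standard library’s email module to pull apart a made-up message; the addresses and content are invented for illustration, and real mail is usually MIME multipart.

    from email import message_from_string
    from email.utils import parsedate_to_datetime

    raw = """\
    From: sender@example.com
    To: recipient@example.com
    Subject: Quarterly forecast
    Date: Mon, 03 May 2021 09:15:00 -0500
    Message-ID: <abc123@example.com>

    Numbers attached. Please review before the call.
    """

    msg = message_from_string(raw.replace("\n    ", "\n"))  # strip the indentation above
    print(msg["Subject"])                      # header fields: the who/what/when
    print(msg["Message-ID"])                   # the ID that ties replies and threads together
    print(parsedate_to_datetime(msg["Date"]))  # normalized, timezone-aware datetime
    print(msg.get_payload())                   # the body (plain text in this simple example)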

9. FORMS: Forms of production loom large in our curriculum.  Being that everything boils down to just an unbroken string of ones-and-zeroes, the native forms and the forms in which we elect to request and produce them (native, near-native, images (TIFF+ and PDF), paper) play a crucial role in all the “itys” of e-discovery: affordability, utility, intelligibility, searchability and authenticability.  What are the purposes and common structures of load files?  What are the pros and cons of the various forms of production?  Does one size fit all?  How does the selection of forms play out procedurally in federal and Texas state practice?  How do we deal with Bates numbering and redaction?  Is native and near-native production better and, if so, how do we argue the merits of native production to someone wedded to TIFF images?  This is HUGE in my book!  There WILL be at least one essay question on this and likely several other test questions.
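Because load files figure prominently here, a minimal sketch of one common flavor may help: an Opticon-style image cross-reference mapping Bates numbers to TIFF paths.  The production below is entirely made up, real productions pair this with a delimited metadata load file, and the field layout should be treated as illustrative rather than gospel.

    import csv

    # Hypothetical three-document TIFF production, one page per document.
    docs = [
        {"bates": "ABC000001", "path": r"IMAGES\001\ABC000001.TIF"},
        {"bates": "ABC000002", "path": r"IMAGES\001\ABC000002.TIF"},
        {"bates": "ABC000003", "path": r"IMAGES\001\ABC000003.TIF"},
    ]

    # Opticon-style rows: ImageKey, Volume, ImagePath, DocBreak, FolderBreak, BoxBreak, PageCount
    with open("production.opt", "w", newline="") as f:
        writer = csv.writer(f)
        for d in docs:
            writer.writerow([d["bates"], "VOL001", d["path"], "Y", "", "", 1])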

10. SEARCH AND REVIEW: We spent a fair amount of time talking about and doing exercises on search and review.  You should understand the various established and emerging approaches to search: e.g., keyword search, Boolean search, fuzzy search, stemming, clustering, predictive coding and Technology Assisted Review (TAR).  Why is an iterative approach to search useful, and what difference does it make?  What are the roles of testing, sampling and cooperation in fashioning search protocols?  How do we measure the efficacy of search?  Hint: You should know how to calculate recall and precision and know the ‘splendid steps’ to take to improve the effectiveness and efficiency of keyword search (i.e., better F1 scores). 
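Here’s a worked example of those measures with invented numbers, since students often mix up the denominators:

    def recall_precision_f1(responsive_retrieved, retrieved, responsive_total):
        recall = responsive_retrieved / responsive_total   # share of all responsive docs that were found
        precision = responsive_retrieved / retrieved       # share of retrieved docs that are responsive
        f1 = 2 * precision * recall / (precision + recall) # harmonic mean of the two
        return recall, precision, f1

    # Invented numbers: a query returns 1,000 documents, 300 of them responsive,
    # out of 400 responsive documents in the entire collection.
    r, p, f1 = recall_precision_f1(300, 1000, 400)
    print(f"recall={r:.0%}  precision={p:.0%}  F1={f1:.2f}")  # recall=75%  precision=30%  F1=0.43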

You should know what a review tool does and customary features of a review platform.  You should know the high points of the Blair and Maron study (you read and heard about it multiple times, so you need not read the study itself).  Please also take care to understand the limitations on search highlighted in your readings and those termed The Streetlight Effect.

11. ACCESSIBILITY AND GOOD CAUSE: Understand the two-tiered analysis required by FRCP Rule 26(b)(2)(B).  When does the burden of proof shift, and what shifts it?  What tools (a/k/a conditions) are available to the Court to protect the competing interests of the parties?

12. FRE RULE 502: It’s your friend!  Learn it, love it, live it (or at least know when and how to use it).  What protection does it afford against subject matter waiver?  Is there anything like it in state practice?  Does it apply to all legally cognized privileges?

13. 2006 AND 2015 RULES AMENDMENTS: You should understand what they changed with respect to e-discovery.  Concentrate on proportionality and scope of discovery under Rule 26, along with standards for sanctions under new Rule 37(e).  What are the Rule 26 proportionality factors?  What are the findings required to obtain remedial action versus serious sanctions for spoliation of ESI under 37(e)?  Remember “intent to deprive.”

14. MULTIPLE CHOICE: When I craft multiple choice questions, there will typically be two answers you can quickly discard, then two you can’t distinguish without knowing the material. So, if you don’t know an answer, you increase your odds of doing well by eliminating the clunkers and guessing. I don’t deduct for wrong answers.  Read carefully to note whether the question seeks the exception or the rule. READ ALL ANSWERS before selecting the best one(s) as I often include an “all of the above” or “none of the above” option.

15. All lectures and reviews of exercises are recorded and online for your review, if desired.

16. In past exams, I used the following essay questions.  These will not be essay questions on your final exam; however, I furnish them here as examples of the scope and nature of prior essay questions:

EXAMPLE QUESTION A: On behalf of a class of homeowners, you sue a large bank for alleged misconduct in connection with mortgage lending and foreclosures. You and the bank’s counsel agree upon a set of twenty Boolean and proximity queries including:

  • fnma AND deed-in-lieu
  • 1/1/2009 W/4 foreclos!
  • Resumé AND loan officer
  • LTV AND NOT ARM
  • (Problem W/2 years) AND HARP

These are to be run against an index of ten loan officers’ e-mail (with attached spreadsheets, scanned loan applications, faxed appraisals and common productivity files) comprising approximately 540,000 messages and attachments.  Considering the index search problems discussed in class and in your reading called “The Streetlight Effect in E-Discovery,” identify at least three capabilities or limitations of the index and search tool that should be determined to gauge the likely effectiveness of the contemplated searches.  Be sure to explain why each matters. 

I am not asking you to assess or amend the agreed-upon queries.  I am asking what needs to be known about the index and search tool to ascertain if the queries will work as expected.
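
To make that concrete, here’s a toy Python sketch (not modeled on any actual review platform) showing how an indexer’s tokenization choices determine whether queries like those above can ever hit:

```python
import re

# Toy illustration only: a simplified indexer that lowercases, splits on
# non-alphanumeric characters, and drops pure numbers -- choices some real
# indexing engines make by default.
def tokenize(text):
    tokens = re.split(r"[^A-Za-z0-9]+", text.lower())
    return [t for t in tokens if t and not t.isdigit()]

message = "Per FNMA guidance, the deed-in-lieu closed 1/1/2009; see attached resumé."
print(tokenize(message))
# ['per', 'fnma', 'guidance', 'the', 'deed', 'in', 'lieu', 'closed', 'see', 'attached', 'resum']

# Consequences for queries like those above:
# - "deed-in-lieu" survives only as three separate tokens, so retrieval depends on
#   phrase or proximity handling, not on the hyphenated form.
# - "1/1/2009" vanishes entirely if the indexer drops numbers or date strings.
# - "resumé" may index differently from "resume" depending on diacritic folding.
```

Real tools differ on exactly these points (hyphens, numbers, dates, diacritics, stop and noise words), which is why the question asks what must be known about the index and search tool before trusting the agreed queries.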

EXAMPLE QUESTION B: The article, A Bill of Rights for E-Discovery, included the following passage:

I am a requesting party in discovery.

I have duties.

I am obliged to: …

Work cooperatively with the producing party to identify reasonable and effective means to reduce the cost and burden of discovery, including, as appropriate, the use of tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search.

Describe how “tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search” serve to reduce the cost and burden of e-discovery.  Be sure to make clear what each term means.

It’s been an excellent semester and a pleasure for me to have had the chance to work with a bright bunch.  Thank you for your effort!  I’ve greatly enjoyed getting to know you notwithstanding the limits imposed by the pandemic and Mother Nature’s icy wrath.  I wish you the absolute best on the exam and in your splendid careers to come.  Count me as a future resource to call on if I can be of help to you.  Best of Luck!   Craig Ball

APPENDIX

Twenty-One Key Concepts for Electronically Stored Information

  1. Common law imposes a duty to preserve potentially relevant information in anticipation of litigation.
  2. Most information is electronically stored information (ESI).
  3. Understanding ESI entails knowledge of information storage media, encodings and formats.
  4. There are many types of e-storage media of differing capacities, form factors and formats:
    a) analog (phonograph record) or digital (hard drive, thumb drive, optical media).
    b) mechanical (electromagnetic hard drive, tape, etc.) or solid-state (thumb drive, SIM card, etc.).
  5. Computers don’t store “text,” “documents,” “pictures,” “sounds.” They only store bits (ones or zeroes).
  6. Digital information is encoded as numbers by applying various encoding schemes:
    a) ASCII or Unicode for alphanumeric characters.
    b) JPG for photos, DOCX for Word files, MP3 for sound files, etc.
  7. We express these numbers in a base or radix (base 2 binary, 10 decimal, 16 hexadecimal, 60 sexagesimal). E-mail messages encode attachments in base 64.
  8. The bigger the base, the smaller the space required to notate and convey the information.
  9. Digitally encoded information is stored (written):
    a) physically as bytes (8-bit blocks) in sectors and partitions.
    b) logically as clusters, files, folders and volumes.
  10. Files use binary header signatures to identify file formats (type and structure) of data.
  11. Operating systems use file systems to group information as files and manage filenames and metadata.
  12. Windows file systems employ filename extensions (e.g., .txt, .jpg, .exe) to flag formats.
  13. All ESI includes a component of metadata (data about data) even if no more than needed to locate it.
  14. A file’s metadata may be greater in volume or utility than the contents of the file it describes.
  15. File tables hold system metadata about the file (e.g., name, locations on disk, MAC dates): it’s CONTEXT.
  16. Files hold application metadata (e.g., EXIF geolocation data in photos, comments in docs): it’s CONTENT.
  17. File systems allocate clusters for file storage; deleting files releases cluster allocations for reuse.
  18. If unallocated clusters aren’t reused, deleted files may be recovered (“carved”) via computer forensics.
  19. Forensic (“bitstream”) imaging is a method to preserve both allocated and unallocated clusters.
  20. Data are numbers, so data can be digitally “fingerprinted” using one-way hash algorithms (MD5, SHA1).
  21. Hashing facilitates identification, deduplication and de-NISTing of ESI in e-discovery (illustrated in the short sketch following this list).
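
To ground a few of those concepts (6 through 8, 20 and 21), here’s a short Python sketch; the sample text is arbitrary:

```python
import base64
import hashlib

text = "Electronically Stored Information"
data = text.encode("utf-8")              # Concepts 5-6: the "text" is stored as numbers (bytes)

print(data.hex())                        # Concept 7: the same bytes notated in base 16 (hexadecimal)
print(base64.b64encode(data).decode())   # Concept 7: the same bytes in base 64, as e-mail encodes attachments

# Concepts 20-21: one-way hash "fingerprints" support identification and deduplication.
print(hashlib.md5(data).hexdigest())
print(hashlib.sha1(data).hexdigest())

# Two items with identical content yield identical hash values, so duplicates can be
# suppressed; comparing hashes against a list of known system files supports de-NISTing.
```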

The Great Pandemic Leap

Much has been made of the “Great Pandemic Leap” by law firms and courts. The pandemic proved to be, if not the mother of invention, at least the mother****** who FINALLY got techno-tardy lawyers to shuffle forward. The alleged leap had nothing to do with new technology. Zoom and other collaboration tools have been around a long time. In fact, April 21, 2021 was Zoom’s 10th Birthday! Happy Birthday, Zoom! Thanks for being there for us.

No, it wasn’t new technology. The ‘Ten Years in Ten Weeks’ great leap was enabled by compulsion, adoption and support.

“Compulsion” because we couldn’t meet face-to-face, and seeing faces (and slides and white boards) is important.
“Adoption” because so many embraced Zoom and its ilk that we suddenly enjoyed a common meeting place.
“Support” because getting firms and families up and running on Zoom et al. became a transcendent priority.

It didn’t hurt that schools moving to Zoom served to put a support scion in many lawyers’ homes and, let’s face it, Atticus, the learning curve wasn’t all that steep. Everyone already had a device with camera and microphone. Zoom made it one-click easy to join a meeting, even if eye-level camera positioning and unmuting of microphones have proven more confounding to lawyers than the Rule Against Perpetuities.

For me, the Great Leap manifested as the near-universal ability to convene on a platform where screen sharing and remote control were simple. I’ve long depended on remote control and screen sharing tools to access machines by Remote Desktop Protocol (RDP) or TeamViewer (not to mention PCAnywhere and legacy applications that made WFH possible in the 90s and aughts). But that was on my own machines. Linking to somebody else’s machine without a tech-savvy soul on the opposite end was a nightmare. If you’ve ever tried to remotely support a parent, you understand. “No, Mom, please don’t click anything until I tell you. Oh, you already did? What did the error message say? Next time, don’t hit ‘Okay’ until you read the message, please, Mom.”

E-discovery and digital forensics require defensible data identification, preservation and collection. The pandemic made deskside reviews and onsite collection virtually impossible, or more accurately, those tasks became possible only virtually. Suddenly, miraculously, everyone knew how to join a Zoom call, so custodians could share screens and hand over remote control of keyboard and mouse. I could record the sessions to document the work and remotely load software (like iMazing or CoolMuster) to preserve and access mobile devices. Remote control and screen sharing let me target collection efforts based on my judgment and not be left at the mercy of a custodian’s self-interested actions. Custodians could observe, assist and intervene in my work or they could opt to walk away and leave me to do my thing. I was “there,” but less intrusively and spared the expense and hassle of travel. I could meet FRCP 26(g) obligations and make a record to return to if an unforeseen issue arose.

In my role as investigator, there are advantages attendant to being onsite; e.g., I sometimes spot evidence of undisclosed data sources. But weighed against the convenience and economy of remote identification and collection, I can confidently say I’m never going back to the old normal when I can do the work as well via Zoom.

Working remotely as I’ve described requires a passing familiarity with Zoom screen sharing, if only to be able to talk others through unseen menus. As Zoom host, you will need to extend screen sharing privileges to the remote user. Do this on the fly by making the remote user a meeting co-host (click “More” alongside their name in the Participants screen). Alternatively, you can select Advanced Sharing Options from the Share Screen menu. Under “Who can Share?” choose “All Participants.”

To acquire control of the remote user’s mouse and keyboard, have the remote user initiate a screen share then open the View Options dropdown menu alongside the green bar indicating you’re viewing a shared screen. Select “Request Remote Control,” then click “Request” to confirm. The remote user will see a message box seeking authorization to control their screen. Once authorized, click inside the shared screen window to take control of the remote machine.

If you need to inspect a remote user’s iPhone or iPad, Zoom supports sharing those devices using a free plugin that links the mobile device over the same WiFi connection as the Zoom session. To initiate an iPhone/iPad screen share, instruct the remote user to click Screen Share and then select the iPhone/iPad icon at right for further instructions. Simpler still, have the remote user install Zoom on the phone or pad under scrutiny and join the Zoom session from the mobile device. Once in the meeting, the remote user screen shares from the session on the mobile device. Easy-peasy AND it works for Android phones, too!

So Counselor, go ahead and take that victory lap. Whether you made a great leap or were dragged kicking and screaming to a soupçon of technical proficiency, it’s great to see you! Hang onto those gains, and seek new ways to leverage technology in your practice. Your life may no longer depend on it, but your future certainly does.

Life Lessons from E-Discovery

Eight years ago, my old friends and Über-thought leaders Bill Hamilton and George Socha created an e-discovery conference targeting an underserved constituency: lawyers without the luxury of an e-discovery practice group or litigation support staff. Regular folks. The always enlightening and enjoyable University of Florida E-Discovery Conference has been a fixture on my speaking calendar for years. This year, the pandemic foreclosed the customary face-to-face confab in central Florida, so we convened virtually: just Bill, George, me and 3,000 of our closest friends. Seriously, the turnout was astounding: 3,058 unique attendees! BRAVO!

My contribution was modest: fifteen minutes chatting about Life Lessons from E-Discovery. Here’s what I shared:

Thirty years ago, Robert Fulghum published a bestseller called “All I Really Need To Know I Learned In Kindergarten.”  It posited that the simple lessons we gained as children can guide us all our lives.

The lessons were things like:

  • Share everything.
  • Play fair.
  • Don’t hit people.
  • Put things back where you found them.
  • Clean up your own mess.
  • Don’t take things that aren’t yours.
  • Say you’re sorry when you hurt somebody.
  • Flush and wash your hands.

That last one proved especially useful of late!

Fulghum’s point was that childish precepts extrapolate well to our adult lives, to relationships, business, government, really to everything.

I’ve been a student and teacher of electronic evidence for forty years, so when Professor Hamilton asked me to say a few words today, I wondered what I’d gleaned from electronic discovery that might yield life lessons like those kindergarten rules.  Many things came to mind.  Things like:

They all say basically the same thing: treat others with respect and courtesy.  I commend them all to you, but the shameful truth is I’ve violated enough of those precepts that I feel unworthy to preach their indisputable value.

Instead, I sought five precepts uniquely suited to e-discovery, five lessons I’ve acquired and come to believe in through hard experience.

I should confess that my point of view is a jaundiced and cynical one.  As a special master, I’m brought in by Courts when discovery’s gone off the rails, often when sanctions are in the offing.  In my world, incompetence and deceit are the norm.  So, if my lessons strike you as too obvious or too simple, I’m thrilled to hear it.

The first rule, and really the most fundamental is:

Tell the truth based on fact. 

Albert Einstein said, “Imagination is more important than information.”  Sorry, Al, not in e-discovery.

When it comes to e-discovery, information is more important than imagination.  In e-discovery, information is everything. Measurement trumps opinion.  Your gut sense that the other side is withholding evidence is fascinating, but it’s not proof.  Your certainty that the client has no responsive data is just baloney without a competent search.

If we are to be credible professionals, we must concede what we don’t know, share what we do know and recognize that cooperation isn’t a hallmark of weakness but a harbinger of strength.  Bluffing is fine at the poker table, but it will kill you in Court.  Your word—your credibility—your reputation for honesty is worth more than all your education and skill.

And a variant on number one is:

Tell the truth, no matter the consequences.

I’ve written hundreds of articles about e-discovery and forensics.  Colleagues ask, “Aren’t you afraid something you wrote will be used against you in cross-examination?”  I tell them I’ve never worried about that because I’ve told the truth as I knew it in everything I wrote.  Sometimes I was mistaken, but I was never false.  So, I don’t have to remember what I said.  I just hold on to what I know to be true.

If someone wants to cite me to impeach me, bring it on!  I’ll take them from punched cards to magnetic media to solid state storage, from big iron mainframes to client-server to the Cloud.  I’ll share my conviction that learning never ends, and, yes, mistakes happen along the way.  The measure that matters is how we own our errors.  If we stick to the truth, we can gain more from failure than success.

My second precept is just one word.  A century ago, IBM’s founder Thomas J. Watson put a word on an easel at a business meeting.  It read “THINK.”  That’s still IBM’s slogan, and it’s what I want to shout at lawyers who serve ridiculous requests for production or file boilerplate objections.

THINK!  I want to stamp it on the foreheads of lawyers who just don’t think about where evidence is likely to be found or sensible ways to find it.  I know lawyers to be first-class thinkers; so, it’s maddening when good lawyers take off their thinking caps in e-discovery.  Any lawyer can learn enough tech to master the “E” in e-discovery.  Anyone.  All of us on this conference faculty are convinced of it.  It’s what gets us out of bed each day.

But to do it, lawyers must cast aside doubt and turn off the parts of their brains that tell them they’re too old, too busy or just too much a lawyer to learn something new.  Conferences like this one help—thank you for being here–but it takes more than a few hours on Zoom or a big litigation budget to become competent to serve your clients in the realm of electronic evidence.  It requires a willingness to fearlessly embrace an unfamiliar discipline–to learn a second thing. 

It takes a commitment to study, question, pursue and explore information about information and a commitment to THINK, THINK, THINK about how people communicate, what tools and software they use, their language, what metadata matters and where data lives. So, please don’t think you can’t learn it, or worse, that you need not learn it.  You can and you must.  Nothing less than the future of the civil justice system depends on it.  Of course, you’re here pursuing greater expertise, so I’m preaching to the choir.

My third lesson is: Have a plan.

In my thirties, I read Robert Caro’s epic biography of Robert Moses.  Moses was an urban planner who reshaped New York.  Robert Moses’ massive projects got built.  The secret to his success was that, where others came to planning meetings with ideas, Robert Moses arrived with blueprints and budgets.  He was a man with a plan.

In e-discovery, lawyers are brilliant at articulating objections, at saying what clients won’t do; but what’s often missing is a well-reasoned plan for what clients will do.  When you come with a plan, it’s clear you thought about what must be done.  A practical plan demonstrates a commitment to progress.  A reasonable plan forces the other side to work within your framework.  Judges love it when lawyers have a plan.  The Rules of discovery are written to better serve litigants with a plan. 

The e-discovery plan is a protocol. E-discovery demands a good protocol and success in e-discovery requires that lawyers know which features of a protocol are crucial and which are negotiable.  So, always show up with a plan.

Number four is: Never attribute to guile that which can be explained by incompetence.

I borrowed and adapted this one from my late friend, Browning Marean, who had a huge store of wise sayings.  In fairness, Browning borrowed it from Robert Heinlein, whose “Heinlein’s Razor” reads, “never attribute to malice that which is adequately explained by stupidity.”  Because we know lawyers aren’t stupid, I prefer to term it a shortfall in competence.

When a party messes up in e-discovery, the victims of failure often cry “foul” and suspect an intent to deprive them of the evidence.  In my experience, genuine intent (I’m calling it “guile”) tends to manifest as efforts to conceal the screw-up; it’s the cover-up that kills you, not the failure itself.  Most screw-ups are just… screw-ups.  Always avoidable, sometimes reprehensible, but more often the result of apathy than antipathy. Maybe that’s why the last set of Rules amendments shielded parties from serious sanctions for mere incompetence.  In my mind, the decision to tie judges’ hands when disciplining incompetence and spoliation was a poor one.  A mistakenly political one.  Fear of sanctions was the prime driver of the e-discovery revolution.  It was the reason lawyers and companies came around and started preserving and producing ESI.  Sanctions were the driver of competence.  Sadly, we never had much of a carrot, and now they’ve taken away the stick.

And my final lesson is one of human nature:

Remember that Courts guard their authority more scrupulously than your client’s rights.

What I mean by this, with no disrespect to the judges on this faculty or listening, is a recognition, born of experience, that a party is considerably more likely to be disciplined for violating a court order than for failing to fulfill obligations to an opponent. The takeaway is that an effort to secure sanctions is a marathon, not a sprint.

You must take the time and make the effort to mature motions to compel or for protection into explicit orders of the Court.  A court cannot function if its orders are ignored with impunity.  So, if sanctions are your objective, position the failure to produce as more than simply a transgression of your client’s rights; put it in the posture of something that threatens the court’s sovereignty.

And while we’re talking sanctions, never forget that sanctions are exceptional remedies.  Courts hate to sanction parties or counsel.  Though the threat of sanctions carried along in a case can be a useful tactic, seeking sanctions to patch a weak case is a fool’s errand.  Discovery is a mechanism to gather evidence to make your case, nothing more or less than that.

Those are my five.  I expect you have some great ones of your own.  Next year, I’d like to hear you sharing yours right here.  Even better, let’s all meet in Gainesville. Share our ideas.  Break bread and toast a return to normalcy.   That’s an invitation to join the e-discovery community.  Being part of it has been one of the great delights of my professional life. I’ve made wonderful friends that way.  You will, too.

Be well and thank you.

Can a Producing Party Refuse to Produce Linked Attachments to E-Mail?

A fellow professor of e-discovery started my morning with a question. He wrote, “In companies using Google business, internal email ‘attachments’ are often linked with a URL to documents on a Google drive rather than actually ‘attached’ to the email…. Can the producing party legally refuse to produce the document as an attachment to the email showing the family? Other links in the email to, for example, a website need not be produced.”

I replied that I didn’t have the definitive answer, but I had a considered opinion. First, I challenged the assertion, “Other links in the email to, for example, a website need not be produced.”

Typically, the link must be produced because it’s part of the relevant and responsive, non-privileged message.  But the link is just a pointer, a path to an item, and the discoverability of the link’s target hinges upon whether the (non-privileged) target is responsive AND within the care or custody of, or subject to the control of, the producing party.

For the hypothetical case, I assume that the transmittal is deemed relevant and the linked targets are either relevant by virtue of their being linked to the transmittal or independently relevant and responsive.  I also assume that the linked target remains in the care or custody of, or subject to the control of, the producing party because it has a legal and practical right of access to the repository where the linked target resides; that is, the producing party CAN access the linked item, even if it would rather not retrieve the relevant, responsive and non-privileged content to which the custodian has linked the transmittal.

If the link is not broken and the custodian of the message could click the link and access the linked target, where is the undue burden and cost?  Certainly, I know that collection is often delegated to persons other than the custodian, but shouldn’t we measure undue burden and cost from the standpoint of the custodian under the legal duty to preserve and produce, NOT from the perspective of a proxy engaged to collect, but lacking the custodian’s ability to collect, the linked target? Viewed in this light, I don’t see where the law excuses the producing party from collecting and producing the linked target.

The cited difficulty in collection results from the producing party contracting to delegate storage to a third-party Cloud Provider, linking to information relegated to the Cloud Provider’s custody.  In certain respects, it’s like the defendant in Columbia Pictures v. Bunnell, who put a contractor (Panther) in control of the IP addresses of the persons trading pirated movies via the defendant’s platform.  Just because you enlist someone to keep your data on your behalf doesn’t defeat your ultimate right of control or your duty of production.

Having addressed duty, let’s turn to feasibility, which is really what the fight’s about. 

Two key issues I see are: 

1. What if the link is broken by the passage of time?  If the target cannot be collected after reasonable efforts to do so, then it may be infeasible to effect the collection via the link or via pairing the link address to the addresses of the contents of the repository (as by using the Download-Link-Generator tool I highlighted here); a simple sketch of that pairing idea follows this list.  If there is simply no way a link created before a legal hold duty attached can be tied to its target, then you can’t do it, and the Court can’t order the impossible.  But you can’t just label something “impossible” because you’d rather not do it.  You must make reasonable efforts, and you must prove infeasibility. Courts should look askance at claims of infeasibility asserted by producing parties who have created the very situations that make it harder to obtain discovery.

2. What if the content of the target has changed since the time it was linked?  This is where the debate gets stickier, and where I have little empathy for a producing party who expects to be excused from production on the basis that it altered the evidence.  If the evidence has changed to the point where its relevance is in question because it may have been materially changed after linking, then the burden to prove the material change (and diminished relevance) falls on the producing party, not the requesting party.  Else, you take your evidence as you find it, and you produce it as it exists at the time of preservation and collection.  The possibility that it changed goes to its admissibility and weight, not to its discoverability.
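
As promised in point 1, here’s a rough sketch of the pairing idea. It is not the Download-Link-Generator tool; the URL pattern and the inventory below are hypothetical stand-ins for whatever the repository actually exposes:

```python
import re

# Hypothetical inventory of the repository, keyed by file identifier, as collected.
repository_inventory = {
    "1AbCdEfGhIjKlMnOpQrStUv": "2009 Foreclosure Pipeline.xlsx",
    "2ZyXwVuTsRqPoNmLkJiHgFe": "HARP Eligibility Memo.docx",
}

# Toy pattern for Drive-style links; real links vary, so treat this as illustrative.
LINK_PATTERN = re.compile(r"https://drive\.google\.com/\S*?/d/([\w-]+)")

def match_links(message_body, inventory):
    """Return (file_id, matched_name_or_None) for each repository link in a message."""
    return [(file_id, inventory.get(file_id))
            for file_id in LINK_PATTERN.findall(message_body)]

email_body = "See https://drive.google.com/file/d/1AbCdEfGhIjKlMnOpQrStUv/view for the pipeline."
print(match_links(email_body, repository_inventory))
# [('1AbCdEfGhIjKlMnOpQrStUv', '2009 Foreclosure Pipeline.xlsx')]
```

The point is only that the identifier embedded in a link can sometimes be matched to the repository’s contents even after clicking the link no longer works; whether that is feasible in a given case is exactly what must be demonstrated, not assumed.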

I hope you agree my analysis is sound. To paraphrase Abraham Lincoln, you cannot murder your parents and then seek leniency because you’re an orphan. The problem is solvable, but it will be resolved only when Courts supply the necessary incentive by ordering collection and production. Integrating a hash value of the target within the link might go a long way to curing this Humpty-Dumpty dilemma; then, the target can be readily identified AND proven to be in the same state as when the link was created.
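
Here’s a minimal sketch of that hash-in-the-link idea; the parameter name and helper functions are my own invention for illustration, not an existing standard or product feature:

```python
import hashlib
from urllib.parse import urlencode, urlparse, parse_qs

def link_with_hash(base_url, file_bytes):
    """Hypothetically, append a content fingerprint to a link when the link is created."""
    digest = hashlib.sha256(file_bytes).hexdigest()
    query = urlencode({"sha256": digest})
    return f"{base_url}?{query}"

def target_unchanged(link, current_bytes):
    """Later, compare the fingerprint recorded in the link to a hash of the target as collected."""
    recorded = parse_qs(urlparse(link).query).get("sha256", [None])[0]
    return recorded == hashlib.sha256(current_bytes).hexdigest()

original = b"Loan modification terms, v1"
link = link_with_hash("https://example.com/docs/loan-mod-terms", original)

print(target_unchanged(link, original))                        # True: target matches its link-time state
print(target_unchanged(link, b"Loan modification terms, v2"))  # False: the target changed after linking
```

Built into linking, something along these lines would let a producing party both locate the target and demonstrate whether it remains in the state it was in when the link was created.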

While we are at it, embedded links should be addressed from the standpoint of security and ethics. If a producing party supplies a message or document with a live link and opposing counsel clicks on the link, exposing information not meant to be produced, whose head should roll there? If a party produces a live link in an email, is it reasonable to assume that the target was delivered, too? To my mind, the link is fair game, just as the attachment would be had it been embedded in the message. Electronic delivery is delivery. We have rules governing inadvertent production of privileged content, but not for the scenario described.