“There’s No Better Rule”

“Take nothing on its looks; take everything on evidence. There’s no better rule.”  I quote this line from Charles Dickens’ Great Expectations at the end of all my emails.  It’s my guiding light.  Sure, how things look matters, but how things truly are matters more.  At least it should be that way. 

I reflect on all this while listening to a webinar presented by my friends, Doug Austin and Kelly Twigger, and moderated by Brett Burney.  They discussed so-called Modern Attachments or, as they prefer to call them, Hyperlinked Files.  In a nutshell, Modern Attachments (as Microsoft calls them) are files that are stored in the Cloud and accessed by links transmitted within an email as distinguished from documents embedded within the transmitting e-mail and thus traveling with the e-mail rather than being retrieved by the recipient clicking on a hyperlink.  The debate about the extent of the duty to preserve, collect and produce these modern attachments rages on, and I don’t post here to rehash that back-and-forth.  My purpose is to tackle some misinformation advanced as a basis to exclude modern attachments from the reach of discovery.

Many who paint dealing with Modern Attachments as infeasible or fraught with risk posit that Modern Attachments tend to be collaborative documents or documents that have gone through edits after transmittal.  They argue that they shouldn’t have to produce Modern Attachments due to uncertainty over whether the document collected during discovery differs significantly from how it existed at the time of transmittal. I don’t think that a good argument against collection and review, but once more, not my point here.

My point is that we need to stop asserting that these Modern Attachments are routinely altered after transmittal without evidence of the incidence of alterationWe should never guess at what we can readily measure.

Based on my experience, most modern attachments (e.g., 85-95%) are not altered after transmittal. Nevertheless, my personal observations mean little in the face of solid data revealing the percentage of Modern Attachments altered after transmittal.  We can measure this.  The last modified dates of Modern Attachments can be compared to their transmittal dates, either en masse or through appropriate sampling.  This will allow us to know the incidence of post-transmittal alteration based on hard evidence rather than assumptions or intuition.  I expect the incidence will vary between disciplines and corporate cultures, but that, too, is worth measuring.

Why hasn’t this been done?  A suspicious mind would conclude that those holding the data–who also happen to be the ones resisting the obligation to produce Modern Attachments–don’t want to know the metrics. Less archly, maybe they simply haven’t taken the time to measure, as guessing is easier. As they say in The Man Who Shot Liberty Valance, “This is the West, sir. When the legend becomes fact, print the legend.”  Facts are inconvenient. They’re sticking with the legend.

But it’s time to quit that.  It’s time to take everything on evidence.

AI Drawing Programs: ChatGPT Versus PlaygroundAI

Of late, I’ve come to use AI generated imagery in lieu of my own work or open-source and licensed works as a source for digital storytelling.  My friend, blogger Doug Austin (perhaps the hardest working man in e-discovery) has been illustrating his daily blog posts with AI-generated art for quite some time and I kid him now-and-then about the abundance of robots in his illustrations.  I feel besieged by robot imagery.

Last week in San Antonio, I co-presented on AI evidence at a huge annual conclave of family law practitioners.  I’ve been using ChatGPT, Dall-E and Midjourney for image generation, but preparing the new presentation, I kicked the tires of several tools I’d not used before.  One of them impressed me so I thought I’d post to share it.  I think it blows the doors off ChatGPT’s images.  It’s called PlaygroundAI  It lets users create up to 50 images per day at no charge, and up to 1,000 images a day on its Pro plan ($15/month paid monthly, cancel any time).

Continue reading

Doveryai, No Proveryai!

I recently published an AI prompt to run against search terms then get the AI to propose improvements.  Among the pitfalls I’d hoped to expose was the presence of “stop” or “noise” words; terms routinely excluded from search indices.  Searches incorporating stop words fail because terms not in the index won’t be found.  Ensuring your searches don’t include stop words is an essential step in framing effective queries.

To help the AI recognize stop words, the prompt included a list of default stop words for well-known eDiscovery tools.  That is, I thought I’d done that, but what I included in error (and have now replaced) was ChatGPT’s rendition of stop words for the major tools.  I’d made a mental note to check the lists supplied but—DOH!—I plugged it into the prompt and then forgot to do my due diligence.

I was feeling pretty good about the post and getting some nice feedback.  Last night, my dear friend and e-discovery Empress Mary Mack commented on the novelty of seeing the various stop word lists broken out in a ready reference.  I think echoes of Mary’s kind comment woke me at 4:00am, my subconscious screaming, “HEY DUMMY!  Did you verify those stop words?  Tell me you didn’t blindly trust an AI?!?”

So, long before sunrise, I was manually checking each stop word list against product websites and—lo and behold—every list was off: some merely incomplete but others not even close. ChatGPT hallucinated the lists, and I failed to do the crucial thing lawyers must do when using AI as a research assistant: Trust but verify.

No harm done, but I share my chagrin here to underscore that you just cannot trust an AI generative large language model to do your research without careful human assessment of the output.  I know this and let it slip my mind.  Last time for that.  I’ve corrected the prompt on my blog and hope I’ve gotten it right.  I post this to remind my readers that AI LLMs are great—USE THEM–but they are no substitute for you.  Doveryai, no proveryai!

AI Prompt to Improve Keyword Search

Twenty years ago, I dreamed up a website where you would submit a list of eDiscovery keywords and queries and the site would critique the searches and suggest improvements to make them more efficient and effective. It would flag stop words, propose alternate spellings, and alert the user to pitfalls making searches less effective or noisy. I even envisioned it testing queries against a benign dataset to identify overly broad terms and false hits.

I believed this tool would be invaluable for helping lawyers enhance their search skills and achieve greater efficiency. Over the years, I tried to bring this idea to life, seeking proposals from offshore developers and pitching it to e-discovery software publishers as a value-add. In the end, a pipe dream. Even now, nothing like it exists.

The emergence of AI-powered Large Language Models like ChatGPT made me think what I’d hoped to bring to life years ago might finally be feasible. I wondered if I could create a prompt for ChatGPT that would achieve much of what I envisioned. So, I dedicated a sunny Sunday morning to playing “prompt engineer,” a whole cloth term for those who craft AI prompts to achieve desired outcomes.

The result was promising, a significant step forward for lawyers who struggle with search queries without understanding why some fail. Most search errors I encounter aren’t subtle. I’ve written about ways to improve lexical search, and the techniques aren’t rocket science, though they require some familiarity with how electronically stored information is indexed and how search syntaxes differ across platforms. Okay, maybe a little rocket science. But if you’re using a tool for critical tasks, shouldn’t you know what it can and cannot do?

Some believe refining keywords and queries is a waste of time, casting keyword search as obsolete. Perhaps on your planet, Klaatu, but here on Earth, lawyers continue using keywords with reckless abandon. I’m not defending that but neither will I ignore lawyers’ penchant for lexical search. Until the cost, reliability, and replicability of AI-enabled discovery improve, keywords will remain a tool for sifting through large datasets. However, we can use AI LLMs right now to enhance the performance and efficiency of shopworn approaches.

Continue reading

Yes, AI is Here. No, You’re Not Gone.

Yesterday, I sought to defend the value of my law school course on E-Discovery & Digital Evidence to a law Dean who readily conceded that she didn’t know what e-discovery was or why it would be an important thing for lawyers to understand.  It was a bracing experience.

My métier has always been litigation, to the point that everyone I work with sits in and around trial practice.  My close colleagues recognize that 90% of what trial lawyers do is geared to discovery and motion practice, and much of that motion practice is prompted by discovery disputes. So, hearing how a tax lawyer and academic viewed litigation was eye-opening, and troubling to the extent it impacts what’s taught to new lawyers.

Do you agree about the centrality of discovery to litigation, Dear Reader?

The Dean shared her sense that discovery is being replaced by AI and that “soon AI will handle the production of relevant information instead of lawyers.”  I replied that I expected the review phase to be abetted or supplanted by AI in the near term—that’s here—but it would be some time before all the tasks that come before review would be fully AI-enabled.

The idea that there are crucial tasks requiring lawyer intervention before review was surprising to her.  For those who don’t manage electronic discovery day-to-day, electronically stored information seems to magically appear in review tools.  But for e-discovery folks, the march through identification, preservation, collection and processing is our path, and we know that no one, and no AI, can undertake an assessment of the evidence without facing the data.

You’ve got to face the evidence to assess the evidence.

That’s axiomatic; but it’s downplayed by those shouting “AI! AI!”  As they say in these parts, “you’ve got to put the hay down where the goats can get it.”  Until AI is embedded in everything, until AI faces the data in every phone, cloud repository, storage medium and database in ways that support discovery, the goats can’t get to the hay.

The evidence in our cases is not a “collection” until it’s collected.  That doesn’t necessarily mean a copy must be made to isolate data of interest, but that remains the prevailing way that a discrete assemblage of potentially responsive ESI is marshaled before it is processed for search and review.  Not until that occurs does the evidence face human or AI review.

Continue reading

Garden Variety: Byte Fed. v. Lux Vending

Tags

,

My esteemed colleagues, Kelly Twigger and Doug Austin, each posted about a recent discovery decision from the Middle District of Florida, case no. 8:23-cv-102-MSS-SPF, styled, Byte Fed., Inc. v. Lux Vending LLC. and decided by United States Magistrate Judge Sean Flynn on May 1, 2024.

Kelly and Doug share their customarily first-rate analyses of the ruling insofar as its finding that the assertion of boilerplate objections serves as a waiver.  The Court spanked defendant, The Cardamone Consulting Group, LLC, for its conduct.  That’s been picked apart elsewhere, and I have nothing to add.  I write here to address a feature of the dispute that no one has discussed (and sadly, neither did the Court), being the nature of the request for production that prompted the boilerplate objection of “vague and incomprehensible.”  We can learn much more from the case than just boilerplate=waiver.

Let’s look at the underlying request:

DOCUMENT REQUEST NO. 7:

All documents and electronically stored information that are generated in applying the search terms below to Your corporate email accounts (including but not limited to the email accounts for Nicholas Cardamone, Daniel Cardamone, and Patrick McCloskey):

ByteBitcoin w/s FloridaStanton
ByteFederalBitcoin w/s trademarkBranden w/3 Tawil
Byte FederallawsuitBrandon w/3 Mintz
most w/5 trustedScott w/3 BuchananDKI
Google w/s trademarkconfusion or confusedDynamic w/5 keyword

In its Motion to Compel, Plaintiff calls this request “clear on its face, and … a garden-variety type of request for production in connection with narrowly tailored search terms.”  The Plaintiff adds, “[y]et during the parties’ meet-and-confer, and although Cardamone’s counsel claimed that she was familiar with electronic discovery, the assertion was that her client – a company that has purportedly generated hundreds of millions of dollars in connection with online advertising and electronic data – ‘did not understand what to do.’”

So, Dear Reader, would you understand what to do? You’re steeped in electronic discovery—that’s why you’ve stopped by—but is the request clear, narrowly tailored and “garden-variety” such that we can apply it to a proper production workflow?  A few points to ponder:

1. There’s nothing in the Federal Rules of Civil Procedure that prohibits a request to run specific queries against databases, and email accounts are databases.  Rule 34 requires only that the request “describe with reasonable particularity each item or category of items to be inspected.” 

Conventional requests are couched in language geared to relevance; that is, the requests seek documents and ESI about a topic.  Counsel must then apply the law and the facts to guide clients in identifying responsive information.  Counsel reviews the information gathered and decides whether it’s responsive or should be withheld as a matter of right or privilege.

Over time, the notion took hold that sifting through electronically stored information was unduly burdensome, so opposing parties were expected to work together to fashion queries–“search terms” –to narrow the scope of review.  These keyword negotiations run the gamut from laughable to laudable. They’re duels between counsel frequently unarmed with knowledge of the search tools and processes or of the data under scrutiny.  In short, they use their ginormous lawyer brains to guess what might work if the digital world were as they imagine it to be.

Here, the plaintiff cuts to the chase, eschewing a request couched in relevance in favor of asking that specific searches be run: half of them Boolean constructs employing two types of proximity connectors. 

Was this smart?   You decide.

Continue reading

Surviving a Registration Bomb Attack

Tags

, , , ,

It started just after 7:00 last night.  My mailbox swelled with messages confirming I’d subscribed to websites and newsletters around the world.  Within an hour, I’d received over 2,000 such messages, and they kept pouring in until I’d gotten 4,000 registration confirmations by 11:00pm. After that, the flood slowed to a trickle.

I was the victim of a registration bomb attack, a scary experience if you don’t grasp what’s happening or know how to protect yourself.  Fortunately, it wasn’t my first rodeo. 

During a similar attack a couple of years ago, I was like a dog on the Fourth of July–I didn’t know what was happening or how to deal with it.  But this time, my nerves weren’t wracked: I knew what was afoot and where the peril lay.

Cybersecurity is not my principal field of practice, but it’s a forensics-adjacent discipline and one where I try to keep abreast of developments.  So, much like a trial lawyer enjoying the rare chance to serve on a jury, being the target of a cyberattack is as instructive as inconvenient.  

While a registration bomb attack could be the work of a disgruntled reader (Hey! You can’t please everybody), more often they serve to mask attacks on legitimate accounts by burying notices of password resets, funds transfers or fraudulent credit card charges beneath a mountain of messages.  So, yes, you should treat a registration bomb attack as requiring immediate vigilance in terms of your finances.  Keep a weather eye out for small transfers, especially deposits into a bank account as these signal efforts to link your account to another as prelude to theft.  Likewise, look at your credit card transactions to ensure that recent charges are legitimate.  Finally—and the hardest to do amidst a deluge of registration notices—look for efforts to change credentials for e-commerce websites you use like Walmart.com or Amazon.com.

A registration bomb attack is a powerful reminder of the value of always deploying multifactor authentication (MFA) to protect your banking, brokerage and credit card accounts.  Those extra seconds expended on secure logins will spare you hours and days lost to a breach.  With MFA in place, an attacker who succeeds in changing your credentials won’t have the access codes texted to your phone, thwarting efforts to rob you.

The good news is that, if you’re vigilant in the hours a registration bomb is exploding in your email account and you have MFA protecting your accounts, you’re in good shape.

Now for the bad news: a registration bomb is a distributed attack, meaning that it uses a botnet to enlist a legion of unwitting, innocent participants—genuine websites—to do the dirty work of clogging your email account with registration confirmation requests.  Because the websites emailing you are legitimate, there’s nothing about their email to trigger a spam filter until YOU label the message as spam. Unfortunately, that’s what you must do: select the attack messages and label each one as spam.  Don’t bother to unsubscribe to the registrations; just label the messages as spam as quickly as you can. 

This is a pain. And you must be attuned to the potential to mistakenly blacklist senders whose messages you want at the same time you’re squashing the spam messages you don’t want and scanning for password change notices from your banks, brokers and e-commerce vendors.  It’s easier when you know how to select multiple messages before hitting the “spam” button (in Gmail, holding down the Shift key enables you to select a range of messages by selecting the first and last message in the range).  Happily, the onslaught of registration spam will stop; thousands become hundreds and hundreds become dozens in just hours (though you’ll likely get stragglers for days).

Registration bombing attacks will continue so long as the web is built around websites sending registration confirmation messages—a process ironically designed to protect you from spam.   If you’ve deployed the essential mechanisms to protect yourself online, particularly strong, unique passwords, multifactor authentication and diligent review of accounts for fraudulent transactions, don’t panic; the registration bomb will be no more than a short-lived inconvenience.  This, too, shall pass.

ChatGPT Proves a Mediocre Law Student

I recently spent a morning testing ChatGPT’s abilities by giving it exercises and quizzes designed for my law and computer science graduate students. Overall, I was impressed with its performance, but also noticed that it’s frequently wrong but never in doubt: a mechanical mansplainer!

If you’re asking, “What is ChatGPT,” I’ll let it explain itself: “ChatGPT is a large language model developed by OpenAI. It is a type of machine learning model called a transformer, which is trained to generate text based on a given prompt. It is particularly well-suited to tasks such as natural language processing, text generation, and language translation. It is capable of understanding human language and generating human-like text, which makes it useful for a wide range of applications, such as chatbots, question-answering systems, and text completion.” I’d add that it’s the first artificial intelligence tool that anyone can play with, and it’s generating enormous excitement.

In terms of its legal knowledge, ChatGPT performed like a mediocre law student, performing well on essay questions and drafting exercises, but struggling with multiple choice questions. I couldn’t assign it a definitive score on the 25-question quiz as some questions required image analysis and I didn’t see a way to include illustrations in the chat interface. However, it performed poorly enough on the quiz for me to conclude that a passing score of 70 out of 100 points was out of reach.

One example of a question it struggled with was:

Which of the following is NOT an essential metadata value (a “metadata must”) to preserve and produce for a file?

  1. Custodian
  2. File Name
  3. Last Modified Date and Time
  4. Last Accessed Date and Time

The correct answer is Last Accessed Date and Time.  Class readings underscore the unreliability of Last Accessed dates and times and I supply a table of “Metadata Musts” for preservation and production.  Presumably trained on other sources, ChatGPT did not seem to understand the importance of a proper chain of custody for electronic evidence and replied:

Custodian is not considered an essential metadata value to preserve and produce for a file. The other options (File Name, Last Modified Date and Time, Last Accessed Date and Time) are all considered essential metadata values that should be preserved and produced for a file in order to provide important information about the file and its history.

Since it wasn’t performing well on my quiz, I asked,

Can you suggest some multiple choice quiz questions for students studying the fundamentals of electronic discovery and electronic data storage?

The first suggestion was:

What is the process of collecting, processing, and producing electronically stored information (ESI) in response to a legal request called?

a. Electronic discovery

b. Digital forensics

c. E-discovery

d. Data retrieval

In ChatGPT’s formulation “Electronic discovery” is the right response and “E-discovery” is wrong! 

Another suggestion:

What is the most common format for ESI production?

a. PDF

b. TIFF

c. Native file format

d. ASCII

That’s a challenging question because, as much as I’ve focused on and written about forms of production, I’ve not come across metrics establishing the most common format.  In supplying an answer key, ChatGPT identified Native file format as the correct answer, though I would have guessed “TIFF.”  I certainly wouldn’t have included ASCII as a choice.

Here are a few other quiz questions suggested by ChatGPT.  I’ve bolded what ChatGPT defined as the correct response and, in each instance, I’d conclude it’s either not the best answer or the question is so poorly constructed as to defy characterizing any single answer as the right one:

What is the process of identifying and eliminating irrelevant or privileged ESI called?

a. Culling

b. Filtering

c. Redaction

d. Processing

What is the process of reviewing ESI to identify relevant information called?

a. Review

b. Analysis

c. Coding

d. Searching

What is the process of converting ESI from its original format to a reviewable format called?

a. Conversion

b. Processing

c. Extraction

d. Production

What is the process of creating a backup copy of ESI for preservation purposes called?

a. Backup

b. Imaging

c. Archiving

d. Duplication

What is the process of destroying or disposing of ESI after it is no longer needed called?

a. Deletion

b. Erasure

c. Destruction

d. Purging

Encouraged that the law faculty might be safe for another semester, I turned ChatGPT to a drafting exercise:

You are to develop the best checklist of potential data sources that are candidates for legal preservation in any engagement—good enough that you’ll want to keep it to use in your own work. It does NOT have to be entirely original, and I encourage you to start with the best existing checklist(s) you can find and then make updates and improvements. Certainly, you should seek to add missing sources, but you may also see a better way to organize and present the content. Remember that this checklist is a list of sources and varieties not methods and, crucially, it should serve to enhance the completeness of the personal data map you will create (though it should NOT be limited to the scope of your personal digital footprint alone). Thus, generic checklist items like “Online sources” or “Local storage” won’t be sufficiently granular. Be sure to credit your sources and format it as a checklist.

ChatGPT responded:

Data Preservation Checklist

  1. Social media accounts (Facebook, Twitter, Instagram, LinkedIn, etc.)
  2. Email accounts (Gmail, Outlook, Yahoo, etc.)
  3. Cloud storage accounts (Google Drive, Dropbox, OneDrive, etc.)
  4. Mobile device data (text messages, call logs, contacts, etc.)
  5. Computer files (documents, photos, videos, etc.)
  6. Internet browsing history and cache
  7. Online marketplace accounts (eBay, Amazon, Etsy, etc.)
  8. Online banking and financial accounts (Paypal, bank accounts, credit cards, etc.)
  9. GPS location data
  10. Smart home device data (Amazon Echo, Google Home, etc.)
  11. Wearable device data (Fitbit, Apple Watch, etc.)
  12. Gaming accounts (Xbox Live, PlayStation Network, Steam, etc.)
  13. Voicemail and answering machine messages
  14. Instant messaging and chat logs (WhatsApp, Messenger, etc.)
  15. Social media direct messages and private messages
  16. Online dating profiles and messages
  17. Online forum and message board posts
  18. Online video and audio recordings (YouTube, Vimeo, etc.)
  19. Online photo and image sharing accounts (Flickr, Imgur, etc.)
  20. Online document collaboration and sharing accounts (Google Docs, Microsoft Office 365, etc.)

Source: Adapted from “Digital Preservation Checklist” by the National Archives and Records Administration (NARA) (https://www.archives.gov/digital-preservation/checklist) and “Digital Preservation Checklist for Individuals” by the Digital Preservation Coalition (https://www.dpconline.org/advice/digital-preservation-checklist-for-individuals)

Not bad, and certainly typical of what I get from students.  Interestingly, both citations are dead links, so unacceptable from my students, and (arguably) ChatGPT simply ignored the requirement to format its answer as a checklist. 

I had ChatGPT answer the question a second time sans the mandate that it credit sources. I was struck by the differences in the lists supplied and the disclaimer now appended:

  1. Social media accounts (e.g. Facebook, Twitter, Instagram, LinkedIn)
  2. Email accounts (e.g. Gmail, Yahoo, Outlook)
  3. Cloud storage accounts (e.g. Google Drive, Dropbox, iCloud)
  4. Online shopping accounts (e.g. Amazon, eBay, Etsy)
  5. Financial accounts (e.g. bank accounts, credit card accounts, PayPal)
  6. Mobile phone data (e.g. call logs, text messages, contacts)
  7. Computer hard drives
  8. USB drives and external hard drives
  9. Social media messaging apps (e.g. WhatsApp, iMessage, Messenger)
  10. Online forums and discussion boards
  11. Online dating profiles
  12. Online gaming accounts
  13. Web browsing history and cookies
  14. Internet service provider records
  15. GPS data (e.g. location data from mobile devices)
  16. Smart home device data (e.g. Amazon Echo, Google Home)
  17. Vehicle tracking data (e.g. OnStar, Tesla)
  18. Medical records (e.g. electronic health records)
  19. Educational records (e.g. transcripts, grades)
  20. Employee records (e.g. HR files, payroll records)

Note: This checklist is not exhaustive and may vary depending on the specific engagement or case. It is important to consult with legal and technical experts to determine the most relevant and appropriate data sources for preservation. Additionally, it is important to consider any privacy laws and regulations when collecting and preserving personal data.

I asked ChatGPT to identify “the most important legal opinions I should assign my students to read that will help them better understand electronic discovery in the United States” and it responded:

  1. Zubulake v. UBS Warburg LLC, 217 F.R.D. 309 (S.D.N.Y. 2003) – This case established the “Zubulake standards,” which are widely used as a guide for electronic discovery in the United States. It lays out the obligations of parties to preserve and produce electronically stored information (ESI) and the sanctions that can be imposed for failure to comply.
  2. Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008) – This case discusses the duty to preserve ESI and the consequences of spoliation (destruction or alteration of evidence). It also addresses the standard for imposing sanctions for spoliation.
  3. The Sedona Conference® Commentary on Achieving Quality in the E-Discovery Process, The Sedona Conference (2008) – This commentary provides guidance on best practices for e-discovery, including the preservation and production of ESI, the use of technology-assisted review (TAR) and predictive coding, and the handling of privileged information.
  4. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012) – This case addresses the use of TAR and predictive coding in e-discovery, and emphasizes the importance of cooperation and communication between parties when using these technologies.
  5. Morgan Stanley Smith Barney LLC v. Skowron, No. 11-CV-8335 (S.D.N.Y. Mar. 29, 2013) – This case discusses the use of proportionality in e-discovery, and the court’s authority to limit the scope of e-discovery in order to reduce the burden and expense of producing ESI.
  6. In re: Actos (Pioglitazone) Products Liability Litigation, MDL No. 6:11-md-2299 (W.D. La. 2013) – This case addresses the use of advanced analytics and data visualization tools in e-discovery, and highlights the importance of considering the costs and benefits of using these technologies in the e-discovery process.

Except for mischaracterizing a Sedona Conference Commentary as a “legal opinion,” they’re all fine suggestions, yet all are ten-twenty years old. 

I asked ChatGPT a question posed by a student in my last class: “What is the difference between electronic discovery and computer forensics?”  It opined:

Electronic discovery (e-discovery) and computer forensics are both related to the collection and analysis of electronic data, but they serve different purposes.

Electronic discovery is the process of collecting, processing, and producing electronic data in response to a legal request or investigation. It is typically used in civil litigation and is designed to identify and produce relevant electronic evidence that may be used in court. The goal of e-discovery is to find information that is responsive to the legal request and that is not privileged or protected by any other legal exceptions.

Computer forensics, on the other hand, is the process of collecting, preserving, analyzing, and presenting electronic evidence in a manner that is admissible in court. It is typically used in criminal investigations and is designed to identify, recover, and preserve evidence of a crime or other misconduct. The goal of computer forensics is to find and preserve evidence that may be used to prove or disprove a crime or other wrongdoing.

In summary, electronic discovery is focused on finding and producing relevant electronic evidence in response to a legal request, while computer forensics is focused on finding and preserving evidence of a crime or other misconduct. Both processes require the use of specialized tools and techniques to collect and analyze electronic data, but they serve different purposes and are used in different types of legal proceedings.

A good answer.  My response in class focused on the relative accessibility and intelligibility of the ESI we deal with in e-discovery versus digital forensics, and I didn’t tie forensics to criminal investigations because so much of my work in the field has concentrated on civil cases.  But I can’t say I did any better than the AI.  I smiled more.

Finally, I asked who are the most important thought leaders in the field of electronic discovery?  ChatGPT replied:

  1. Ralph Losey: A well-known attorney and author who has written extensively on the topic of electronic discovery and has been recognized as a leading expert in the field.
  2. Craig Ball: A forensic expert and attorney who has played a key role in shaping the laws and practices related to electronic discovery through his work as a consultant, expert witness, and author.
  3. Tom O’Connor: A consultant, speaker, and author who is considered one of the leading experts on the use of technology in the legal field and has been instrumental in developing best practices for electronic discovery.
  4. George Socha: A consultant, speaker, and author who is considered a leading expert in the field of electronic discovery and has been instrumental in the development of industry standards and best practices.
  5. Maura Grossman: A consultant, speaker, and author who is considered one of the leading experts on the use of artificial intelligence and machine learning in electronic discovery and has been instrumental in advancing the field.
  6. E-Discovery Team: A team of experts who have written extensively on the topic of electronic discovery and have been recognized as thought leaders in the field through their work as consultants, speakers, and authors.
  7. Paul J. Lippe: A consultant, speaker, and author who is considered a leading expert in the field of electronic discovery and has been instrumental in the development of industry standards and best practices.
  8. John Simek: A consultant, speaker, and author who is considered one of the leading experts on the use of technology in the legal field and has been instrumental in developing best practices for electronic discovery.
  9. David Horrigan: A consultant, speaker, and author who is considered one of the leading experts on the use of technology in the legal field and has been instrumental in developing best practices for electronic discovery.
  10. Daniel Garrie: A consultant, speaker, and author who is considered a leading expert in the field of electronic discovery and has been instrumental in the development of industry standards and best practices.

Kudos to Ralph Losey for topping the list and garnering a second mention as “E-Discovery Team.”  I wouldn’t have considered Paul J. Lippe a candidate for an e-discovery-specific list, and I can’t very well commend ChatGPT for wokeness.  One woman?  No persons of color?  No jurists?  I’m flattered to be included among the other old, white men, but know well that there are plenty as or more deserving.

Ten Tips for Better ESI Expert Reports

A lawyer I admire asked me to talk to her colleague about expert reports.  I haven’t had that conversation yet, but the request got me thinking about the elements of a competent expert report, especially reports in my areas of computer forensics and digital evidence.  I dashed off ten things I thought contribute to the quality of the best expert reports.  If these were rules, I’d have to concede I’ve learned their value by breaking a few of them.  I’ve left out basic writing tips like “use conversational language and simple declarative sentences.” There are lists of rules for good writing elsewhere and you should seek them out.  Instead, here’s my impromptu list of ten tips for crafting better expert reports on technical issues in electronic discovery and computer forensics:

  1. Answer the questions you were engaged to resolve.
  2. Don’t overreach your expertise.
  3. Define jargon, and share supporting data in useful, accessible ways.
  4. Distinguish factual findings from opinions.
  5. Include language addressing the applicable evidentiary standard.
  6. Eschew advocacy; let your expertise advocate for you.
  7. Challenge yourself and be fair.
  8. Proofread.  Edit.  Proofread again. Sleep on it. Edit again.
  9. Avoid assuming the fact finder’s role in terms of ultimate issues.
  10. Listen to your inner voice.

Most of these are self-explanatory but please permit me a few clarifying comments.

Answer the questions you were engaged to resolve.

My pet peeve with expert reports is that they don’t always address the questions important to the court and counsel.  I’ve seen reports spew hundreds of pages of tables and screenshots without conveying what any of it means to the issues in the case.  Sometimes you can’t answer the questions.  Fine.  Say so.  Other times you must break down or reframe the questions to conform to the evidence.  That’s okay, too, IF it’s not an abdication of the task you were brought in to accomplish.  But, the best, most useful and intelligible expert reports pose and answer specific questions.

Don’t overreach your expertise.

The standard to qualify as an expert witness is undemanding: do you possess specialized knowledge that would assist the trier of fact in understanding the evidence or resolving issues of fact? See, e.g., Federal Rule of Evidence 702.  With the bar so low, it can be tempting to overreach your expertise, particularly when pushed by a client to opine on something you aren’t fully qualified to address.  For example, I’m a certified computer forensic examiner and I studied accounting in college, but I’m not a forensic accountant.  I know a lot about digital forgery, but I’m not a trained questioned document examiner.  These are specialties.  I try to stay in my own lane and commend it to other experts.

Define jargon, and share supporting data in useful, accessible ways.

Can someone with an eighth-grade education and no technical expertise beyond that of the average computer user understand your report?  If not, you’re writing for the wrong audience.  We should write to express, not impress.  I love two-dollar words and the bon mot phrase, but they don’t serve me well when writing reports.  Never assume that a technical term will be universally understood.  If your grandparents wouldn’t know what it means, define it.

Computer forensic tools are prone to generate lengthy “reports” rife with incomprehensible data.  It’s tempting to tack them on as appendices to add heft and underscore how smart one must be to understand it all.  But it’s the expert’s responsibility to act as a guide to the data and ensure its import is clear.  I rarely testify—even by affidavit–without developing annotated demonstrative examples of the supporting data.  Don’t wait for the deposition or hearing to use demonstrative evidence; make points clear in the report.

Too, I’m fond of executive summaries; that is, an up-front, cut-to-the-chase paragraph relating the upshot of the report.

Distinguish factual findings from opinions.

The key distinction between expert and fact witnesses is that expert witnesses are permitted to express opinions that go beyond their personal observation.  A lay witness to a crash may testify to speeds based only upon what they saw with their own eyes.  An accident reconstructionist can express an opinion of how fast the cars were going based upon evidence that customarily informs expert opinions like skid marks and vehicle deformation.  Each type of testimony must satisfy different standards of proof in court; so, to make a clear and defensible record, it’s good practice to distinguish factual findings (“things you saw”) from opinions (“things you’ve concluded based upon what you saw AND your specialized knowledge, training and experience”).  This  naturally begets the next tip:

Include language addressing the applicable evidentiary standard.

Modern jurisprudence deploys safeguards like the Daubert standard to combat so-called “junk science.”  Technical expert opinions must be based upon a sound scientific methodology, viz., sufficient facts or data and the product of reliable principles and methods.  While a court acting as gatekeeper can infer the necessary underpinnings from an expert’s report and C.V., expressly stating that opinions are based upon proper and accepted standards makes for a better record.

Eschew advocacy; let your expertise advocate for you.

Mea culpa here.  Because I was a trial lawyer for three+ decades, I labor to restrain myself in my reporting to ensure that I’m not intruding into the lawyer’s realm of advocacy.  I don’t always succeed.  Even if you’re working for a side, be as scrupulously neutral as possible in your reporting.  Strive to act and sound like you don’t care who prevails even if you’re rooting for the home team.  If you do your job well, the facts will advocate the right outcome.

Challenge yourself and be fair.

My worst nightmare as an expert witness is that I will mistakenly opine that someone committed a bad act when they didn’t.  So, I’m always trying to punch holes in my own theories and asking myself, “how would I approach this if I were working for the other side?”  Nowhere is this more important than when working as a court-appointed neutral expert.  Even if you’d enjoying seeing a terrible person fry, be fair.  You stand in the shoes of the Court.

Proofread.  Edit.  Proofread again. Sleep on it. Edit again.

Who has that kind of time, right?  Still, try to find the time.  Few things undermine the credibility of an expert report like a bunch of spelling and grammatical errors.  Stress and fatigue make for poor first drafts.  It often takes a good night’s sleep (or at least a few hours away from the work) to catch the inartful phrase, typo or other careless error.

Avoid assuming the fact finder’s role in terms of ultimate issues.

Serving as a court Special Master a few years back, I opined that the evidence of a certain act was so overwhelming that the Court should only reach one result.  Accordingly, I ceased investigating the loss of certain data that I regarded as out-of-scope.  I was right…but I was also wrong.  The Court has a job to do and, by my eliding over an issue the Court was obliged to address, the Court had to rule without benefit of what a further inquiry into the missing evidence would have revealed. The outcome was the same, but by assuming the factfinder’s role on an ultimate issue, I made the Court’s job harder.  Don’t do that.

Listen to your inner voice.

In expressing expert opinions, too much certainty—a/k/a arrogance–is as perilous as too much doubt.  Perfect is not the standard, but you should be reasonably confident of  your opinion based on a careful and competent review of the evidence.  If something “feels” off, it may be your inner voice telling you to look again. 

Final Exam Review: How Would You Fare?

It’s nearly finals time for the students in my E-Discovery and Digital Evidence course at the University of Texas School of Law. I just completed the Final Exam Study Guide for the class and thought readers who wonder what a tech-centric law school e-discovery curriculum looks like might enjoy seeing what’s asked of the students in a demanding 3-credit law school course. Whether you’re ACEDS certified, head of your e-discovery practice group or just an e-discovery groupie like me, consider how you’d fare preparing for an exam with this scope and depth. I’m proud of my bright students. You’d be really lucky to hire one of my stars.

E-Discovery – Spring 2021 Final Exam Study Guide

The final exam will cover all readings, lectures, exercises and discussions on the syllabus.
(Syllabus ver. 21.0224 in conjunction with Workbook ver. 21.0214 and Announcements).

  1. We spent a month on meeting the preservation duty and proportionality.  You undertook a two-part legal hold drafting exercise.  Be prepared to bring skills acquired from that effort to bear on a hypothetical scenario.  Be prepared to demonstrate your understanding of the requisites of fashioning a defensible legal hold and sensibly targeting a preservation demand to an opponent.  As well, your data mapping skills should prove helpful in addressing the varied sources of potentially relevant ESI that exist, starting at the enterprise level with The Big Six (e-mail, network shares, mobile devices, local storage, social networking and databases).  Of course, we must also consider Cloud repositories and scanned paper documents as potential sources.
  2. An essential capability of an e-discovery lawyer is to assess a case for potentially relevant ESI, fashion and implement a plan to identify accessible and inaccessible sources, determine their fragility and persistence, scope and deploy a litigation hold and take other appropriate first steps to counsel clients and be prepared to propound and respond to e-discovery, especially those steps needed to make effective use of the FRCP Rule 26(f) meet-and-confer process.  Often, you must act without having all the facts you’d like and rely upon your general understanding of ESI and information systems to put forward a plan to acquire the facts and do so with sensitivity to the cost and disruption your actions may engender.  Everything we’ve studied was geared to instilling those capabilities in you.
  3. CASES: You are responsible for all cases covered during the semester.  When you read each case, you should ask yourself, “What proposition might I cite this case to support in the context of e-discovery?”  That’s likely to be the way I will have you distinguish the cases and use them in the exam.  I refer to cases by their style (plaintiff versus defendant), so you should be prepared to employ a mnemonic to remember their most salient principles of each, e.g., Columbia Pictures is the ephemeral data/RAM case; Rambus is the Shred Day case; In re NTL is the right of control case; In re: Weekley Homes is the Texas case about accessing the other side’s hard drives, Wms v. Sprint is the spreadsheet metadata case (you get the idea).  I won’t test your memory of jurists, but it’s helpful-not-crucial to recall the authors of the decisions (especially when they spoke to our class like Judges Peck and Grimm). 

Case Review Hints:

  • Green v. Blitz: (Judge Ward, Texas) This case speaks to the need for competence in those responsible for preservation and collection and what constitutes a defensible eDiscovery strategy. What went wrong here? What should have been done differently?
  • In re: Weekly Homes: (Texas Supreme Court) This is one of the three most important Texas cases on ESI. You should understand the elements of proof which the Court imposes for access to an opponent’s storage devices and know terms of TRCP Rule 196.4, especially the key areas where the state and Federal ESI rules diverge.
  • Zubulake: (Judge Scheindlin, New York) The Zubulake series of decisions are seminal to the study of e-discovery in the U.S.  Zubulake remains the most cited of all EDD cases, so is still a potent weapon even after the Rules amendments codified much of its lessons. Know what the case is about, how the plaintiff persuaded the court that documents were missing and what the defendant did or didn’t do in failing to meet its discovery obligations. Know what an adverse inference instruction is and how it was applied in Zubulake versus what must be established under FRCP Rule 37€ after 2015. Know what Judge Scheindlin found to be a litigant’s and counsel’s duties with respect to preservation. Seven-point analytical frameworks (as for cost-shifting) make good test fodder.
  • Williams v. Sprint: (Judge Waxse, Kansas). Williams is a seminal decision respecting metadata. In Williams v. Sprint, the matter concerned purging of metadata and the locking of cells in spreadsheets in the context of an age discrimination action after a reduction-in-force. Judge Waxse applied Sedona Principle 12 in its earliest (and now twice revised) form. What should Sprint have done?  Did the Court sanction any party? Why or why not?
  • Rodman v. Safeway: (Judge Tigar, ND California) This case, like Zubulake IV, looks at the duties and responsibilities of counsel when monitoring a client’s search for and production of potentially responsive ESI? What is Rule 26(g), and what does it require? What constitutes a reasonable search? To what extent and under what circumstances may counsel rely upon a client’s actions and representations in preserving or collecting responsive ESI?
  • Columbia Pictures v. Bunnell: (Judge Chooljian, California) What prompted the Court to require the preservation of such fleeting, ephemeral information? Why were the defendants deemed to have control of the ephemeral data? Unique to its facts?
  • In re NTL, Inc. Securities Litigation: (Judge Peck, New York) Be prepared to discuss what constitutes control for purposes of imposing a duty to preserve and produce ESI in discovery and how it played out in this case. I want you to appreciate that, while a party may not be obliged to succeed in compelling the preservation or production of relevant information beyond its care, custody or control, a party is obliged to exercise all such control as the party actually possesses, whether as a matter of right or by course of dealing. What’s does The Sedona Conference think about that?
  • William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co.: (Judge Peck, New York) What was the “wake up call,” who were expected to awaken and on what topics?
  • Adams v. Dell: (Judge Nuffer, Utah) What data was claimed to have been lost? What was supposed to have triggered the duty to preserve? What did the Court say about a responding party’s duty, particularly in designing its information systems? Outlier?
  • RAMBUS: (Judge Whyte, California) I expect you to know what happened and to appreciate that the mere reasonable anticipation of litigation–especially by the party who brings the action–triggers the common law duty to preserve. Be prepared to address the sorts of situations that might or might not trigger a duty to initiate a legal hold.
  • United States v. O’Keefe (Judge Facciola, DC): I like this case for its artful language (Where do angels fear to tread?) and consideration of the limits and challenges of keyword search.  The last being a topic that bears scrutiny wherever it has been addressed in the material.  That is, does keyword search work as well as lawyers think, and how can we improve upon it and compensate for its shortcomings? 
  • Victor Stanley v. Creative Pipe I & II (Judge Grimm, Maryland):  Read VS I with an eye toward understanding the circumstances when inadvertent production triggers waiver (pre-FRE 502).  What are the three standards applied to claims of waiver?  What needs to be in the record to secure relief?

    Don’t get caught up in the prolonged factual minutiae of VS II.  Read VS II to appreciate the varying standards that once existed across the Circuits for imposition of spoliation sanctions and that pre-date the latest FRCP Rules amendments, i.e., Rule 37(e).
  • Anderson Living Trust v. WPX Energy Production, LLC (Judge Browning, New Mexico): This case looks at the application and intricacies of FRCP Rule 34 when it comes to ESI versus documents.  My views about the case were set out in the article you read called “Breaking Badly.”
  • In re: State Farm Lloyds (Texas Supreme Court):  Proportionality is the buzzword here; but does the Court elevate proportionality to the point of being a costly hurdle serving to complicate a simple issue?  What does this case portend for Texas litigants in terms of new hoops to jump over issues as straightforward as forms of production?  What role did the Court’s confusion about forms (and a scanty record) play in the outcome?
  • Monique Da Silva Moore, et al. v. Publicis Groupe & MSL Group and Rio Tinto Plc v. Vale S.A., (Judge Peck, New York): DaSilva Moore is the first federal decision to approve the use of the form of Technology Assisted Review (TAR) called Predictive Coding as an alternative to linear, manual review of potentially responsive ESI.  Rio Tinto is Judge Peck’s follow up, re-affirming the viability of the technology without establishing an “approved” methodology.
  • Brookshire Bros. v. Aldridge (Texas Supreme Court): This case sets out the Texas law respecting spoliation of ESI…or does it?  Is the outcome and “analysis” here consistent with the other preservation and sanctions cases we’ve covered?
  • VanZant v. Pyle (Judge Sweet, New York): Issues of control and spoliation drive this decision.  Does the Court correctly apply Rule 37(e)?
  • CAT3 v. Black Lineage (Judge Francis, New York):This trademark infringement dispute concerned an apparently altered email.  Judge Francis found the alteration sufficient to support sanctions under Rule 37(e).  How did he get there?  Judge Francis also addressed the continuing viability of discretionary sanctions despite 37(e).  What did he say about that?
  • EPAC v. Thos. Nelson, Inc.: Read this report closely to appreciate how the amended Rules, case law and good practice serve to guide the court in fashioning remedial measures and punitive sanctions.  Consider the matter from the standpoint of the preservation obligation (triggers and measures) and from the standpoint of proportionate remedial measures and sanctions.  What did the Special Master do wrong here?
  • Mancia v. Mayflower (Judge Grimm, Maryland): Don’t overlook this little gem in terms of its emphasis on counsel’s duties under FRCP Rule 26(g).  What are those duties?  What do they signify for e-discovery? What is the role of cooperation in an adversarial system?
  • Race Tires America, Inc. v. Hoosier Racing Tire Corp. (Judge Vanaskie, Pennsylvania): This opinion cogently defines the language and limits of 28 U.S.C. §1920 as it relates to the assessment of e-discovery expenses as “taxable costs.”  What common e-discovery expenses might you seek to characterize as costs recoverable under §1920, and how would you make your case?
  • Zoch v. Daimler (Judge Mazzant, Texas):  Did the Court correctly resolve the cross-border and blocking statute issues?  Would the Court’s analysis withstand appellate scrutiny once post-GDPR? 

Remember: bits, bytes, sectors, clusters (allocated and unallocated), tracks, slack space, file systems and file tables, why deleted doesn’t mean gone, forensic imaging, forensic recovery techniques like file carving, EXIF data, geolocation, file headers/binary signatures, hashing, normalization, de-NISTing, deduplication and file shares.  For example: you should know that an old 3.5” floppy disk typically held no more than 1.44MB of data, whereas the capacity of a new hard drive or modern backup tape would be measured in terabytes. You should also know the relative capacities indicated by kilobytes, megabytes, gigabytes, terabytes and petabytes of data (i.e., their order of ascendancy, and the fact that each is 1,000 times more or less than the next or previous tier).  Naturally, I don’t expect you to know the tape chronology/capacities, ASCII/hex equivalencies or other ridiculous-to-remember stuff.

4. TERMINOLOGY: Lawyers, more than most, should appreciate the power of precise language.  When dealing with professionals in technical disciplines, it’s important to call things by their right name and recognize that terms of art in one context don’t necessarily mean the same thing in another.  When terms have been defined in the readings or lectures, I expect you to know what those terms mean.  For example, you should know what ESI, EDRM, RAID, system and application metadata (definitely get your arms firmly around application vs. system metadata), retention, purge and rotation mean (e.g., grandfather-father-son rotation); as well as Exchange, O365, 26(f), 502(d), normalization, recursion, native, near-native, TIFF+, load file, horizontal, global and vertical deduplication, IP addressing, data biopsy, forensically sound, productivity files, binary signatures and file carving, double deletion, load files, delimiters, slack space, unallocated clusters, UTC offset, proportionality, taxable costs, sampling, testing, iteration, TAR, predictive coding, recall, precision, UTC, VTL, SQL, etc.

5. ELECTRONIC DISCOVERY REFERENCE MODEL:  We’ve returned to the EDRM many times as we’ve moved from left to right across the iconic schematic.  Know it’s stages, their order and what those stages and triangles signify.

6. ENCODING: You should have a firm grasp of the concept of encoded information, appreciating that all digital data is stored as numbers notated as an unbroken sequence of 1s and 0s. How is that miracle possible? You should be comfortable with the concepts described in pp. 132-148 of the Workbook (and our class discussions of the fact that the various bases are just ways to express numbers of identical values in different notations). You should be old friends with the nature and purpose of, e.g., base 2 (binary), base 10 (decimal) base 16 (hexadecimal), base 64 (attachment encoding), ASCII and UNICODE.

7 STORAGE: You should have a working knowledge of the principal types and capacities of common electromagnetic and solid-state storage devices and media (because data volume has a direct relationship to cost of processing and time to review in e-discovery). You should be able to recognize and differentiate between, e.g., floppy disks, thumb drives, optical media, hard drives, solid state storage devices, RAID arrays and backup tape, including a general awareness of how much data they hold. Much of this is in pp. 22-48 of the Workbook (Introduction to Data Storage Media).  For ready reference and review, I’ve added an appendix to this study guide called, “Twenty-One Key Concepts for Electronically Stored Information.”

8. E-MAIL: E-mail remains the epicenter of corporate e-discovery; so, understanding e-mail systems, forms and the underlying structure of a message is important.  The e-mail chapter should be reviewed carefully.  I wouldn’t expect you to know file paths to messages or e-mail forensics, but the anatomy of an e-mail is something we’ve covered in detail through readings and exercises.  Likewise, the messaging protocols (POP, MAPI, IMAP, WEB, MIME, etc.), mail single message and container formats (PST, OST, EDB, NSF, EML, MSG, DBX, MHTML, MBOX) and leading enterprise mail client-server pairings (Exchange/Outlook, Domino/Notes, O365/browser) are worth remembering.  Don’t worry, you won’t be expected to extract epoch times from boundaries again. 😉

9. FORMS: Forms of production loom large in our curriculum.  Being that everything boils down to just an unbroken string of ones-and-zeroes, the native forms and the forms in which we elect to request and produce them (native, near-native, images (TIFF+ and PDF), paper) play a crucial role in all the “itys” of e-discovery: affordability, utility, intelligibility, searchability and authenticability.  What are the purposes and common structures of load files?  What are the pros and cons of the various forms of production?  Does one size fit all?  How does the selection of forms play out procedurally in federal and Texas state practice?  How do we deal with Bates numbering and redaction?  Is native and near-native production better and, if so, how do we argue the merits of native production to someone wedded to TIFF images?  This is HUGE in my book!  There WILL be at least one essay question on this and likely several other test questions.

10. SEARCH AND REVIEW: We spent a fair amount of time talking about and doing exercises on search and review.  You should understand the various established and emerging approaches to search: e.g., keyword search, Boolean search, fuzzy search, stemming, clustering, predictive coding and Technology Assisted Review (TAR).  Why is an iterative approach to search useful, and what difference does it make?  What are the roles of testing, sampling and cooperation in fashioning search protocols?  How do we measure the efficacy of search?  Hint: You should know how to calculate recall and precision and know the ‘splendid steps’ to take to improve the effectiveness and efficiency of keyword search (i.e., better F1 scores). 

You should know what a review tool does and customary features of a review platform.  You should know the high points of the Blair and Maron study (you read and heard about it multiple times, so you need not read the study itself).  Please also take care to understand the limitations on search highlighted in your readings and those termed The Streetlight Effect.

11.ACCESSIBILITY AND GOOD CAUSE: Understand the two-tiered analysis required by FRCP Rule 26(b)(2)(B).  When does the burden of proof shift, and what shifts it?  What tools (a/k/a conditions) are available to the Court to protect competing interests of the parties

12. FRE RULE 502: It’s your friend!  Learn it, love it, live it (or at least know when and how to use it).  What protection does it afford against subject matter waiver?  Is there anything like it in state practice?  Does it apply to all legally cognized privileges?

13. 2006 AND 2015 RULES AMENDMENTS: You should understand what they changed with respect to e-discovery.  Concentrate on proportionality and scope of discovery under Rule 26, along with standards for sanctions under new Rule 37(e).  What are the Rule 26 proportionality factors?  What are the findings required to obtain remedial action versus serious sanctions for spoliation of ESI under 37(e)?  Remember “intent to deprive.”

14. MULTIPLE CHOICE: When I craft multiple choice questions, there will typically be two answers you can quickly discard, then two you can’t distinguish without knowing the material. So, if you don’t know an answer, you increase your odds of doing well by eliminating the clunkers and guessing. I don’t deduct for wrong answers.  Read carefully to not whether the question seeks the exception or the rule. READ ALL ANSWERS before selecting the best one(s) as I often include an “all of the above” or “none of the above” option.

15. All lectures and reviews of exercises are recorded and online for your review, if desired.

16. In past exams, I used the following essay questions.  These will not be essay questions on your final exam; however, I furnish them here as examples of the scope and nature of prior essay questions:

EXAMPLE QUESTION A: On behalf of a class of homeowners, you sue a large bank for alleged misconduct in connection with mortgage lending and foreclosures. You and the bank’s counsel agree upon a set of twenty Boolean and proximity queries including:

  • fnma AND deed-in-lieu
  • 1/1/2009 W/4 foreclos!
  • Resumé AND loan officer
  • LTV AND NOT ARM
  • (Problem W/2 years) AND HARP

These are to be run against an index of ten loan officers’ e-mail (with attached spreadsheets, scanned loan applications, faxed appraisals and common productivity files) comprising approximately 540,000 messages and attachments).  Considering the index search problems discussed in class and in your reading called “The Streetlight Effect in E-Discovery,” identify at least three capabilities or limitations of the index and search tool that should be determined to gauge the likely effectiveness of the contemplated searches.  Be sure to explain why each matter. 

I am not asking you to assess or amend the agreed-upon queries.  I am asking what needs to be known about the index and search tool to ascertain if the queries will work as expected.

EXAMPLE QUESTION B: The article, A Bill of Rights for E-Discovery included the following passage:

I am a requesting party in discovery.

I have duties.

I am obliged to: …

Work cooperatively with the producing party to identify reasonable and effective means to reduce the cost and burden of discovery, including, as appropriate, the use of tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search.

Describe how “tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search” serve to reduce the cost and burden of e-discovery.  Be sure to make clear what each term means.

It’s been an excellent semester and a pleasure for me to have had the chance to work with a bright bunch.  Thank you for your effort!  I’ve greatly enjoyed getting to know you notwithstanding the limits imposed by the pandemic and Mother Nature’s icy wrath.  I wish you the absolute best on the exam and in your splendid careers to come.  Count me as a future resource to call on if I can be of help to you.  Best of Luck!   Craig Ball

APPENDIX

Twenty-One Key Concepts for Electronically Stored Information

  1. Common law imposes a duty to preserve potentially relevant information in anticipation of litigation.
  2. Most information is electronically stored information (ESI).
  3. Understanding ESI entails knowledge of information storage media, encodings and formats.
  4. There are many types of e-storage media of differing capacities, form factors and formats:
    a) analog (phonograph record) or digital (hard drive, thumb drive, optical media).
    b) mechanical (electromagnetic hard drive, tape, etc.) or solid-state (thumb drive, SIM card, etc.).
  5. Computers don’t store “text,” “documents,” “pictures,” “sounds.” They only store bits (ones or zeroes).
  6. Digital information is encoded as numbers by applying various encoding schemes:
    a) ASCII or Unicode for alphanumeric characters.
    b) JPG for photos, DOCX for Word files, MP3 for sound files, etc.
  7. We express these numbers in a base or radix (base 2 binary, 10 decimal, 16 hexadecimal, 60 sexagesimal). E-mail messages encode attachments in base 64.
  8. The bigger the base, the smaller the space required to notate and convey the information.
  9. Digitally encoded information is stored (written):
    a) physically as bytes (8-bit blocks) in sectors and partitions.
    b) logically as clusters, files, folders and volumes.
  10. Files use binary header signatures to identify file formats (type and structure) of data.
  11. Operating systems use file systems to group information as files and manage filenames and metadata.
  12. Windows file systems employ filename extensions (e.g., .txt, .jpg, .exe) to flag formats.
  13. All ESI includes a component of metadata (data about data) even if no more than needed to locate it.
  14. A file’s metadata may be greater in volume or utility than the contents of the file it describes.
  15. File tables hold system metadata about the file (e.g., name, locations on disk, MAC dates): it’s CONTEXT.
  16. Files hold application metadata (e.g., EXIF geolocation data in photos, comments in docs): it’s CONTENT.
  17. File systems allocate clusters for file storage, deleting files releases cluster allocations for reuse.
  18. If unallocated clusters aren’t reused, deleted files may be recovered (“carved”) via computer forensics.
  19. Forensic (“bitstream”) imaging is a method to preserve both allocated and unallocated clusters.
  20. Data are numbers, so data can be digitally “fingerprinted” using one-way hash algorithms (MD5, SHA1).
  21. Hashing facilitates identification, deduplication and de-NISTing of ESI in e-discovery.