Ball in your Court

~ Musings on e-discovery & forensics.

Category Archives: E-Discovery

Detecting Deep Fakes

24 Tuesday Feb 2026

Posted by craigball in ai, Computer Forensics, E-Discovery, General Technology Posts, Law Practice & Procedure

≈ 2 Comments

This morning, I was approached to present in Texas on deep fake evidence and what litigators need to know to confront it.  It’s to be called, “Real or Rigged: How to Know Whether Evidence Is Fake.” I realized, to my chagrin, that I didn’t have a paper I could hand out—no single place where I had pulled together the technical realities, evidentiary doctrine, and practical litigation tactics this subject demands. So, I wrote one. Whether I ultimately give the talk remains to be seen, but I’m hopeful the resulting article will prove useful to you. The paper—Forensic Tells: A Practitioner’s Guide to Detecting Deep Fakes and Authenticating Digital Evidence—runs about thirty pages and is available here.

The piece starts from a simple premise: digital evidence does not fall like manna from heaven; it has a provenance that speaks to its authenticity. It is fundamentally different from paper because it carries a payload of information about its origins and handling—metadata that functions as a chain of custody embedded within the file itself. In an era when AI systems can generate convincing photographs, videos, and audio recordings of events that never occurred, that metadata has become the last line of defense against manufactured reality.

While I regard myself as much more a student of AI than an authority, I’ve been writing about metadata and evidence as long as anyone on two legs; so, I hope I bring something of value to the topic.  You be the judge.  The article explains, in practical terms, how synthetic media is created, why fabricated media often lacks the coherent metadata of authentic recordings, and how lawyers can use that disparity to authenticate—or challenge—digital evidence. It also addresses the emerging “liar’s dividend,” the phenomenon whereby wrongdoers dismiss authentic recordings as fake simply because the technology exists to fabricate them.

More importantly, the article is written as a practitioner’s guide, not a technical treatise. It outlines concrete discovery strategies: demanding native files, targeting interrogatories and requests for admission, pursuing third-party records, and, where necessary, seeking forensic examination of source devices. It explains what to look for in metadata, what visual and auditory artifacts may signal manipulation, and how federal and Texas evidence rules—including Rules 901 and 902—apply to synthetic media challenges. It closes with a practical checklist and discussion of emerging provenance technologies that may someday make authentication easier—but, for now, make it more essential that lawyers understand how to ask the right questions.

Your feedback is always welcome and appreciated.

A Master Table of Truth

04 Tuesday Nov 2025

Posted by craigball in ai, Computer Forensics, E-Discovery, General Technology Posts, Law Practice & Procedure, Uncategorized

≈ 5 Comments

Tags

ai, artificial-intelligence, chatgpt, eDiscovery, generative-ai, law, technology

Lawyers using AI keep turning up in the news for all the wrong reasons—usually because they filed a brief brimming with cases that don’t exist. The machines didn’t mean to lie. They just did what they’re built to do: write convincingly, not truthfully.

When you ask a large language model (LLM) for cases, it doesn’t search a trustworthy database. It invents one. The result looks fine until a judge, an opponent, or an intern with Westlaw access checks. That’s when fantasy law meets federal fact.

We call these fictions “hallucinations,” which is a polite way of saying “making shit up;” and though lawyers are duty-bound to catch them before they reach the docket, some don’t. The combination of an approaching deadline and a confident-sounding computer is a dangerous mix.

Perhaps a Useful Guardrail

It struck me recently that the legal profession could borrow a page from the digital forensics world, where we maintain something called the NIST National Software Reference Library (NIST NSRL). The NSRL is a public database of hash values for known software files. When a forensic examiner analyzes a drive, the NSRL helps them skip over familiar system files—Windows dlls and friends—so they can focus on what’s unique or suspicious.
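To make the known-file idea concrete, here is a minimal sketch of hash-based filtering in Python. It is an illustration under stated assumptions, not how any particular forensic tool works: the choice of SHA-1, the function names, and the two throwaway files are all hypothetical, and the "known" set is built on the spot rather than loaded from the actual NSRL data.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the uppercase hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def files_needing_review(paths, known_hashes):
    """Keep only files whose hash is absent from the known-file set."""
    return [p for p in paths if sha1_of(p) not in known_hashes]


# Demo with two throwaway files. The "known" set is a hypothetical
# stand-in for a real reference hash set.
with tempfile.TemporaryDirectory() as tmp:
    system_file = Path(tmp) / "system.dll"
    user_file = Path(tmp) / "memo.txt"
    system_file.write_bytes(b"stock operating system file")
    user_file.write_bytes(b"something unique to this custodian")

    known = {sha1_of(system_file)}
    flagged = [p.name for p in files_needing_review([system_file, user_file], known)]

print(flagged)
```

The point of the design is that a match tells you a file is familiar, while a miss tells you only that it deserves a look.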

So here’s a thought: what if we had a master table of genuine case citations—a kind of NSRL for case citations?

Picture a big, continually updated, publicly accessible table listing every bona fide reported decision: the case name, reporter, volume, page, court, and year. When your LLM produces Smith v. Jones, 123 F.3d 456 (9th Cir. 2005), your drafting software checks that citation against the table.

If it’s there, fine: it probably references a genuine reported case.
If it’s not, flag it for immediate scrutiny.

Think of it as a checksum for truth. A simple way to catch the most common and indefensible kind of AI mischief before it becomes Exhibit A at a disciplinary hearing.
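A rough sketch of that lookup follows, in Python. Everything in it is assumed for illustration: the table holds a single entry (the made-up Smith v. Jones example from above, not a real decision), the regular expression handles only the simplest volume-reporter-page pattern, and the names `MASTER_TABLE` and `check_citation` are hypothetical.

```python
import re

# Hypothetical master table keyed by (volume, reporter, first page).
# The lone entry is the invented example citation, not a real case.
MASTER_TABLE = {
    ("123", "F.3d", "456"): {"name": "Smith v. Jones", "court": "9th Cir.", "year": "2005"},
}

# Matches "<volume> <reporter> <page> (" and nothing fancier.
CITATION = re.compile(r"(\d+)\s+([A-Za-z0-9. ]+?)\s+(\d+)\s*\(")


def check_citation(text: str) -> str:
    """Return 'found', 'flag', or 'unparsed' for a citation string."""
    match = CITATION.search(text)
    if match is None:
        return "unparsed"  # could not even locate a citation to test
    return "found" if match.groups() in MASTER_TABLE else "flag"


print(check_citation("Smith v. Jones, 123 F.3d 456 (9th Cir. 2005)"))
print(check_citation("Doe v. Roe, 999 F.3d 111 (5th Cir. 2021)"))
```

Note that "found" means only that something is reported at that volume and page. A fuller check would also compare the case name, court, and year stored in the table.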

The Obstacles (and There Are Some)

Of course, every neat idea turns messy the moment you try to build it.

Coverage is the first challenge. There are millions of decisions, with new ones arriving daily. Some are published, some are “unpublished” but still precedential, and some live only in online databases. Even if we limited the scope to federal and state appellate courts, keeping the table comprehensive and current would be an unending job, though not an insurmountable one.

Then there’s variation. Lawyers can’t agree on how to cite the same case twice. The same opinion might appear in multiple reporters, each with its own abbreviation. A master table would have to normalize all of that—an ambitious act of citation herding.

And parsing is no small matter. AI tools are notoriously careless about punctuation. A missing comma or swapped parenthesis can turn a real case into a false negative. Conversely, a hallucinated citation that happens to match a real volume and page could slip past the filter, which is why the table can’t be the only check.
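One way to blunt the variation and punctuation problems is to reduce every reporter abbreviation to a canonical form before the lookup. The Python sketch below assumes a tiny, hand-made alias map; a real table would need a far larger one, and the names `REPORTER_ALIASES`, `canonical_reporter`, and `normalize` are hypothetical.

```python
import re

# Hypothetical alias map: a punctuation-free, lowercase key points to
# one canonical reporter abbreviation.
REPORTER_ALIASES = {
    "f3d": "F.3d",
    "fsupp2d": "F. Supp. 2d",
    "us": "U.S.",
}


def canonical_reporter(raw: str):
    """Strip spacing and punctuation, then look up the canonical form."""
    key = re.sub(r"[^a-z0-9]", "", raw.lower())
    return REPORTER_ALIASES.get(key)  # None when the reporter is unknown


def normalize(volume: str, reporter: str, page: str):
    """Build a lookup key, or return None if the reporter is unrecognized."""
    canon = canonical_reporter(reporter)
    if canon is None:
        return None
    return (int(volume), canon, int(page))


print(normalize("123", "F. 3d", "456"))
print(normalize("123", "F.3d", "456"))
```

Both calls produce the same key, so a stray space no longer turns a real case into a false negative. An unrecognized reporter returns `None`, which a drafting tool should treat as "flag for scrutiny" rather than as proof of fabrication.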

Lastly, governance. Who would maintain the thing? Westlaw and Lexis maintain comprehensive citation data, but guard it like Fort Knox. Open projects such as the Caselaw Access Project and the Free Law Project’s CourtListener come close, but they’re not quite designed for this kind of validation task. To make it work, we’d need institutional commitment—perhaps from NIST, the Library of Congress, or a consortium of law libraries—to set standards and keep it alive.

Why Bother?

Because LLMs aren’t going away. Lawyers will keep using them, openly or in secret. The question isn’t whether we’ll use them—it’s how safely and responsibly we can do so.

A public master table of citations could serve as a quiet safeguard in every AI-assisted drafting environment. The AI could automatically check every citation against that canonical list. It wouldn’t guarantee correctness, but it would dramatically reduce the risk of citing fiction. Not coincidentally, it would have prevented most of the public excoriation of careless counsel we’ve seen.

Even a limited version—a federal table, or one covering each state’s highest court—would be progress. Universities, courts, and vendors could all contribute. Every small improvement to verifiability helps keep the profession credible in an era of AI slop, sloppiness and deep fakes.

No Magic Bullet, but a Sensible Shield

Let’s be clear: a master table won’t prevent all hallucinations. A model could still misstate what a case holds, or cite a genuine decision for the wrong proposition. But it would at least help keep the completely fabricated ones from slipping through unchecked.

In forensics, we accept imperfect tools because they narrow uncertainty. This could do the same for AI-drafted legal writing—a simple checksum for reality in a profession that can’t afford to lose touch with it.

If we can build databases to flag counterfeit currency and pirated software, surely we can build one to spot counterfeit law?

Until that day, let’s agree on one ironclad proposition: if you didn’t verify it, don’t file it.

Kaylee Walstad, 1962-2025

19 Tuesday Aug 2025

Posted by craigball in E-Discovery, Personal

≈ 28 Comments

Tags

family, life, love

Writing through tears, I am heartbroken to share that Kaylee Walstad has died suddenly and unexpectedly.

Kaylee was the loving, nurturing mom of our e-discovery community; our tireless cheerleader, stalwart friend, and steady heart. She showed up for everyone—eager to listen, to soothe, to lift burdens from others’ shoulders. She was generosity and kindness incarnate. Wise and warm, radiant and real, she was simply one of a kind.

For years, I’ve begun each day with Kaylee and her EDRM partner and compadre, Mary Mack. Weekdays, weekends, holidays—every morning began with Wordle and a few encouraging words from Kaylee. That small ritual became my daily “proof of life.” In the truest sense, the sun rose with Kaylee Walstad’s light.

Every Tuesday for five years, she was there for the EDRM community support call. And every time, despite her own challenges, Kaylee devoted herself to lifting the spirits of others. She cared, genuinely and deeply, radiating love the way a flame radiates heat. If you knew Kaylee, you know exactly what I mean. If you didn’t, I am sorry—because to know her was to feel lighter, better, more hopeful. She was “Minnesota Nice” to the bone.

Beyond our community, Kaylee was devoted to her two children and her sister. Weekends and holidays were joyous festivals of food, laughter, and family. She poured herself into them, and their triumphs were hers. I cannot begin to fathom the depth of their loss.

We will honor Kaylee’s professional achievements in due time, but right now my heart insists on pouring out love and admiration for the glorious woman who has left us so abruptly, and left us all immeasurably better for having known her.

In the words of poet Thomas Campbell: “To live in hearts we leave behind is not to die.” Kaylee lives on in the hearts of all she lifted, encouraged, and loved.

Gregory Bufithis, one of Kaylee’s legions of admirers, shared a version of these comforting words:

Do not stand by my grave, and weep.
I am not there, I do not sleep.
I am the thousand winds that blow
I am the diamond glints in snow
I am the sunlight on ripened grain,
I am the gentle, autumn rain.
As you awake with morning’s hush,
I am the swift, up-flinging rush
Of quiet birds in circling flight,
I am the day transcending night.

Do not stand by my grave, and cry—
I am not there, I did not die.

— Clare Harner, Topeka, Kansas, December 1934

Native or Not? Rethinking Public E-Mail Corpora for E-Discovery (Redux, 2013→2025)

16 Saturday Aug 2025

Posted by craigball in ai, Computer Forensics, E-Discovery, Uncategorized

≈ 2 Comments

Tags

ai, artificial-intelligence, chatgpt, eDiscovery, EDRM, generative-ai, Linked attachments, Purview, technology

Yesterday, I found myself in a spirited exchange with a colleague about whether the e-discovery community has suitable replacements for the Enron e-mail corpora (now more than two decades old) as a “sandbox” for testing tools and training students. I argued that the quality of the data matters: native or near-native e-mail collections remain essential to test processing and review workflows in ways that mirror real-world litigation.

The back-and-forth reminded me that, unlike forensic examiners or service providers, e-discovery lawyers may not know or care much about the nature of electronically stored information until it finds its way to a review tool. I get that. If your interest in email is in testing AI coding tools, you’re laser-focused on text and maybe a handful of metadata fields; but if your focus is on the integrity and authenticity of evidence, or on perfecting processing tools, the originating native or near-native form of the corpus matters more.

What follows is a re-publication of a post from July 2013. I’m bringing it back because the debate over forms of email hasn’t gone away; the issue is as persistent and important as ever. A central takeaway bears repeating: the litmus test is whether a corpus hews to a fulsome RFC-5322 compliant format. If headers, MIME boundaries, and transport artifacts are stripped or incompletely synthesized, what remains ceases to be a faithful native or near-native format. That distinction matters, because even experienced e-discovery practitioners—those fixated on review at the far-right side of the EDRM—may not fully appreciate what an RFC-5322 email is, or how much fidelity is lost when working with post-processed sets.
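For readers who have never looked under the hood of a message, the Python sketch below uses the standard library’s `email` package to parse a small, invented RFC 5322 message and report whether the artifacts mentioned above are present. The sample message, the addresses, and the `fidelity_report` helper are hypothetical. The checks are a crude first pass, not a test of RFC compliance.

```python
from email import message_from_string, policy

# An invented message with transport headers and a MIME attachment.
NATIVE = (
    "Received: from mail.example.com by mx.example.org; Fri, 15 Aug 2025 09:00:00 -0500\r\n"
    "From: Alice <alice@example.com>\r\n"
    "To: Bob <bob@example.org>\r\n"
    "Subject: Quarterly numbers\r\n"
    "Date: Fri, 15 Aug 2025 08:59:58 -0500\r\n"
    "Message-ID: <abc123@example.com>\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\r\n'
    "\r\n"
    "--BOUNDARY\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "\r\n"
    "See attached.\r\n"
    "--BOUNDARY\r\n"
    "Content-Type: text/csv\r\n"
    'Content-Disposition: attachment; filename="q3.csv"\r\n'
    "\r\n"
    "a,b\r\n1,2\r\n"
    "--BOUNDARY--\r\n"
)

# The same message after a hypothetical "text extraction" pass.
STRIPPED = "From: alice@example.com\r\nSubject: Quarterly numbers\r\n\r\nSee attached.\r\n"


def fidelity_report(raw: str) -> dict:
    """Report which native-format artifacts survive in a raw message."""
    message = message_from_string(raw, policy=policy.default)
    return {
        "has_received": bool(message.get_all("Received")),
        "has_message_id": "Message-ID" in message,
        "has_date": "Date" in message,
        "mime_boundary": message.get_boundary(),
        "attachments": [part.get_filename() for part in message.iter_attachments()],
    }


native_report = fidelity_report(NATIVE)
stripped_report = fidelity_report(STRIPPED)
print(native_report)
print(stripped_report)
```

The stripped version still reads like an email, but the routing header, message identifier, MIME structure, and attachment are gone, which is the loss of fidelity at issue.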

Continue reading →

Still on Dial-Up: Why It’s Time to Retire the Enron Email Corpus

15 Friday Aug 2025

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts

≈ 11 Comments

Tags

corpora, E-Discovery, eDiscovery, Enron, ESI, forensics

Early this century, when I was gaining a reputation as a trial lawyer who understood e-discovery and digital forensics, I was hired to work as the lead computer forensic examiner for plaintiffs in a headline-making case involving a Houston-based company called Enron.  It was a heady experience.

Today, everywhere you turn in e-discovery, Enron is still with us. Not the company that went down in flames more than two decades ago, but the Enron Email Corpus, the industry’s default demo dataset.

Type in “Ken Lay” or “Andy Fastow,” hit search, and watch the results roll in. For vendors, it’s the easy choice: free, legal, and familiar. But for 2025, it’s also frozen in time—benchmarking the future of discovery against the technological equivalent of a rotary phone. Or, now that AOL has lately retired its dial-up service, benchmarking it against a 56K modem.

How Enron Became Everyone’s Test Data

When Enron collapsed in 2001 amid accounting fraud and market-manipulation scandals, the U.S. Federal Energy Regulatory Commission (FERC) launched a sweeping investigation into abuses during the Western U.S. energy crisis. As part of that probe, FERC collected huge volumes of internal Enron email.

In 2003, in an extraordinary act of transparency, FERC made a subset of those emails public as part of its docket. Some messages were removed at employees’ request; all attachments were stripped.

The dataset got a second life when Carnegie Mellon University’s School of Computer Science downloaded the FERC release, cleaned and structured it into individual mailboxes, and published it for research. That CMU version contains roughly half a million messages from about 150 Enron employees.

A few years later, the Electronic Discovery Reference Model (EDRM)—where I serve as General Counsel—stepped in to make the corpus more accessible to the legal tech world. EDRM curated, repackaged, and hosted improved versions, including PST-structured mailboxes and more comprehensive metadata. Even after CMU stopped hosting it, EDRM kept it available for years, ensuring that anyone building or testing e-discovery tools had a free, legal dataset to use. [Note: EDRM no longer hosts the Enron corpus, but for those who like hunting antiques, you may find it (or parts of it) at CMU, Enrondata.org, Kaggle.com and, no joke, The Library of Congress].

Because it’s there, lawful, and easy, Enron became—and regrettably remains—the de facto benchmark in our industry.

Why Enron Endures

Its virtues are obvious:

  • Free and lawful to use
  • Large enough to exercise search and analytics tools
  • Real corporate communications with all their messy quirks
  • Familiar to the point of being an industry standard

But those virtues are also the trap. The data is from 2001—before smartphones, Teams, Slack, Zoom, linked attachments, and nearly every other element that makes modern email review challenging.

In 2025, running Enron through a discovery platform is like driving a Formula One race car on cobblestone streets.

Continue reading →

Tailor FRE 502(d) Orders to the Case

20 Monday Jan 2025

Posted by craigball in E-Discovery, Law Practice & Procedure

≈ 6 Comments

Tags

ethics, insurance, law, legal, news

Having taught Federal Rule of Evidence 502 (FRE 502) in my law classes for over a decade, I felt I had a firm grasp of its nuances. Yet recent litigation where I serve as Special Master prompted me to revisit the rule with Proustian ‘fresh eyes,’ uncovering insights I hope to share here.

I’ve long run with the herd in urging lawyers to “always get a 502 order,” never underscoring important safeguards against unintended outcomes; but lately, I had the opportunity to hear from experienced trial counsel on both sides of a FRE 502 order negotiation and have gained a more nuanced view.

Enacted in 2008, FRE 502 was a means to use the federal rules (and Congress’ adoption of the same) to harmonize widely divergent outcomes vis-à-vis subject matter waiver flowing from the inadvertent disclosure of privileged information. 

That’s a mouthful, and I know many readers aren’t litigators, so let’s lay a little foundation.

Confidential communications shared in the context of special relationships are largely shielded from compulsory disclosure by what is termed “privilege.”  You certainly know of the Fifth Amendment privilege against self-incrimination, and no doubt you’ve heard (if only in crime dramas) that confidential communications between a lawyer and client for the purpose of securing legal advice are privileged.  That’s the “attorney-client privilege.” Other privileges extend to, inter alia, spousal communications, confidences shared between doctor and patient and confidences between clergy and parishioner for spiritual guidance.  None of these privileges are absolute, but that’s a topic for another day. 

Yet another privilege, called “work-product protection,” shields from disclosure an attorney’s mental impressions, conclusions, opinions, or legal theories contained in materials prepared in anticipation of litigation or for trial.  Here, we need only consider the attorney-client privilege and work-product protection because FRE 502 applies exclusively to those two privileges.

Clearly, lawyers enjoy extraordinary and expansive rights to withhold privileged information, and lawyers really, REALLY hate to mess up in ways that impair those rights. I’d venture that as much effort and money is expended seeking to guard against the disclosure of privileged material as is spent trying to isolate relevant evidence. A whole lot, at any rate.

One of the quickest ways to lose a privilege is by sharing the privileged material with someone who isn’t entitled to claim the privilege. Did the lawyer let the friend who drove the client to the law office sit in when confidences were exchanged? Such actions waive the privilege. Another way to lose a privilege is by accidentally letting an opponent get a look at privileged material. That can happen in a host of prosaic ways, even just by the wrong CC on an email. More often, it’s a consequence of a failed e-discovery process, say, a reviewer or production error. Inadvertently producing privileged information in discovery is every litigator’s nightmare. It happens often enough that the various states and federal circuits developed different ways of balancing protection from waiver against findings that the waiver opened the door to further disclosure in a disaster scenario called “Subject Matter Waiver.”

Continue reading →

Leery Lawyer’s Guide to AI

09 Thursday Jan 2025

Posted by craigball in E-Discovery, General Technology Posts

≈ 10 Comments

Tags

ai, artificial-intelligence, chatgpt, eDiscovery, generative-ai, openai, technology

Next month, I’m privileged to be presenting on two topics with United States District Judge Xavier Rodriguez, a dear friend who sits in the Western District of Texas (San Antonio). One of those topics is “Practical Applications for AI.” The longstanding custom for continuing legal education in Texas is that a presenter must offer “high quality written materials” to go with a talk. I’m indebted to this obligation because writing is hard work and without the need to supply original scholarship, I’d probably have produced a fraction of what I’ve published over forty years. A new topic meant a new paper, especially as I was the proponent of the topic in the planning stage–an ask born of frustration. After two years of AI pushing everything else aside, I was frustrated by the dearth of practical guidance available to trial lawyers–particularly seasoned elders–who want to use AI but fear looking foolish…or worse. So, I took a shot at a practical primer for litigators and am reasonably pleased with the result. Download it here. For some it will be too advanced and for others too basic; but I’m hopeful it hits the sweet spot for many non-technical trial lawyers who don’t want to be left behind.

Despite high-profile instances of lawyers getting into trouble by failing to use LLMs responsibly, there’s a compelling case for using AI in your trial practice now, even if only as a timesaver in document generation and summarization—tasks where AI’s abilities are uncanny and undeniable. But HOW to get started?

The Litigation Section of the State Bar of Texas devoted the Winter 2024 issue of The Advocate magazine to Artificial Intelligence.  Every article was well-written and well-informed—several penned by close friends—but no article, not one, was practical in the sense of helping lawyers use AI in their work. That struck me as an unmet need.

As I looked around, I found no articles geared to guiding trial lawyers who want to use LLMs safely and strategically. I wanted to call the article “The Leery Lawyer’s Guide to AI,” but I knew it would be insufficiently comprehensive. Instead, I’ve sought to help readers get started by highlighting important considerations and illustrating a few applications that they can try now with minimal skill, anxiety or expense. LLMs won’t replace professional judgment, but they can frame issues, suggest language, and break down complex doctrines into plain English explanations. In truth, they can do just about anything that a mastery of facts and language can achieve.

But Know This…

LLMs are unlike any tech tool you’ve used before. Most of the digital technology in our lives is characterized by consistency: you put the same things in, and the same things come out in a rigid and replicable fashion. Not so with LLMs. Ask ChatGPT the same question multiple times, and you’ll get a somewhat different answer each time. That takes getting used to.

Additionally, there’s no single “right” way to interrogate ChatGPT to be assured of an optimal result. That is, there is no strict programming language or set of keywords calculated to achieve a goal. There are myriad ways to successfully elicit information from ChatGPT, and in stark contrast to the inflexible and unforgiving tech tools of the past, the easiest way to get the results you want is to interact with ChatGPT in a natural, conversational fashion.

Continue reading →

Safety First: A Fun Day at the “Office”

16 Monday Dec 2024

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts, Personal

≈ 4 Comments

Tags

bosiet, caebs, drill-ship, forensics, offshore, vdr, voyage-data-recorder

As a forensic examiner, I’ve gathered data in locales ranging from vast, freezing data centers to the world’s largest classic car collection. Yet, wherever work has taken me, I’ve not needed special equipment or certifications beyond my forensic skills and tools.  That is, until I was engaged to inspect and acquire a Voyage Data Recorder aboard a drilling vessel operating in the Gulf of Mexico.

A Voyage Data Recorder (VDR) is the marine counterpart of the Black Box event recorder in an airliner.  It’s a computer like any other, but hardened and specialized.  Components are designed to survive a catastrophic event and tell the story of what transpired.

Going offshore by helicopter to a rig or vessel demands more than a willingness to go.  The vessel operator required that I have a BOSIET with CAEBS certification to come aboard.  That stands for Basic Offshore Safety Induction and Emergency Training with Compressed Air Emergency Breathing System.  It’s sixteen hours of training, half online and half onsite and hands-on.  I suppose I was expected to balk, but I completed the course in Houston on Thursday.  Now, I’m the only BOSIET with CAEBS-certified lawyer forensic examiner I know (for all the good that’s likely to do me beyond this one engagement).  Still, it was a blast to train in a different discipline.

A BOSIET with CAEBS certification encompasses four units:

  1. Safety Induction
  2. Helicopter Safety and Escape Training (with CA-EBS) using a Modular Egress Training Simulator (METS)
  3. Sea Survival including Evacuation, TEMPSC, and Emergency First Aid
  4. Firefighting and Self Rescue Techniques
Continue reading →

“There’s No Better Rule”

28 Wednesday Aug 2024

Posted by craigball in E-Discovery

≈ 9 Comments

“Take nothing on its looks; take everything on evidence. There’s no better rule.”  I quote this line from Charles Dickens’ Great Expectations at the end of all my emails.  It’s my guiding light.  Sure, how things look matters, but how things truly are matters more.  At least it should be that way. 

I reflect on all this while listening to a webinar presented by my friends, Doug Austin and Kelly Twigger, and moderated by Brett Burney.  They discussed so-called Modern Attachments or, as they prefer to call them, Hyperlinked Files.  In a nutshell, Modern Attachments (as Microsoft calls them) are files that are stored in the Cloud and accessed by links transmitted within an email as distinguished from documents embedded within the transmitting e-mail and thus traveling with the e-mail rather than being retrieved by the recipient clicking on a hyperlink.  The debate about the extent of the duty to preserve, collect and produce these modern attachments rages on, and I don’t post here to rehash that back-and-forth.  My purpose is to tackle some misinformation advanced as a basis to exclude modern attachments from the reach of discovery.

Many who paint dealing with Modern Attachments as infeasible or fraught with risk posit that Modern Attachments tend to be collaborative documents or documents that have gone through edits after transmittal.  They argue that they shouldn’t have to produce Modern Attachments due to uncertainty over whether the document collected during discovery differs significantly from how it existed at the time of transmittal. I don’t think that a good argument against collection and review, but once more, not my point here.

My point is that we need to stop asserting that these Modern Attachments are routinely altered after transmittal without evidence of the incidence of alteration.  We should never guess at what we can readily measure.

Based on my experience, most modern attachments (e.g., 85-95%) are not altered after transmittal. Nevertheless, my personal observations mean little in the face of solid data revealing the percentage of Modern Attachments altered after transmittal.  We can measure this.  The last modified dates of Modern Attachments can be compared to their transmittal dates, either en masse or through appropriate sampling.  This will allow us to know the incidence of post-transmittal alteration based on hard evidence rather than assumptions or intuition.  I expect the incidence will vary between disciplines and corporate cultures, but that, too, is worth measuring.
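The comparison described above takes only a few lines of code once the two dates are in hand. The Python sketch below uses four invented records purely to show the arithmetic; the record layout, the one-minute tolerance, and the function name are assumptions, and the resulting percentage says nothing about real-world incidence.

```python
from datetime import datetime, timedelta

# Hypothetical records: (link identifier, transmittal time, last-modified time).
# These values are made up to exercise the comparison, not drawn from any matter.
SAMPLE = [
    ("doc-1", datetime(2024, 3, 1, 9, 0), datetime(2024, 2, 28, 16, 30)),
    ("doc-2", datetime(2024, 3, 1, 9, 5), datetime(2024, 3, 1, 9, 5)),
    ("doc-3", datetime(2024, 3, 2, 14, 0), datetime(2024, 3, 9, 11, 15)),
    ("doc-4", datetime(2024, 3, 3, 8, 0), datetime(2024, 3, 1, 10, 0)),
]


def altered_after_transmittal(records, tolerance=timedelta(minutes=1)):
    """Return identifiers whose last-modified time falls after transmittal.

    The tolerance absorbs small clock differences between systems.
    """
    return [doc_id for doc_id, sent, modified in records if modified > sent + tolerance]


altered = altered_after_transmittal(SAMPLE)
rate = len(altered) / len(SAMPLE)
print(f"{len(altered)} of {len(SAMPLE)} altered after transmittal ({rate:.0%})")
```

A real measurement would need both timestamps normalized to the same time zone. It would also need some thought about whether every change to a last-modified date reflects a substantive edit.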

Why hasn’t this been done?  A suspicious mind would conclude that those holding the data–who also happen to be the ones resisting the obligation to produce Modern Attachments–don’t want to know the metrics. Less archly, maybe they simply haven’t taken the time to measure, as guessing is easier. As they say in The Man Who Shot Liberty Valance, “This is the West, sir. When the legend becomes fact, print the legend.”  Facts are inconvenient. They’re sticking with the legend.

But it’s time to quit that.  It’s time to take everything on evidence.

Doveryai, No Proveryai!

07 Wednesday Aug 2024

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts

≈ 4 Comments

I recently published an AI prompt that runs against a set of search terms and gets the AI to propose improvements.  Among the pitfalls I’d hoped to expose was the presence of “stop” or “noise” words: terms routinely excluded from search indices.  Searches incorporating stop words fail because terms not in the index won’t be found.  Ensuring your searches don’t include stop words is an essential step in framing effective queries.
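Checking a query for stop words is simple to automate once you hold a verified list. The Python sketch below is only an illustration: the stop list is a short placeholder, not the default list of any real product, and the function name and the assumption that Boolean operators appear in uppercase are mine.

```python
import re

# Placeholder stop list for illustration only. A real check must use the
# list published for the specific tool and index configuration in use.
EXAMPLE_STOP_WORDS = {"a", "an", "and", "of", "the", "to"}

# Assumed convention: uppercase AND/OR/NOT are operators, not search terms.
OPERATORS = {"AND", "OR", "NOT"}


def stop_words_in_query(query: str, stop_words) -> list:
    """Return the distinct stop words used as search terms, in order of appearance."""
    hits = []
    for token in re.findall(r"[A-Za-z0-9']+", query):
        if token in OPERATORS:
            continue
        word = token.lower()
        if word in stop_words and word not in hits:
            hits.append(word)
    return hits


query = '"breach of the agreement" AND (terminate OR cancel)'
print(stop_words_in_query(query, EXAMPLE_STOP_WORDS))
```

Here the quoted phrase is the problem: if "of" and "the" aren’t indexed, how the phrase search behaves depends on the tool, and you need to know that before relying on it.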

To help the AI recognize stop words, the prompt included a list of default stop words for well-known eDiscovery tools.  That is, I thought I’d done that, but what I included in error (and have now replaced) was ChatGPT’s rendition of stop words for the major tools.  I’d made a mental note to check the lists supplied but—DOH!—I plugged it into the prompt and then forgot to do my due diligence.

I was feeling pretty good about the post and getting some nice feedback.  Last night, my dear friend and e-discovery Empress Mary Mack commented on the novelty of seeing the various stop word lists broken out in a ready reference.  I think echoes of Mary’s kind comment woke me at 4:00am, my subconscious screaming, “HEY DUMMY!  Did you verify those stop words?  Tell me you didn’t blindly trust an AI?!?”

So, long before sunrise, I was manually checking each stop word list against product websites and—lo and behold—every list was off: some merely incomplete but others not even close. ChatGPT hallucinated the lists, and I failed to do the crucial thing lawyers must do when using AI as a research assistant: Trust but verify.

No harm done, but I share my chagrin here to underscore that you just cannot trust an AI generative large language model to do your research without careful human assessment of the output.  I know this and let it slip my mind.  Last time for that.  I’ve corrected the prompt on my blog and hope I’ve gotten it right.  I post this to remind my readers that AI LLMs are great—USE THEM–but they are no substitute for you.  Doveryai, no proveryai!
