What’s All the Fuss About Linked Attachments?

29 Friday Mar 2024

Tags

ESI Protocols, hyperlinked files, Linked attachments, Purview

In the E-Discovery Bubble, we’re embroiled in a debate over “Linked Attachments.” Or should we say “Cloud Attachments,” or “Modern Attachments” or “Hyperlinked Files?” The name game aside, a linked or Cloud attachment is a file that, instead of being tucked into an email, gets uploaded to the cloud, leaving a trail in the form of a link shared in the transmitting message. It’s the digital equivalent of saying, “It’s in an Amazon locker; here’s the code” versus handing over a package directly. An “embedded attachment” travels within the email, while a “linked attachment” sits in the cloud, awaiting retrieval using the link.

Some recoil at calling these digital parcels “attachments” at all. I stick with the term because it captures the essence of the sender’s intent to pass along a file, accessible only to those with the key to retrieve it, versus merely linking to a public webpage. A file I seek to put in the hands of another via email is an “attachment,” even if it’s not an “embedment.” Oh, and Microsoft calls them “Cloud Attachments,” which is good enough for me.

Regardless of what we call them, they’re pivotal in discovery. If you’re on the requesting side, prepare for a revelation. And if you’re a producing party, the party’s over.

A Quick March Through History

Nascent email conveyed basic ASCII text but no attachments. In the early 90s, the advent of Multipurpose Internet Mail Extensions (MIME) enabled files to hitch a ride on emails as binary data encoded into ASCII text using Base64. This tech pivot meant attachments could join emails as encoded stowaways, to be unveiled upon receipt.
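
For readers who want to see the stowaway, here is a minimal Python sketch using the standard library's email package. The addresses, filename and file bytes are invented for illustration. It builds a message with an embedded attachment and shows that the file rides inside the message as Base64 text, to be decoded on receipt.

```python
from email.message import EmailMessage
import base64

# Hypothetical message; the addresses, filename and bytes are illustrative only.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "See attached"
msg.set_content("See attached.")

file_bytes = b"\x00\x01binary spreadsheet data\xff"
msg.add_attachment(
    file_bytes,
    maintype="application",
    subtype="octet-stream",
    filename="budget.xlsx",
)

# The attachment is a MIME part whose payload is Base64 text.
attachment = next(msg.iter_attachments())
encoding = attachment["Content-Transfer-Encoding"]
encoded_text = attachment.get_payload()              # Base64 text as carried in the email
decoded_bytes = attachment.get_payload(decode=True)  # original bytes, recovered on receipt

print(encoding)
print(encoded_text.strip())
print(decoded_bytes == file_bytes)
```

Because the encoded file is part of the message itself, anyone who captures the message captures the attachment too.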

For two decades, this embedding magic meant capturing an email also netted its attachments. But come the early 2010s, the cloud era beckoned. Files too bulky for email began diverting to cloud storage with emails containing only links or “pointers” to these linked attachments.

The Crux of the Matter

Linked attachments aren’t newcomers; they’ve been lurking for over a decade. Yet, there’s a growing “aha” moment among requesters as they realize the promised exchange of digital parcels hasn’t been as expected. Increasingly—and despite contrary representations by producing parties—relevant, responsive and non-privileged attachments to email aren’t being produced because relevant, responsive and non-privileged attachments aren’t being searched.

Wait! What? Say that again.

You heard me. As attachments shifted from being embedded to being linked, producing parties simply stopped collecting and searching those attachments.

How is that possible? Why didn’t they disclose that?

I’ll explain if you’ll indulge me in another history lesson.

Echoes From the Past

Traditionally, discovery leaned on indexing the content of email and attachments for quicker search, bypassing the need to sift through each individually. Every service provider employs indexed search.

When attachments are embedded in messages, those attachments are collected with the messages, then indexed and searched. But when those attachments are linked instead of embedded, collecting them requires an added step of downloading the linked attachments with the transmitting message. You must do this before you index and search because, if you fail to do so, the linked attachments aren’t searched or tied to the transmitting message in a so-called “family relationship.”

They aren’t searched. Not because they are immaterial or irrelevant or, in any absolute sense, inaccessible; a linked attachment is as amenable to being indexed and searched as any other document. They aren’t searched because they aren’t collected; and they aren’t collected because it’s easier to blow off linked attachments than collect them.
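
The added collection step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's method: CLOUD_STORE and fetch_linked_file are hypothetical stand-ins for an authenticated download from the cloud platform, and the link and text are invented. The point is the order of operations: retrieve the linked file, tie it to its transmitting message, and only then index and search.

```python
import re
from dataclasses import dataclass
from typing import Optional

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

@dataclass
class Item:
    item_id: str
    text: str
    parent_id: Optional[str] = None   # ties an attachment to its transmitting message

# Hypothetical stand-in for cloud storage: link -> file text.
CLOUD_STORE = {
    "https://cloud.example.com/f/123": "Q3 forecast shows the recall risk was known in May.",
}

def fetch_linked_file(url):
    """Placeholder for a real, authenticated download; returns None if unavailable."""
    return CLOUD_STORE.get(url)

def collect_family(message_id, body):
    """Return the message plus any linked attachments, tied together as a family."""
    family = [Item(message_id, body)]
    for n, url in enumerate(URL_PATTERN.findall(body), start=1):
        content = fetch_linked_file(url)
        if content is not None:
            family.append(Item(f"{message_id}-link{n}", content, parent_id=message_id))
    return family

# Collect first, then index and search.
family = collect_family("MSG001", "See attached: https://cloud.example.com/f/123")
searchable = {item.item_id: item.text.lower() for item in family}
hits = [item_id for item_id, text in searchable.items() if "recall" in text]
print(hits)
```

Skip the collection step and the only searchable text is the transmittal, which says nothing about a recall.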

Linked attachments, squarely under the producer’s control, pose a quandary. A link in an email is a dead-end for anyone but the sender and recipients and reveals nothing of the file’s content. These linked attachments could be brimming with relevant keywords yet remain unexplored if not collected with their emails.

So, over the course of the last decade, how many times has an opponent revealed that, despite a commitment to search a custodian’s email, they were not going to collect and search linked documents?

The curse and blessing of long experience is having seen it all before. Every generation imagines they invented sex, drugs and rock-n-roll, and every new information and communication technology is followed by what I call the “getting-away-with-murder” phase in civil discovery. Litigants claim that whatever new tech has wrought is “too hard” to deal with in discovery, and they get away with murder by not having to produce the new stuff until long after we have the means and methods to do so. I lived through that with e-mail, native production, then mobile devices, web content and now, linked attachments.

This isn’t just about technology but transparency and diligence in discovery. The reluctance to tackle linked attachments under claims of undue burden echoes past reluctances with emerging technologies. Yet, linked attachments, integral to relevance assessments, shouldn’t be sidelined.

What is the Burden, Really?

We see conclusory assertions of burden notwithstanding that the biggest platforms like Microsoft and Google offer ‘pretty good’ mechanisms to deal with linked attachments. So, if a producing party claims burden, it behooves the Court and requesting parties to inquire into the source of the messaging. When they do, judges may learn that the tools and techniques to collect linked attachments and preserve family relationships exist, but the producing party elected not to employ them. Granted, these tools aren’t perfect; but they exist, and perfect is not the standard, just as pretending there are no solutions and doing nothing is not the standard.

Claims that collecting linked attachments poses an undue burden because of increased volume are mostly nonsense. The longstanding practice has been to collect a custodian’s messages and ALL embedded attachments, then index and search them. With few exceptions, the number of items collected won’t differ materially whether the attachment is embedded or linked (although larger files tend to be linked). So, any party arguing that collecting linked attachments will require the search of many more documents than before is fibbing or out of touch. I try not to attribute to guile that which may be explained by ignorance, so let’s go with the latter.

Half Baked Solutions

Challenged for failing to search linked attachments, a responding party may protest that they searched the transmitting emails and even commit to collecting and searching linked attachments to emails containing search hits. Sounds reasonable, right? Yet, it’s not even close to reasonable. Here’s why:

When using lexical (e.g., keyword) search to identify potentially responsive e-mail “families,” the customary practice is to treat a message and its attachments as potentially responsive if either the content of the transmitting message or its attachment generates search “hits” for the keywords and queries run against them. This is sensible because transmittals often say no more than, “see attached;” it’s the attachment that holds the hits. Yet, stripped of its transmittal, you won’t know the timing or circulation of the attachment. So, we preserve and disclose email families.

But, if we rely upon the content of transmitting messages to prompt a search of linked attachments, we will miss the lion’s share of responsive evidence. If we produce responsive documents without tying them to their transmittals, we can’t tell who got what and when. All that “what did you know and when did you know it” matters.
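
The gap between the two approaches is easy to demonstrate. The following Python sketch uses three made-up email families and a single made-up keyword; nothing here reflects a particular review platform. It compares the customary family-level test against the approach that looks at attachments only when the transmittal hits.

```python
# Hypothetical email families: each has a transmittal and attachment text.
families = {
    "FAM1": {"message": "See attached.", "attachments": ["Brake defect analysis, draft 2"]},
    "FAM2": {"message": "Brake defect update inside", "attachments": ["Meeting agenda"]},
    "FAM3": {"message": "Lunch on Friday?", "attachments": ["Cafeteria menu"]},
}
KEYWORD = "brake defect"

def hits(text):
    """Case-insensitive keyword test standing in for a real query."""
    return KEYWORD in text.lower()

# Customary practice: a family is potentially responsive if ANY member hits.
full_search = {
    fam_id for fam_id, fam in families.items()
    if hits(fam["message"]) or any(hits(a) for a in fam["attachments"])
}

# Half-baked approach: look at attachments only when the transmittal hits.
transmittal_only = {fam_id for fam_id, fam in families.items() if hits(fam["message"])}

missed = full_search - transmittal_only
print(sorted(full_search), sorted(transmittal_only), sorted(missed))
```

FAM1, the classic “see attached” transmittal, is the family that goes missing.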

Why Guess When You Can Measure?

Hopefully, you’re wondering how many hits suggesting relevance occur in transmittals and how many in attachments. How many occur in both? Great questions! Happily, we can measure these things. We can determine, on average, the percentage of messages that produce hits versus their attachments.

If you determine that, say, half of hits were within embedded attachments, then you can fairly attribute that character to linked attachments not being searched. In that sense, you can estimate how much you’re missing and ascertain a key component of a proper proportionality analysis.

So why don’t producing parties asserting burden supply this crucial metric?
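
Computing the metric takes little more than a tally. Here is a minimal Python sketch, assuming you already have per-family search results for messages with embedded attachments; the six rows of data are invented for illustration.

```python
from collections import Counter

# Hypothetical search results for families with EMBEDDED attachments:
# (family_id, transmittal_hit, attachment_hit)
results = [
    ("F1", True, False),
    ("F2", False, True),
    ("F3", True, True),
    ("F4", False, True),
    ("F5", False, False),
    ("F6", True, False),
]

def classify(message_hit, attachment_hit):
    """Label where in the family the hits occurred."""
    if message_hit and attachment_hit:
        return "both"
    if message_hit:
        return "transmittal only"
    if attachment_hit:
        return "attachment only"
    return "no hit"

counts = Counter(classify(m, a) for _, m, a in results)
hit_families = sum(n for label, n in counts.items() if label != "no hit")

# Share of hit families found ONLY because an attachment was searched.
attachment_only_share = counts["attachment only"] / hit_families
print(counts)
print(f"{attachment_only_share:.0%} of hit families depend on attachment content")
```

Applied to a real collection, the attachment-only share is a fair first estimate of what goes unfound when linked attachments aren’t searched.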

The Path Forward

Producing parties have been getting away with murder on linked attachments for so long that they’ve come to view it as an entitlement. Linked attachments are squarely within the ambit of what must be assessed for relevance. The potential for a linked attachment to be responsive is no less than that of an item transmitted as an embedded attachment. So, let’s stop pretending they have a different character in terms of relevance and devote our energies to fixing the process.

Collecting linked attachments isn’t as Herculean as some claim, especially with tools from giants like Microsoft and Google easing the process. The challenge, then, isn’t in the tools but in the willingness to employ them.

Do linked attachments pose problems? They absolutely do! I’ve glossed over ancillary issues of versioning and credentials because those concerns reside in the realm between good and perfect solutions. Collection methods must be adapted to them, with clumsy workarounds at first and seamless solutions soon enough. But in acknowledging that there are challenges, we must also acknowledge that these linked attachments have been around for years, and they are evidence. Waiting until the crisis stage to begin thinking about how to deal with them was a choice, and a poor one. I shudder to think of the responsive information ignored every single day because this issue is inadequately appreciated by counsel and courts.

Happily, this is simply a technical challenge and one starting to resolve. Speeding the race to resolution requires that courts stop giving a free pass to the practice of ignoring linked attachments. Abraham Lincoln defined a hypocrite as a “man who murdered his parents, and then pleaded for mercy on the grounds that he was an orphan.” Having created the problem and ignored it for years, it seems disingenuous to indulge producing parties’ pleas for mercy.

In Conclusion

We’re at a crossroads, with technical solutions within reach and the legal imperative clearer than ever. It’s high time we bridge the gap between digital advancements and discovery obligations, ensuring that no piece of evidence, linked or embedded, escapes scrutiny.

Posted by craigball | Filed under Computer Forensics, E-Discovery, Uncategorized


ESI Protocols: How Do I Get Out of a Bad Deal?

19 Tuesday Mar 2024

Posted by craigball in Uncategorized


Tags

law, lawyer

I watched a webinar this morning where the presenters addressed ESI Protocols. They were well-informed people sharing sound advice; but it underscored for me why people despise lawyers. A presenter counseled, “Always build an escape clause into whatever you agree to.”

The speaker meant, if you commit your clients to a protocol provision, and you later find that the client or its service provider can’t or won’t do what was promised, you need to incorporate a “fingers crossed” way to back out of the deal.

Many readers—lawyer readers certainly—will count that as inspired advice. They’ll posit, “Aren’t we protecting our clients when we spare them the hardships of an improvident agreement?” In truth, the risk of being bound by obligations that could prove more onerous or expensive than anticipated is the number one objection I hear voiced when I advocate for use of ESI Protocols.

Who wouldn’t want to walk out on their obligations when the going gets rough? It’s human nature to crave the benefits of a bargain without its burdens; but just try to run a restaurant where everyone walks the check!

The law has a term for what accounts for the difference between a fair deal and a debacle: it’s due diligence. Competent counsel should know the capabilities of both clients and vendors before agreeing to an obligation that hinges on those capabilities.

Counsel who agrees to something because he didn’t understand the implications of the agreement won’t want to own that. He will point the finger at anyone and everyone except himself. That, too, is human nature, albeit not a pretty predilection. But let’s face facts: Those lawyers weren’t tricked; they were uninformed and unprepared.

That said, not all unforeseen consequences of an ESI Protocol grow out of a lack of diligence or competence.

People make mistakes. You do. I do. And when we do, the question becomes: Who should bear the brunt of our mistakes? And when should the consequences of our mistakes be limited by proportionality and (for lack of a better term) mercy?

Long before I became an attorney, some canny counsel decided that the optimum legal advice to a culpable client was to admit nothing, don’t apologize, deny, deny, deny and mount a strong offense as your best defense. Perhaps that’s why lawyers are the last bastion of characters cast as vile stereotypes in the movies without outcry. Okay, lawyers and Nazis.

If experience means anything, mine suggests that what passes for good legal advice is lousy life advice. If you made an honest mistake in agreeing to a provision of an ESI protocol, the optimum path is to own it and seek to make it right. Sometimes your opponent will relate and work decently to renegotiate the terms. Often, the Court will come to your aid if it’s clear you made a good faith mistake and you own it. Rarely, exceptionally, your client must endure some hardship for the error.

In every case I’ve come across in the last 42 years, that final, onerous outcome coincided with a profound lack of competence or diligence when the deal was struck, the poster child being In re Fannie Mae Sec. Litig., 552 F.3d 814 (D.C. Cir. 2009), but also a line of cases where it’s hard to explain the outcome save for the absence of due diligence, e.g., McCormick & Co., Inc. v. Ryder Integrated Logistics, Inc., ___ F.Supp.3d ___, 2023 WL 2433902 (D. Md. March 9, 2023).

We speak reverently about “the Rule of Law;” but that rule begins within each of us, in our character and commitment. The notion of including a clause in agreements to escape obligations when they become inconvenient is troubling and erodes the integrity of agreements and the foundations of a functioning society.

Lessons from Lousy Lexical Search (and Tips to Do Better)

26 Monday Feb 2024

Posted by craigball in Computer Forensics, E-Discovery, Uncategorized


Preparing a talk about keyword search, I set out to distill observations gleaned from a host of misbegotten keyword search efforts, many from the vantage point of the court’s neutral expert née Special Master assigned to clean up the mess. What emerged feels a bit…dark…and…uh…grouchy: like truths no one wants to hear because then we might be obliged to change–when we all know how profitable it is to bicker about keywords in endless, costly rounds of meeting and conferring.

The problems I’m dredging up have endured for decades, and their solutions have been clear and accessible for just as long. So, why do we keep doing the same dumb things and expecting different outcomes?

In the 25+ years I’ve studied lexical search of ESI, I’ve learned that:

1. Lexical search is a crude tool that misses much more than it finds and leads to review of a huge volume of non-relevant information. That said, even crude tools work wonders in the hands of skilled craftspeople who chip away with care to produce masterpieces. The efficacy of lexical search increases markedly in the hands of adept practitioners who meticulously research, test and refine their search strategies.

2. Lawyers embrace lexical search despite knowing almost nothing about the limits and capabilities of search tools and without sufficient knowledge of the datasets and indices under scrutiny. Grossly overestimating their ability to compose effective search queries, lawyers blithely proffer untested keywords and Boolean constructs. Per Judge John Facciola a generation ago, lawyers think they’re experts in search “because they once used Google to find a Chinese restaurant in San Francisco that served dim sum and was open on Sundays.”

3. Without exception, every lexical search is informed and improved by the iterative testing of queries against a substantial dataset, even if that dataset is not the data under scrutiny. Iterative testing is invaluable when queries are run against representative samples of the target data. Every. Single. Time.

4. Hit counts alone are a poor measure of whether a lexical search is “good” or “bad.” A “good” query may simply be generating an outsize hit count when run against the wrong dataset in the wrong way (e.g., searching for a person’s name in their own email). Lawyers are too quick to exclude queries with high perceived hit counts before digging into the causes of poor precision.

5. A query’s success depends on how the dataset has been processed and indexed prior to search, challenging the assumption that search mechanisms just ‘work,’ as if by magic.

6. Lexical search is a sloppy proxy for language; and language is replete with subtlety, ambiguity, polysemy and error, all serving to frustrate lexical search. Effective lexical search adapts to accommodate subtlety, ambiguity, polysemy and error by, inter alia, incorporating synonyms, jargon and industry-specific language, common misspellings and alternate spellings (e.g., British vs. American spellings) and homophones, acronyms and initializations.

7. Lexical search’s utility lies as much in filtering out irrelevant data as in uncovering relevant information; so, it demands meticulous effort to mitigate the risk of overlooking pertinent documents.
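
To illustrate point 6, here is a minimal Python sketch of a query expanded to cover alternate spellings, a common misspelling and a bit of jargon. The variant list and the four documents are invented; a real list would come from subject matter experts and from testing against the data.

```python
import re

# Hypothetical variants for one concept; illustrative only.
variants = [
    "aluminum", "aluminium",   # American vs. British spelling
    "alumnium",                # common misspelling
    "Al alloy",                # jargon / abbreviation
]

# Word boundaries keep a variant from matching inside a longer token.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b",
    re.IGNORECASE,
)

docs = {
    "D1": "The aluminium housing cracked.",
    "D2": "Switch to Al alloy per spec.",
    "D3": "Steel brackets only.",
    "D4": "alumnium supplier delayed",
}
matches = [doc_id for doc_id, text in docs.items() if pattern.search(text)]
print(matches)
```

A search for “aluminum” alone would have found none of these documents; the expanded query finds D1, D2 and D4.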

Understanding some of these platitudes requires delving into the science of search and ESI processing. A useful resource might be my 2019 primer on Processing in E-Discovery; admittedly not an easy read for all, but a window into the ways that processing ESI impacts searchability.

Fifteen years ago, I published a short paper called “Surefire Steps to Splendid Search” and set out ten steps that I promised would produce more effective, efficient and defensible queries. Number 7 was:

“Test, Test, Test! The single most important step you can take to assess keywords is to test search terms against representative data from the universe of machines and data under scrutiny. No matter how well you think you know the data or have refined your searches, testing will open your eyes to the unforeseen and likely save a lot of wasted time and money.”

In the fullness of time, those ten steps ring as true today as when George Bush was in the White House. Then, as now, the greatest improvements in lexical search can be achieved with modest tweaks in methodology. A stitch in time saves nine.

Another golden oldie is my 2012 collection of ten brief essays called “Shorties on Search.”

But, as much as I think those older missives hold up, and despite the likelihood that natural language prompts will soon displace old-school search queries, here’s a fresh recasting of my tips for better lexical search:

Essential Tips for Effective Lexical Search in Civil Discovery

Pre-Search Preparation:

Understand the Dataset
- Identify data sources and types, then tailor the search to the data.
- Assess the volume and organization of the dataset. Can a search of fielded data facilitate improved precision?
- Review any pre-processing steps applied, like normalization of case and diacriticals or use of stop words in creating the searchable indices.
Know Your Search Tools
- Familiarize yourself with the tool’s syntax and keyword search capabilities.
- Understand the tool’s limitations, especially with non-textual data and large documents.
Consult with Subject Matter Experts (SMEs)
- Engage SMEs for insights on relevant terminology and concepts.
- Use SME knowledge to refine keyword selection and search strategies.
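
Why pre-processing matters is easiest to see in miniature. The Python sketch below assumes an index that folds case, strips diacritical marks and drops stop words; the stop word list is illustrative and far shorter than any real tool’s, and actual tools differ in all three settings.

```python
import unicodedata

STOP_WORDS = {"the", "a", "of", "to", "be", "or", "not"}   # illustrative list only

def normalize(token):
    """Lower-case and strip diacritical marks, as many indexing tools do."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def index_terms(text):
    """Return the terms that would be searchable under these assumed settings."""
    tokens = (normalize(t.strip(".,;:!?\"'")) for t in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

print(index_terms("The résumé of Señor Peña"))
print(index_terms("To be or not to be"))
```

Under these settings, “résumé” is findable as “resume,” while the phrase “to be or not to be” leaves nothing in the index at all. A query that ignores how the index was built can fail without anyone noticing.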

Search Term Selection and Refinement:

Develop Comprehensive Keyword Lists
- Include synonyms, acronyms, initializations, variants, and industry-specific jargon.
- Consider linguistic and regional variations.
- Account for misspellings, alternate spellings and common transposition errors.
Utilize Boolean Logic and Advanced Operators
- Apply Boolean operators and proximity searches effectively.
- Experiment with wildcards and stemming for broader term inclusion.
Iteratively Test and Refine Search Queries
- Conduct sample searches to evaluate and refine search terms.
- Adjust queries based on testing outcomes and new information.
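
Proximity and wildcard operators are worth testing by hand before trusting them. Here is a minimal Python sketch of a “within N words” search with a simple trailing-asterisk wildcard. The function name and syntax are my own for illustration; real tools count distance and expand wildcards in differing ways, which is precisely why you must know your tool.

```python
import re

def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within(text, term_a, term_b, distance):
    """True if term_a and term_b occur within `distance` word positions of each other.
    A trailing * on a term acts as a simple prefix wildcard."""
    def matches(token, term):
        return token.startswith(term[:-1]) if term.endswith("*") else token == term

    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if matches(t, term_a)]
    pos_b = [i for i, t in enumerate(tokens) if matches(t, term_b)]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

doc = "The brakes on the prototype failed during the second test."
print(within(doc, "brake*", "fail*", 5))
print(within(doc, "brake*", "fail*", 2))
print(within(doc, "brake", "failed", 5))
```

Only the first query hits. Tighten the proximity to two words, or drop the wildcard so that “brake” must match exactly, and the same document is missed.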

Execution and Review:

Provide for Consistent Implementation Across Parties and Service Providers
- Use agreed-upon terms where possible. The most defensible search terms and methods are those the parties choose collaboratively.
- Ensure consistency in search term application across the datasets, over time and among multiple parties.
Sample and Manually Review Results
- Randomly sample search results to assess precision and recall.
- Adjust search terms and strategies based on manual review findings.
Negotiate Search Terms with Opposing Counsel
- Engage in discussions to agree on search terms and methodologies.
- Document agreements to preempt disputes over discovery completeness.
- Make abundantly clear whether a non-privileged document hit by a query must be produced or whether (as most producing parties assume) the items hit may nevertheless be withheld after a review for responsiveness.
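
Sampling to assess precision and recall can be sketched as follows. This is a simplified Python illustration with invented numbers: precision is estimated from a reviewed random sample of hits, and recall from that figure plus a reviewed random sample of the documents that did not hit. It omits confidence intervals, which a defensible validation would report.

```python
import random

def estimate_precision_recall(hit_labels, null_labels, n_hits, n_nulls):
    """Estimate precision and recall from reviewed random samples.

    hit_labels / null_labels: True where a reviewer judged the sampled item relevant.
    n_hits / n_nulls: total documents that hit / did not hit the query.
    """
    precision = sum(hit_labels) / len(hit_labels)
    miss_rate = sum(null_labels) / len(null_labels)   # relevant rate among non-hits
    est_found = precision * n_hits
    est_missed = miss_rate * n_nulls
    total_relevant = est_found + est_missed
    recall = est_found / total_relevant if total_relevant else 0.0
    return precision, recall

# Draw the sample with a fixed seed so the selection can be documented and repeated.
rng = random.Random(42)
sample_ids = rng.sample(range(10_000), 100)   # which of 10,000 hit documents to review

# Hypothetical review outcomes: 40 of 100 sampled hits and 4 of 200 sampled non-hits relevant.
hit_labels = [True] * 40 + [False] * 60
null_labels = [True] * 4 + [False] * 196
precision, recall = estimate_precision_recall(hit_labels, null_labels, 10_000, 90_000)
print(f"precision ~ {precision:.0%}, recall ~ {recall:.0%}")
```

With these made-up figures, the query looks respectable on precision yet leaves nearly a third of the relevant documents unfound, something a hit count alone would never reveal.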

Post-Search Analysis:

Validate and Document the Search Process
- Maintain comprehensive documentation of search terms, queries, exception items and decisions. Never employ a set of queries to exclude items from discovery without the ability to document the queries and process employed.
- Ensure the search methodology is defensible and compliant with legal standards.
Adapt and Evolve Search Strategies
- Remain flexible to adapt strategies as case evidence and requirements evolve.
- Leverage lessons from current searches to refine future discovery efforts.
Ensure Ethical and Legal Compliance
- Adhere to privacy, privilege, and ethical standards throughout the discovery process.
- Review and apply discovery protocols and court orders accurately.

Will AI Summarization Disrupt Discovery?

26 Friday Jan 2024

Posted by craigball in Uncategorized


Tags

AI artificial intelligence eDiscovery, generative-ai, LLM

Reader’s Digest, the century-old magazine with the highest paid circulation, has long published “condensed” books; anthologies of four-to-five popular novels abridged to fit in a single volume. Condensed Books were once enormously popular, with tens of millions of copies in circulation. They were also an abomination to serious readers, a literary Tang for those who preferred fresh-squeezed OJ. I’ve never read a condensed book, so I’m in no position to judge their merit save to say that I believe reading anything is a good thing. I imagine the condensed versions conveyed the guts of the story well enough to sound like you’d read it over drinks with the neighbors before the Ed Sullivan show.

But I am enough of a purist (okay, “snob”) to worry about the impact of summarization. As an undergraduate English major, I had to wade through some challenging tomes. I have no empirical evidence for it, but I’m certain those books are a part of me in ways they never would have been had I sought out the Cliffs Notes instead. I expect most avid readers feel the same. Summaries necessarily discard content, and what remains is incapable of conveying the same tone, nuance and detail.

So, I worry when the tech industry touts the value of AI summarization of documents, especially as a means of speeding identification and review of evidence in discovery. I question whether the “Reader’s Digest Condensed Evidence” will convey the same tone, nuance and detail that characterize responsive productions. Will distillation be made of distillations until genuine intelligence is lost altogether?

It’s an inchoate apprehension—an old man’s anxiety perhaps—but litigation is about human behavior, human frailty and failings. I fear too much humanity will disappear in AI-generated summaries with the underlying communications less likely to see the light of day. The mandate that discovery be “just, speedy and inexpensive” is now read as “just speedy and inexpensive.” That discarded comma is tragic.

Technology is my lifelong passion. So, I am not afraid of new tech as much as put off by the embrace of technology to further speed and economy without due consideration of quality. LegalWeek 2024 will be a carnival of vendors touting AI features and roadmaps. How many will have metrics to support the quality of their AI-abetted outcomes? How many have forgotten the comma while chasing the cash? Per Upton Sinclair, “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

Unquestionably, we must reduce the cost of discovery to protect the portals of justice. Justice no one can afford to pursue is no justice at all. But there are uniquely human characteristics we should continue to esteem in discovery, like curiosity, intuition, suspicion and impression; the “Spidey-sense” we derive from tone, nuance and detail. Before we use AI to summarize collections then deploy AI to characterize the summaries, can we pause just long enough to see if it’s going to work? Real testing, not just that which supports salaries.

Policy for Student Use of AI

11 Monday Dec 2023

Posted by craigball in Uncategorized


Tags

ai, asu-prep-digital, chatgpt, generative-ai, international-schools, technology, technology-integration

Andy Williams used to croon that this is “The Most Wonderful Time of the Year.” For me, it’s time to update the curriculum for my class on Electronic Discovery and Digital Evidence at the University of Texas in the graduate schools of Law, Computer Science and Information Science. I’ve long built the course around a Workbook I wrote with readings and some two dozen exercises. But, when I last taught the course a year ago, generative AI was hardly a twinkle in Santa’s eye. Now, of course, AI is the topic that’s eaten all others. So, I’ve had to fashion a policy for student use of AI. I elected to embrace student use of AI tools, in part because legal scholarship is artful plagiarism termed “precedent” and–let’s face it–students are going to use LLMs, whatever I say. So, here’s what I’ve come up with. I’ll be grateful for your feedback as comments, most especially if you are an educator facing the same issues with advice born of experience.

Use of Generative Large Language Models to Assist with Exercises

1. Explicit Disclosure Requirement

It is a violation of the honor code to misrepresent work by characterizing it as your own if it is not. Students may use generative LLMs, such as ChatGPT or Bard, for assistance in completing Workbook exercises; however, they must explicitly disclose the use of these tools by providing a brief note or acknowledgment in their submissions. Transparency is mandatory.

2. Verification and Cross-Checking

Students may utilize generative LLMs during Workbook exercises but are required to independently verify and cross-check the information generated by these models through additional research using alternate, reliable sources.

3. Accountability

While generative LLMs are permitted tools, students are held accountable for the accuracy and completeness of the information obtained from these models. Any errors or omissions resulting from the use of LLMs are considered the responsibility of the student. This policy underscores the importance of independent verification and personal accountability.

4. Prohibited for Quizzes and Exams

Notwithstanding the foregoing, you may not consult any source of information, including AI resources, when completing quizzes or the final exam.

POSTSCRIPT: I add this a day after the foregoing, after reading that the Fifth Circuit has proposed a rule change requiring that counsel and pro se litigants certify, as to any filed document, that “no generative artificial intelligence program was used in drafting the document…or to the extent such a program was used, all generated text, including all citations and legal analysis, has been reviewed for accuracy and approved by a human.” I recall shaking my head at how foolish it was when a grandstanding district court judge made headlines by requiring such certifications following a high-profile gaffe in New York. “Of course a lawyer must verify the accuracy of legal analysis and citations! Lawyers shouldn’t need to certify that we did what we are required to do!”

Yet, here I am requiring my students to do much the same. I feel confident in advising students that, if they use AI, they must verify the information and sink or swim based on what they submit, even if the AI hallucinates or misleads. Back in the day, lawyers knew they had to “Shepardize” citations to verify that the cases cited were still solid. Proffering a made-up citation was beyond comprehension.

So, am I right to require explicit disclosure of generative AI? Or will AI soon be woven into so many sources of information that disclosure will feel as foolish as requiring students to disclose they used a word processor instead of a typewriter would have been forty years ago? I’m struggling with this. What do you think?

Monica Bay, 1949-2023

30 Monday Oct 2023

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts, Personal, Uncategorized


I’m saddened to share that Monica Bay, the forceful, revered former editor of Law Technology News (now Legaltech News) has died after a long, debilitating illness. Though a durable resident of New York City and Connecticut, Monica’s life ended in California where it began. Monica described herself as a “provocateur,” an apt descriptor from one gifted in finding the bon mot. Monica was a journalist with soaring standards whose writing exemplified the high caliber of work she expected from her writers. I cannot overstate Monica’s importance to the law technology community in her 17 years at the helm of LTN. Monica mentored multitudes and by sheer force of her considerable strength and will, Monica transformed LTN from an industry organ purveying press releases to an award-winning journal unafraid to speak truth to power.

In her time as editor, Monica was everywhere and indefatigable. Monica was my editor for much of her tenure at LTN including nine years where I contributed a monthly column she dubbed “Ball in Your Court” (see what I mean about her mastery of the well-turned phrase?) We had a complicated relationship and butted heads often, but my submissions were always better for Monica’s merciless blue pencil. I owe her an irredeemable debt. She pushed me to the fore. You wouldn’t be reading this now if it weren’t for Monica Bay’s efforts to elevate me. The outsize recognition and writing awards I garnered weren’t my doing but Monica’s. If life were a movie, Monica would be the influential publisher who tells the writer plucked from obscurity, “I made you and I can break you!” And it would be true.

This elegy would have been far better if she’d edited it.

Trying to illuminate Monica, I turned to Gmail to refresh my memory but backed off when I saw we’d shared more than 2,200 conversations since 2005. I’d forgotten how she once loomed so large in my life. In some of those exchanges, Monica generously called me, “hands down my best writer,” but I wouldn’t be surprised if she said that to everyone in her stable of “campers.” Monica knew how to motivate, cajole and stroke the egos of her contributors. She was insightful about ego, too.

In 2010 when I carped that there’s always too much to do, and always somebody unhappy with me, she counseled, “Like me, you are an intense personality, and we can be difficult to live with at times. but that intensity and drive is also what makes you who you are, why you are successful, and why you are a breathtakingly good writer. My favorite people in the world are ‘difficult.’”

I wince as I write that last paragraph because as much as she was brilliant in managing egos, Monica didn’t love that part of her work. She confided, “I think we have to be mindful that we don’t exercise our egos in a way that constrains — or worse case, cripples — those around us. That’s the hard part.”

Monica observed of a well-known commentator of the era, “he wouldn’t be able to write if he had to excise ‘I’ from his vocabulary… he annoys me more than the Red Sox or Jacobs Fields gnats.”

That reminds me that Monica had a personal blog called “The Common Scold.” She named it for a Puritan-era cause of action where opinionated women were punished by a dunk in a pond. I mostly remember it for its focus on New York Yankees baseball, which became a passion for Monica when she moved east despite a lifelong disinterest in sports. Monica, who insofar as I knew, never married, often referred to herself in the Scold as “Mrs. Derek Jeter.” She was quirky that way and had a few quirky rules for writers. One was that the word “solution” was banned, BANNED, in LTN.

To her credit, Monica Bay wasn’t afraid to nip at the hand that feeds. Now, when every outlet has bent to the will of advertisers, Monica’s strict journalistic standards feel at once quaint and noble. Consider this excerpt from her 2009 Editorial Guidelines:

“Plain English: Law Technology News is committed to presenting information in a manner that is easily accessible to our readers. We avoid industry acronyms, jargon, and clichés, because we believe this language obfuscates rather than enhances understanding.

For example, the word “solution” has become meaningless and is banned from LTN unless it’s part of the name of a company. Other words we edit out: revolutionary, deploy, mission critical, enterprise, strategic, robust, implement, seamless, initiative, -centric, strategic [sic], and form factor! We love plain English!”

Monica was many things more than simply an industry leader, from a wonderful choral singer to the niece of celebrated actress, Elaine Stritch. She was my champion, mother figure, friend and scold. I am in her debt. And you are, too, Dear Reader, for Monica Bay pushed through barriers that fell under her confident stride.

Fifteen years ago, when Monica lost her father, and my mother was dying, we supported each other. Monica called her dad’s demise the “great gift of dementia from the karma gods. No pain, just a gentle drift to his next destination.” That beautifully describes her own shuffle off this mortal coil. As the most loving parting gift I can offer my late, brilliant editor, I cede to her those last lovely words, “just a gentle drift to [her] next destination.”

[I have no information about services or memorials, but I look forward to commemorating Monica’s life and contributions with others who loved and admired her]

A nice tribute from Bob Ambrogi: https://www.lawnext.com/2023/10/i-am-deeply-saddened-to-report-the-death-of-monica-bay-friend-mentor-and-role-model-to-so-many-in-legal-tech.html and a sweet remembrance from Mary Mack: https://edrm.net/2023/10/the-warmest-and-most-uncommon-scold/

Being the Better Expert Witness

07 Thursday Sep 2023

Posted by craigball in Uncategorized

≈ 5 Comments

I’ll need to dust off the cobwebs as I haven’t been in this space in quite some time! I’ve not had much to say, and honestly, if I didn’t sneak “ChatGPT” into the title, who’d notice? Preparing for a September 20th presentation to an international conclave of forensic examiners in Phoenix, I extensively revised and expanded my guide for testifying experts, now called “Being the Better Expert Witness: A Primer for Forensic Examiners.” I describe it thus:

This paper covers ways to become an effective witness and pitfalls to avoid. They say lawyers make notoriously poor witnesses and I have no illusions that I’m a great witness. But after forty years of trial practice and thirty as a forensic examiner, I’ve learned a few lessons I hope might help other examiners build their skills in court.

In the paper, I discuss the difficulty computer forensic examiners face honing their testimonial abilities because it’s rare to be interrogated by a lawyer who truly understands what we are talking about. Most interrogators work from a script. They know the first question to ask, but not the next or the one after that. Pushed from their path, they’re lost. Computer forensic examiners have it easy on the stand. Deep fakes notwithstanding, computer-generated evidence still enjoys an aura of accuracy and objectivity, and the hyper-technical nature of digital forensics awes and intimidates the uninitiated. Thank you, CSI, NCIS and all the rest! But sooner or later, computer forensic examiners will square off against interrogators able to skillfully undermine ability and credibility. I want them to be ready.

As I’m wont to do, I ambled down memory lane:

“Evidence professor John Henry Wigmore famously called cross-examination “the greatest legal engine ever invented for the discovery of truth.” Apparently, every lawyer who writes about cross-examination is obliged to say that. Likewise, every trial lawyer aspires to do a great cross examination, and every judge and juror aspires to hear one. Yet, as I observed at the start, they are rare.”

“Forty years ago, my boss was on the trial team of a lawsuit between Pennzoil and Texaco that resulted in the biggest plaintiff’s verdict of the era and a three-billion-dollar settlement, back when that was a lot of money. The lawyer for Texaco, the big loser, was named Dick Miller, and my boss used to say of him, “Dick Miller has two speeds: OFF and KILL.” I’ll never forget that because it encapsulates how some lawyers approach cross-examination. A truly devastating cross examination flows from applying lessons learned from the raptors in Jurassic Park: get the prey to look one way, while the attack comes from another.”

“In court, that entails laying a trap and not springing it too early. Skilled cross examiners box witnesses in and seal off points of retreat before the witness recognizes the need to run. The very best cross examiners don’t spring their traps during the cross; they save that for final argument.”

“The greatest teacher of cross-examination I’ve ever come across was a former prosecutor, judge and law professor named Irving Younger, who died about 35 years ago. Younger’s famous lecture on the topic was called “The Ten Commandments of Cross-Examination.” I’ve listened to multiple versions of his talk over the years and all are magnificent. Stirring. Funny. Unforgettable. Younger opined that a lawyer must try about 25 cases to begin to be skilled in cross-examination, but he GUARANTEED that any lawyer strictly adhering to his Ten Commandments would be able to conduct a reasonably effective cross-examination. Of course, he added, no lawyer is capable of sticking to all his commandments until the lawyer has about 25 trials under his belt!”

“I do not have ten surefire commandments that will guarantee you won’t get in trouble on cross-examination, but I have a lifetime in court (much of one anyway) and many years teaching law to draw on in offering advice on what to expect on cross plus a few suggested techniques that I GUARANTEE will help you become a better witness.”

So, if you’re looking to help an expert witness new to the role or a veteran making the same old mistakes, perhaps you’ll point them to my new primer at http://www.craigball.com/Ball_Expert_Witness_2023.pdf

Introducing the EDRM E-Mail Duplicate Identification Specification and Message Identification Hash (MIH)

16 Thursday Feb 2023

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts, Uncategorized

≈ 7 Comments

I’m proud to be the first to announce that the Electronic Discovery Reference Model (EDRM) has developed a specification for cross-platform identification of duplicate email messages, allowing for ready detection of duplicate messages that waste review time and increase cost. Leading e-discovery service and software providers support the new specification, making it possible for lawyers to improve discovery efficiency by a simple addition to requests for production. If that sounds too good to be true, read on and learn why and how it works.

THE PROBLEM

The triumph of information technology is the ease with which anyone can copy, retrieve and disseminate electronically stored information. Yet, for email in litigation and investigations, that blessing comes with the curse of massive replication, obliging document reviewers to assess and re-assess nearly identical messages for relevance and privilege. Duplicate messages waste time and money and carry a risk of inconsistent characterization. Seeing the same thing over-and-over again makes a tedious task harder.

Electronic discovery service providers and software tools ameliorate these costs, burdens and risks using algorithms to calculate hash values—essentially digital fingerprints—of segments of email messages, comparing those hash values to flag duplicates. Hash deduplication works well, but stumbles when minor variations prompt inconsistent outcomes for messages reviewers regard as being “the same.” Hash deduplication fails altogether when messages are exchanged in forms other than those native to email communications—a common practice in U.S. electronic discovery where efficient electronic forms are often printed to static page images.
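
To make the "minor variations" problem concrete, here is a minimal Python sketch of segment-based hash deduplication. The choice of segments, the joining rule and the use of MD5 are assumptions for illustration only; every real tool makes its own choices, which is precisely the difficulty.

```python
import hashlib

def segment_hash(sender, recipients, subject, body):
    """Hash selected email segments joined in a fixed order (illustrative only)."""
    joined = "\n".join([sender, ";".join(sorted(recipients)), subject, body])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Hypothetical message segments
original = segment_hash("ann@example.com", ["bob@example.com"],
                        "Q3 forecast", "Numbers attached.")
same = segment_hash("ann@example.com", ["bob@example.com"],
                    "Q3 forecast", "Numbers attached.")
# The "same" message as a reviewer sees it, but with a trailing line break
variant = segment_hash("ann@example.com", ["bob@example.com"],
                       "Q3 forecast", "Numbers attached.\r\n")

print(original == same)     # identical segments produce identical hashes
print(original == variant)  # one stray line break produces a different hash
```

A reviewer would call the first and third messages "the same," yet their hashes differ, so one tool may flag a duplicate that another misses.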

Without the capability to hash identical segments of identical formats across different software platforms, reviewers cannot easily identify duplicates or readily determine what’s new versus what’s been seen before. When identical messages are processed by different tools and vendors or produced in different forms (so-called “cross-platform productions”), identification of duplicate messages becomes an error-prone, manual process or requires reprocessing of all documents.

Astonishingly, no cross-platform method of duplicate identification has emerged despite decades spent producing email in discovery and billions of dollars burned by reviewing duplicates.

Wouldn’t it be great if there was a solution to this delay, expense and tedium?

THE SOLUTION

When parties produce email in discovery and investigations, it’s customary to supply information about the messages called “metadata” in accompanying “load files.” Load files convey Bates numbers/Document IDs, message dates, sender, recipients and the like. Ideally, the composition of load files is specified in a well-crafted request for production or production protocol. Producing metadata is a practice that’s evolved over time to prompt little argument. For service providers, producing one more field of metadata is trivial, rarely requiring more effort than simply ticking a box.

The EDRM has crafted a new load file field called the EDRM Message Identification Hash (MIH), described in the EDRM Email Duplicate Identification Specification.

Gaining the benefit of the EDRM Email Duplicate Identification Specification is as simple as requesting that load files contain an EDRM Message Identification Hash (MIH) for each email message produced. The EDRM Email Duplicate Identification Specification is an open specification, so no fees or permissions are required to use it, and leading e-discovery service and software providers already support the new specification. For others, it’s simple to generate the MIH without redesigning software or impeding workflows. Too, the EDRM has made free tools available supporting the specification.
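
As a rough sketch of how little changes for a service provider, the Python below writes and reads a delimited load file with one extra column. The field names, the pipe delimiter and the hash value are hypothetical placeholders; actual load file layouts are whatever the production protocol specifies.

```python
import csv
import io

# Hypothetical field list; "MIH" is the single added column.
FIELDS = ["DocID", "Date", "From", "To", "MIH"]

rows = [
    {
        "DocID": "ABC0000001",
        "Date": "2023-02-16",
        "From": "ann@example.com",
        "To": "bob@example.com",
        "MIH": "0123456789abcdef0123456789abcdef",  # placeholder, not a real hash
    },
]

# Producing party: write the load file with the extra field.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="|")
writer.writeheader()
writer.writerows(rows)
load_file_text = buffer.getvalue()

# Receiving party: read the load file and index the MIH by document ID.
reader = csv.DictReader(io.StringIO(load_file_text), delimiter="|")
mih_by_doc = {record["DocID"]: record["MIH"] for record in reader}
print(mih_by_doc)
```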

Any party with the MIH of an email message can readily determine if a copy of the message exists in their collection. Armed with MIH values for emails, parties can flag duplicates even when those duplicates take different forms, enabling native message formats to be compared to productions supplied as TIFF or PDF images.

The routine production of the MIH supports duplicate identification across platforms and parties. By requesting the EDRM MIH, parties receiving rolling or supplemental productions will know if they’ve received a message before, allowing reviewers to dedicate resources to new and unique evidence. Email messages produced by different parties in different forms using different service providers can be compared to instantly surface or suppress duplicates. Cross-platform email duplicate identification means that email productions can be compared across matters, too. Parties receiving production can easily tell if the same message was or was not produced in other cases. Cross-platform support also permits a cross-border ability to assess whether a message is a duplicate without the need to share personally-identifiable information restricted from dissemination by privacy laws.
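
Once every production carries the value, sorting new messages from duplicates is a simple set lookup. The sketch below assumes each incoming record is a (document ID, MIH) pair taken from a load file; the function name, document IDs and hash values are all hypothetical.

```python
def split_new_and_duplicate(incoming, seen_mih):
    """Sort incoming (doc_id, mih) pairs into new and duplicate document IDs.

    Hex digests are compared case-insensitively, and a value repeated within
    the incoming production counts as a duplicate after its first appearance.
    """
    seen = {value.lower() for value in seen_mih}
    new, duplicates = [], []
    for doc_id, mih in incoming:
        key = mih.lower()
        if key in seen:
            duplicates.append(doc_id)
        else:
            new.append(doc_id)
            seen.add(key)
    return new, duplicates

# MIH values from an earlier production (placeholder values)
already_reviewed = ["A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1"]

# A supplemental production, perhaps from a different vendor
supplemental = [
    ("PROD2-0001", "a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1a1"),
    ("PROD2-0002", "ffffffffffffffffffffffffffffffff"),
    ("PROD2-0003", "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"),
]

new_docs, duplicate_docs = split_new_and_duplicate(supplemental, already_reviewed)
print(new_docs)        # ['PROD2-0002']
print(duplicate_docs)  # ['PROD2-0001', 'PROD2-0003']
```

Note that nothing but hash values changes hands, which is why the comparison can be made without sharing the underlying message content.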

IS THIS REALLY NEW?

Yes, and unprecedented. As noted, e-discovery service providers and law firm or corporate e-discovery teams have long employed cryptographic hashing internally to identify duplicate messages; but each does so differently depending upon the process and software platform employed (sometimes in ways they regard as being proprietary), making it infeasible to compare hash values across providers and platforms. Even if competitors could agree to employ a common method, subtle differences in the way each processes and normalizes messages would defeat cross-platform comparison.

The EDRM Email Duplicate Identification Specification doesn’t require software platform and service providers to depart from the proprietary ways they deduplicate email. Instead, the Specification contemplates that e-discovery software providers add the ability to produce the EDRM MIH to their platform and that service providers supply a simple-to-determine Message Identification Hash (MIH) value that sidesteps the challenges just described by taking advantage of an underutilized feature of email communication standards called the “Message ID” and pairing it with the power of hash deduplication. If it sounds simple, it is–and by design. It’s far less complex than traditional approaches but sacrifices little or no effectiveness or utility. Crucially, it doesn’t require any difficult or expensive departure from the way parties engage in discovery and production of email messages.
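
For the curious, the idea of pairing the Message ID with a hash can be sketched in a few lines of Python. This is not the Specification's text: the use of MD5, and the trimming of whitespace and enclosing angle brackets before hashing, are assumptions made for illustration, and the function name is hypothetical. Consult the Specification itself for the authoritative algorithm and normalization rules.

```python
import hashlib
from email import message_from_string

def message_identification_hash(raw_message: str) -> str:
    """Return a hex digest derived from a message's Message-ID header.

    Assumptions (not stated in the post): MD5 as the algorithm, and trimming
    whitespace and the enclosing angle brackets before hashing.
    """
    message_id = message_from_string(raw_message)["Message-ID"]
    if message_id is None:
        raise ValueError("message has no Message-ID header")
    normalized = message_id.strip().strip("<>")
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# Two copies of one message as different systems might store it: header
# order, spacing and body line endings differ, but the Message-ID does not.
copy_a = (
    "Message-ID: <1234.abcd@mail.example.com>\n"
    "From: ann@example.com\n"
    "Subject: Q3 forecast\n"
    "\n"
    "Numbers attached.\n"
)
copy_b = (
    "From: ann@example.com\n"
    "Subject: Q3 forecast\n"
    "Message-ID:   <1234.abcd@mail.example.com>  \n"
    "\n"
    "Numbers attached.\r\n"
)
# A different message altogether
other = copy_a.replace("1234.abcd", "9999.zzzz")

print(message_identification_hash(copy_a) == message_identification_hash(copy_b))
print(message_identification_hash(copy_a) == message_identification_hash(other))
```

Because only the Message ID feeds the hash, differences in body formatting or header order that defeat segment hashing have no effect on the result.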

WHAT SHOULD YOU DO TO BENEFIT?

All you need to do to begin reaping the benefits of cross-platform message duplicate identification is amend your Requests for Production to include the EDRM Message Identification Hash (MIH) among the metadata values routinely produced as load files. As a prominently published specification by the leading standards organization in e-discovery, it’s likely the producing party’s service provider or litigation support staff know what’s required. But if not, you can refer them to the EDRM Email Duplicate Identification Specification & Guidelines published at https://edrm.net/active-projects/dupeid/.

HOW DO YOU LEARN MORE?

The EDRM publishes a comprehensive set of resources describing and supporting the Specification & Guidelines that can be found at https://edrm.net/active-projects/dupeid/. All persons and firms deploying the EDRM MIH to identify duplicate messages should familiarize themselves with the considerations for its use.

EDRM WANTS YOUR FEEDBACK

The EDRM welcomes any feedback you may have on this new method of identifying cross-platform email duplicates or on any of the resources provided. We are interested in further ideas you may have and expect the use of the EDRM MIH to evolve over time. You can post any feedback or questions at https://edrm.net/active-projects/dupeid/.

Not So Fine Principle Nine

17 Tuesday Jan 2023

Posted by craigball in Computer Forensics, E-Discovery, Uncategorized

≈ 8 Comments

For the second class meeting of my law school courses on E-Discovery and Digital Evidence, I require my students read the fourteen Sedona Conference Principles from the latest edition of “Best Practices, Recommendations & Principles for Addressing Electronic Document Production.” The Sedona principles are the bedrock of that group’s work on ESI and, notwithstanding my misgivings that the Principles have tilted toward blocking discovery more than guiding it, there’s much to commend in each of the three versions of the Principles released over the last twenty years. They enjoy a constitutional durability in the eDiscovery community.

When my students read the Principles, I revisit them and each time, something jumps out at me. This semester, it’s the musty language of Principle 9:

Principle 9: Absent a showing of special need and relevance, a responding party should not be required to preserve, review, or produce deleted, shadowed, fragmented, or residual electronically stored information.
The Sedona Principles, Third Edition: Best Practices, Recommendations & Principles for Addressing Electronic Document Production, 19 SEDONA CONF. J. (2018)

Save for the substitution of “electronically stored information” for the former “data or documents,” Principle 9 hasn’t been touched since its first drafts of 20+ years ago. One could argue its longevity owes to an abiding wisdom and clarity. Indeed, the goals behind P9 are laudable and sound. But the language troubles me, particularly the terms, “shadowed” and “fragmented,” which someone must have pulled out of their … I’ll say “hat” … during the Bush administration, and presumably no one said, “Wait, is that really a thing?” In the ensuing decades, did no one question the wording or endeavor to fix it?

My objection is that both are terms of art used artlessly. Consider “shadowed” ESI. Run a search for shadowed ESI or data, and you’ll not hit anything on point but the Principle itself. Examine the comments to Principle 9 and discover there’s no effort to explain or define shadowed ESI. Head over to The Sedona Conference Glossary: eDiscovery and Digital Information Management, and you’ll find nary a mention of “shadowed” anything.

That is not to say that there wasn’t a far-behind-the-scenes service existing in Microsoft Windows XP and Windows Server to facilitate access to locked files during backup that came to be called “Volume Shadow Copy Services” or “VSS,” but it wasn’t being used for forensics when the language of Principle 9 was floated. I was a forensic examiner at the time and can assure you that my colleagues and I didn’t speak of “shadowed” data or documents.

But whether an argument can be made that it was a “thing” or not twenty years ago, it’s never been a term in common use, nor one broadly understood by lawyers and judges. It’s not defined in the Principles or glossaries. You’ll get no useful guidance from Google.

What harm has it done? None I can point to. What good has it done? None. Yet, it might be time to consign “shadowed” to the dustbin of history and find something less vague. It’s not gospel, it’s gobbledygook.

“Fragmented” is a term that’s long been used in reference to data storage, but not as a synonym for “residual” or “artifact.” Fragmented files refer to information stored in non-contiguous clusters on a storage medium. Many of the files we access and know to be readily accessible are fragmented in this fashion, and no one who understands the term in the context of ESI would confuse “fragmented” data or documents with something burdensome to retrieve. But don’t take my word for that, Sedona’s own glossary backs me up. Sedona’s Principle 9 doesn’t use “fragmented” as Sedona defines it.
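
A toy model shows why fragmentation poses no burden. In the Python sketch below, the "disk" is a dictionary mapping cluster numbers to contents, and each file is just the ordered list of clusters it occupies. All names and values are hypothetical, and real file systems are far more elaborate.

```python
def is_fragmented(clusters):
    """True when the file's clusters are not one consecutive run."""
    return any(nxt != cur + 1 for cur, nxt in zip(clusters, clusters[1:]))

def read_file(disk, clusters):
    """Reassemble a file by following its cluster list, as a file system does."""
    return b"".join(disk[c] for c in clusters)

# Toy disk: cluster number -> cluster contents
disk = {
    4: b"Dis", 5: b"cov", 6: b"ery",   # file A, stored contiguously
    9: b"Dis", 2: b"cov", 7: b"ery",   # file B, scattered across the disk
}
file_a = [4, 5, 6]
file_b = [9, 2, 7]

print(is_fragmented(file_a), read_file(disk, file_a))
print(is_fragmented(file_b), read_file(disk, file_b))
```

Both files read back identically and with equal ease; the user never knows which one is fragmented.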

If the drafters meant “fragments of data,” intending to convey “artifacts recoverable through computer forensics but not readily accessible to or comprehended by users,” then perhaps other words are needed, though I can’t imagine what those words would add that “deleted” or “residual” doesn’t cover.

This is small potatoes. No one need lose a wink of sleep over the sloppy wording, and I’m not the William Safire of e-discovery or digital forensics; but words matter. When you are writing to guide persons without deep knowledge of the subject matter, your words matter very much. If you use a term of art, make sure it’s a correct usage, a genuine one; and be certain you’ve either used it as experts do or define the anomalous usage in context.

When I fail to do that, Dear Reader, I hope you’ll call me on it, too.

The Annotated ESI Protocol

09 Monday Jan 2023

Posted by craigball in Computer Forensics, E-Discovery, Uncategorized

≈ 26 Comments

Tags

ESI Protocols

Periodically, I strive to pen something practical and compendious on electronic evidence and eDiscovery, drilling into a topic that hasn’t seen prior comprehensive treatment. I’ve done primers on metadata, forms of production, backup systems, databases, computer forensics, preservation letters, ESI processing, email, digital storage and more, all geared to a Luddite lawyer audience. I’ve long wanted to write, “The Annotated ESI Protocol.” Finally, it’s done.

The notion behind The Annotated ESI Protocol goes back 40 years when, as a fledgling personal injury lawyer, I found a book of annotated insurance policies. What a prize! Any plaintiff’s lawyer will tell you that success is about more than liability, causation and damages; you’ve got to establish coverage to get paid. Those annotated insurance policies were worth their weight in gold.

As an homage to that treasured resource, I’ve sought to boil down decades of ESI protocols to a representative iteration and annotate the clauses, explaining the “why” and “how” of each. I’ve yet to come across a perfect ESI protocol, and I don’t kid myself that I’ve crafted one. My goal is to offer lawyers who are neither tech-savvy nor e-discovery aficionados a practical, contextual breakdown of a basic ESI protocol–more than simply a form to deploy blindly or an abstract discussion. I’ve seen thirty-thousand-foot discussions of protocols by other commentators, yet none tied to the document or served up with an ESI protocol anyone can understand and accept.

It pains me to supply the option of a static image (“TIFF+”) production, but battleships turn slowly, and persuading lawyers long wedded to wasteful ways that they should embrace native production is a tough row to hoe. My intent is that the TIFF+ option in the example sands off the roughest edges of those execrable images; so, if parties aren’t ready to do things the best way, at least we can help them do better.

Fingers crossed you’ll like The Annotated ESI Protocol and put it to work. Your comments here are always valued.

Ball in your Court

~ Musings on e-discovery & forensics.

Category Archives: Uncategorized

