
Ball in your Court

~ Musings on e-discovery & forensics.

Category Archives: Computer Forensics

Not So Fine Principle Nine

17 Tuesday Jan 2023

Posted by craigball in Computer Forensics, E-Discovery, Uncategorized

≈ 8 Comments

For the second class meeting of my law school courses on E-Discovery and Digital Evidence, I require my students read the fourteen Sedona Conference Principles from the latest edition of “Best Practices, Recommendations & Principles for Addressing Electronic Document Production.” The Sedona principles are the bedrock of that group’s work on ESI and, notwithstanding my misgivings that the Principles have tilted toward blocking discovery more than guiding it, there’s much to commend in each of the three versions of the Principles released over the last twenty years.  They enjoy a constitutional durability in the eDiscovery community.

When my students read the Principles, I revisit them and each time, something jumps out at me.  This semester, it’s the musty language of Principle 9:

Principle 9: Absent a showing of special need and relevance, a responding party should not be required to preserve, review, or produce deleted, shadowed, fragmented, or residual electronically stored information.

The Sedona Principles, Third Edition: Best Practices, Recommendations & Principles for Addressing Electronic Document Production, 19 SEDONA CONF. J. (2018)

Save for the substitution of “electronically stored information” for the former “data or documents,” Principle 9 hasn’t been touched since its first drafts of 20+ years ago.  One could argue its longevity owes to an abiding wisdom and clarity. Indeed, the goals behind P9 are laudable and sound.  But the language troubles me, particularly the terms, “shadowed” and “fragmented,” which someone must have pulled out of their … I’ll say “hat” … during the Bush administration, and presumably no one said, “Wait, is that really a thing?”  In the ensuing decades, did no one question the wording or endeavor to fix it?

My objection is that both are terms of art used artlessly.  Consider “shadowed” ESI.  Run a search for shadowed ESI or data, and you’ll not hit anything on point but the Principle itself.  Examine the comments to Principle 9 and discover there’s no effort to explain or define shadowed ESI.  Head over to The Sedona Conference Glossary: eDiscovery and Digital Information Management, and you’ll find nary a mention of “shadowed” anything. 

That is not to say that there wasn’t a far-behind-the-scenes service existing in Microsoft Windows XP and Windows Server to facilitate access to locked files during backup that came to be called “Volume Shadow Copy Services” or “VSS,” but it wasn’t being used for forensics when the language of Principle 9 was floated.  I was a forensic examiner at the time and can assure you that my colleagues and I didn’t speak of “shadowed” data or documents.

But whether an argument can be made that it was a “thing” or not twenty years ago, it’s never been a term in common use, nor one broadly understood by lawyers and judges.  It’s not defined in the Principles or glossaries.  You’ll get no useful guidance from Google. 

What harm has it done?  None I can point to.  What good has it done?  None.  Yet, it might be time to consign “shadowed” to the dustbin of history and find something less vague.  It’s not gospel, it’s gobbledygook.

“Fragmented” is a term that’s long been used in reference to data storage, but not as a synonym for “residual” or “artifact.”  Fragmented files refer to information stored in non-contiguous clusters on a storage medium.  Many of the files we access and know to be readily accessible are fragmented in this fashion, and no one who understands the term in the context of ESI would confuse “fragmented” data or documents with something burdensome to retrieve.  But don’t take my word for that; Sedona’s own glossary backs me up.  Sedona’s Principle 9 doesn’t use “fragmented” as Sedona defines it.

If the drafters meant “fragments of data,” intending to convey “artifacts recoverable through computer forensics but not readily accessible to or comprehended by users,” then perhaps other words are needed, though I can’t imagine what those words would add that “deleted” or “residual” doesn’t cover.

This is small potatoes. No one need lose a wink of sleep over the sloppy wording, and I’m not the William Safire of e-discovery or digital forensics; but words matter.  When you are writing to guide persons without deep knowledge of the subject matter, your words matter very much.  If you use a term of art, make sure it’s a correct usage, a genuine one; and be certain you’ve either used it as experts do or define the anomalous usage in context.

When I fail to do that, Dear Reader, I hope you’ll call me on it, too.

The Annotated ESI Protocol

09 Monday Jan 2023

Posted by craigball in Computer Forensics, E-Discovery, Uncategorized

≈ 25 Comments

Tags

ESI Protocols

Periodically, I strive to pen something practical and compendious on electronic evidence and eDiscovery, drilling into a topic that hasn’t seen prior comprehensive treatment.  I’ve done primers on metadata, forms of production, backup systems, databases, computer forensics, preservation letters, ESI processing, email, digital storage and more, all geared to a Luddite lawyer audience.  I’ve long wanted to write, “The Annotated ESI Protocol.” Finally, it’s done.

The notion behind The Annotated ESI Protocol goes back 40 years when, as a fledgling personal injury lawyer, I found a book of annotated insurance policies.  What a prize!  Any plaintiff’s lawyer will tell you that success is about more than liability, causation and damages; you’ve got to establish coverage to get paid.  Those annotated insurance policies were worth their weight in gold.

As an homage to that treasured resource, I’ve sought to boil down decades of ESI protocols to a representative iteration and annotate the clauses, explaining the “why” and “how” of each.  I’ve yet to come across a perfect ESI protocol, and I don’t kid myself that I’ve crafted one.  My goal is to offer lawyers who are neither tech-savvy nor e-discovery aficionados a practical, contextual breakdown of a basic ESI protocol–more than simply a form to deploy blindly or an abstract discussion.  I’ve seen thirty-thousand-foot discussions of protocols by other commentators, yet none tied to the document or served up with an ESI protocol anyone can understand and accept. 

It pains me to supply the option of a static image (“TIFF+”) production, but battleships turn slowly, and persuading lawyers long wedded to wasteful ways that they should embrace native production is a tough row to hoe. My intent is that the TIFF+ option in the example sands off the roughest edges of those execrable images; so, if parties aren’t ready to do things the best way, at least we can help them do better.

Fingers crossed you’ll like The Annotated ESI Protocol and put it to work. Your comments here are always valued.

Seven Stages of Snakebitten Search

13 Tuesday Dec 2022

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts, Uncategorized

≈ 6 Comments

I’ve long been fascinated by electronic search.  I especially love delving into the arcane limitations of lexical search because, awful Grinch that I am, I get a kick out of explaining to lawyers why their hard-fought search queries and protocols are doomed to fail. But, once we work through the Seven Stages of Attorney E-Discovery Grief: Umbrage, Denial, Anger, Angry Denial, Fear, Finger Pointing, Threats and Acceptance, there’s almost always a workaround to get the job done with minimal wailing and gnashing of teeth.

Three consults today afforded three chances to chew over problematic search strategies: 

  • First, the ask was to search for old CAD/CAM drawings in situ on an opponent’s file servers based on words appearing on drawings. 
  • Another lawyer sought to run queries in M365 seeking responsive text in huge attachments.
  • The last lawyer wanted me to search the contents of a third-party’s laptop for subpoenaed documents but without the machine being imaged or its contents processed before search.

Most of my readers are e-discovery professionals so they’ll immediately snap to the reasons why each request is unlikely to work as planned. Before I delve into my concerns, let’s observe that all these requests seemed perfectly reasonable in the minds of the lawyers involved, and why not?  Isn’t that how keyword and Boolean search is supposed to work?  Sadly, our search reach often exceeds our grasp.

Have you got your answers to why they may fail?  Let’s compare notes.

  • When it comes to lexical search, CAD/CAM drawings differ markedly from Word documents and spreadsheets.  Word processed documents and spreadsheets contain text encoded as ASCII or Unicode characters.  That is, text is stored as, um, text.  In contrast, CAD/CAM drawings tend to be vector graphics.  They store instructions describing how to draw the contents of the plans geometrically; essentially how the annotations look rather than what they say. So, the text is an illustration of text, much like a JPG photograph of a road sign or a static TIFF image of a document—both inherently unsearchable for text unless paired with extracted or OCR text in ancillary load files.  Bottom line: Unless the CAD/CAM drawings are subjected to effective optical character recognition before being indexed for search, lexical searches won’t “see” any text on the face of the drawings and will fail.
  • M365 has a host of limits when it comes to indexing Cloud content for search, and of course, if it’s not in the index, it won’t turn up in response to search.  For example, M365 won’t parse and index an email attachment larger than 150MB.  Mind you, few attachments will run afoul of that capacious limit, but some will.  Similarly, M365 will only parse and index the first 2 million characters of any document.  That means only the first 600-1,000 pages of a document will be indexed and searchable.  Here again, that will suffice for the ordinary, but may prove untenable in matters involving long documents and data compilations.  There are other limits on, e.g., how deeply a search will recurse through nested- and embedded content and the body text size of a message that will index.  You can find a list of limits here (https://learn.microsoft.com/en-us/microsoft-365/compliance/limits-for-content-search?view=o365-worldwide#indexing-limits-for-email-messages) and a discussion of so-called “partially indexed” files here (https://learn.microsoft.com/en-us/microsoft-365/compliance/partially-indexed-items-in-content-search?view=o365-worldwide).  Remember, all sorts of file types aren’t parsed or indexed at all in M365.  You must tailor lexical search to the data under scrutiny.  It’s part of counsels’ duty of competence to know what their search tools can and cannot do when negotiating search protocols and responding to discovery using lexical search.
  • In their native environments, many documents sought in discovery live inside various container files ranging from e-mail and attachments in PST and OST mail containers to compressed Zip containers.  Encrypted files may be thought of as being sealed inside an impenetrable container that won’t be searched.  The upshot is that much data on a laptop or desktop machine cannot be thoroughly searched by keywords and queries by simply running searches within an operating system environment (e.g., in Windows or macOS).  Accordingly, forensic examiners and e-discovery service providers collect and “process” data to make it amenable to search.  Moreover, serial search of a computer’s hard drive (versus search of an index) is painfully slow, and so unreasonably expensive when charged by the hour.  For more about processing ESI in discovery, here’s my 2019 primer (http://www.craigball.com/Ball_Processing_2019.pdf).
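
The container problem in the last bullet can be seen in miniature with a short Python sketch. It builds a small Zip archive in memory and compares a naive scan of the container's raw bytes against a search of the extracted contents. The file name `memo.txt`, the sample sentence and the keyword are all made up for illustration, and this is a toy stand-in for what processing tools do, not a description of any particular product.

```python
import io
import zipfile

KEYWORD = b"indemnity"  # hypothetical search term

# Build a Zip container in memory holding one compressed text file.
text = b"The indemnity clause was revised on Tuesday.\n" * 50
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("memo.txt", text)
container = buffer.getvalue()

# Naive search: scan the container's raw bytes, as a serial search
# that never opens the archive would.
found_in_raw_container = KEYWORD in container

# "Processed" search: open the container and search the extracted contents.
with zipfile.ZipFile(io.BytesIO(container)) as archive:
    found_after_extraction = any(
        KEYWORD in archive.read(name) for name in archive.namelist()
    )

print("Hit in raw container bytes:", found_in_raw_container)
print("Hit after extraction:      ", found_after_extraction)
```

The compressed bytes no longer spell out the keyword, so only the second search can see it. Encrypted containers are worse still, because they cannot be opened for extraction without the key.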

In case I don’t post before Chanukah, Christmas and the New Year, have a safe and joyous holiday!

Electronic Evidence Workbook 2022

13 Thursday Jan 2022

Posted by craigball in Uncategorized, General Technology Posts, E-Discovery, Computer Forensics

≈ 6 Comments

I’ve released a new version of the Electronic Evidence Workbook used in my three-credit E-Discovery and Digital Evidence course at the University of Texas Law School, UT Computer Science School and UT School of Information. I prefer this release over any before because it presents the material more accessibly and logically, better tying the technical underpinnings to trial practice.

The chapters on processing are extensively revamped. I’m hell bent on making encoding understandable, and I’ve incorporated the new Processing Glossary I wrote for the EDRM. Glossaries are no one’s idea of light reading, but I hope this one proves a handy reference as the students cram for the five quizzes and final exam they’ll face.

Recognizing that a crucial component of competence in electronic discovery is mastering the arcane argot of legaltech, I’ve added Vital Vocabulary lists throughout, concluded chapters with Key Takeaway callouts and, for the first time, broken the Workbook into volumes such that this release covers just the first eight classes, almost entirely Information Technology.

Come Spring Break in mid-March, I’ll release the revamped omnibus volume adding new practical exercises in Search, Processing, Production, Review and Meet & Confer and introducing new tools. Because university students use Mac machines more than Windows PCs, the exercises ahead employ Cloud applications so as to be wholly platform-independent. The second half of the course folds in more case law to the relief of law students and chagrin of CS and IS students. The non-law students do a great job on the law but approach it with trepidation; the law students kiss the terra firma of case law like white-knuckled passengers off a turbulent flight.

Though written for grad students, the Workbook is also written for you, Dear Reader. If you’ve longed to learn more about information technology and e-discovery but never knew quite where or how to start, perhaps the 2022 Workbook is your gateway. The law students at UT Austin pay almost $60,000 per year for their educations; I’ll settle for a little feedback from you when you read it.

A Dozen Nips and Tucks for E-Discovery

03 Monday Jan 2022

Posted by craigball in Uncategorized, E-Discovery, Computer Forensics

≈ 7 Comments

Annually, I contribute to an E-Discovery Update presentation for top tier trial lawyers and annually I struggle to offer a handout that will be short enough for attendees to read and sufficiently pointed to prompt action. Ironically, predictably, the more successful the lawyers in attendance, the less moved they are to seek fresh approaches to discovery. Yet, we would be wise to observe that success tends not to depart abruptly but slips away on little cat feet, or as Hemingway described the velocity of a character’s path to bankruptcy, “Gradually, then suddenly.” A few nips and tucks may be all that’s needed to stay in fighting form. Accordingly, I wanted my list to be pithy with actionable takeaways like “have a production protocol, get a review platform and test your queries.” That may seem painfully obvious to you, Dear Reader, but it’s guidance yet to be embraced by leading lights in law. Here’s my 2022 list:

  1. Forms from a decade ago are obsolete.  Update your preservation letters and legal hold notices.  Remember: preservation letters go to the other side; legal hold notices to your clients.
  2. Custodial holds don’t fly.  Just telling a client, “don’t delete relevant data” isn’t enough, and it’s a misstep oft-cited by courts as attorney malfeasance.  Lawyers must guide and supervise clients in the identification, preservation and collection of relevant evidence.
  3. Be sure your legal hold process incorporates all elements of a defensible notification:
    i. Notice is Timely
    ii. Communicated through an effective channel
    iii. Issued by person(s) with clout
    iv. Sent to all necessary custodians
    v. Communicates gravity and accountability
    vi. Supplies context re: claim or litigation
    vii. Offers clear, practical guidance re: actions and deadlines
    viii. Sensibly scopes sources and forms
    ix. Identifies mechanism and contact for questions
    x. Incorporates acknowledgement, follow up and refresh
  4. Data dies daily; systems automatically purge and overwrite data over time.  The law requires parties promptly intercede to prevent loss of potentially relevant information by altering purge settings and otherwise interdicting deletion.  Don’t just assume it’s preserved, check to be certain.
  5. No e-discovery effort is complete in terms of preservation and collection if it fails to encompass mobile devices and cloud repositories.  Competent trial lawyers employ effective, defensible methods to protect, collect and review relevant mobile and cloud information.
  6. The pandemic pushed data to non-traditional locations and applications.  Don’t overlook data in conferencing apps like Zoom and collaboration tools like Slack.
  7. You should have an up-to-date ESI production protocol that fits the data and workflow. Know what an ESI protocol does and what features you can negotiate without prompting adverse outcomes.
  8. Don’t rely on untested keyword queries to find evidence.  Embrace the science of search.  TEST!
  9. Modern litigation demands use of review systems dedicated to electronically stored information (ESI) and staff trained in their use. Asked, “What’s your review platform?” you should know the answer.
  10. Vendors paid by the gigabyte lack incentive to trim data volumes.  Clients will thank you to have sound strategies to cull and deduplicate the data that vendors ingest and host.  Big savings lie there.
  11. Courts demand an unprecedented level of communication and cooperation respecting ESI.  Transparency of process signals confidence and competence in your approach to e-discovery.
  12. There are no more free passes for ignorance.  Now, learn it, get help or get out.

Then his head exploded!

28 Tuesday Sep 2021

Posted by craigball in Uncategorized, General Technology Posts, E-Discovery, Computer Forensics

≈ 2 Comments

In the introduction to my Electronic Evidence Workbook, I note that my goal is to change the way readers think about electronically stored information and digital evidence. I want all who take my courses to see that modern electronic information is just a bunch of numbers and not be daunted by those numbers.

I find numbers reassuring and familiar, so I occasionally forget that some are allergic to numbers and loath to wrap their heads around them.

Lately, one of my bright students identified himself as a “really bad with numbers person.” My lecture was on encoding as prologue to binary storage, and when I shifted too hastily from notating numbers in alternate bases (e.g., Base 2, 10, 16 and 64) and started in on encoding textual information as numbers (ASCII, Unicode), my student’s head exploded.

Boom!

At least that’s what he told me later. I didn’t hear anything when it happened, so I kept nattering on happily until class ended.

As we chatted, I realized that my student expected that encoding and decoding electronically stored information (ESI) would be a one-step process.  He was having trouble distinguishing the many ways that numbers (numeric values) can be notated from the many ways that numbers represent (“encode”) text and symbols like emoji.  Even as I write that sentence I suspect he’s not alone.

Of course, everyone’s first hurdle in understanding encoding is figuring out why to care about it at all.  Students care because they’re graded on their mastery of the material, but why should anyone else care; why should lawyers and litigation professionals like you care?  The best answer I can offer is that you’ll gain insight.  It will change the way you think about ESI in the same way that algebra changes the way you think about problem solving.  If you understand the fundamental nature of electronic evidence, you will be better equipped to preserve, prove and challenge its integrity as accurate and reliable information.

Electronic evidence is just data, and data are just numbers; so, understanding the numbers helps us better understand electronic evidence.

Understanding encoding requires we hearken back to those hazy days when we learned to tally and count by numbers.  Long ago, we understood quantities (numeric values) without knowing the numerals we would later use to symbolize quantities.  When we were three or four, “five” wasn’t yet Arabic 5, Roman V or even a symbolic tally like ||||. 

More likely, five was this:

If you’re from the Americas, Europe or Down Under, I’ll wager you were taught to count using the decimal system, a positional notation system with a base of 10.  Base 10 is so deeply ingrained in our psyches that it’s hard to conceive of numeric values being written any other way.  Decimal just feels like the one “true” way to count, but it’s not.  Writing numbers using an alternate base or “radix” is just as genuine, and it’s advantageous when information is stored or transmitted digitally.

Think about it.  Human beings count by tens because we evolved with ten digits on our hands.  Were that not so, old jokes like this one would make no sense: “Did you hear about the Aggie who was arrested for indecent exposure?  He had to count to eleven.”

Had our species evolved with eight fingers or twelve, we would have come to rely upon an octal or duodecimal counting system, and we would regard those systems as the “true” positional notation system for numeric values.  Ten only feels natural because we built everything around ten.

Computers don’t have fingers; instead, computers count using a slew of electronic switches that can be “on” or “off.”  Having just two states (on/off) makes it natural to count using Base 2, a binary counting system.  By convention, computer scientists notate the status of the switches using the numerals one and zero.  So, we tend to say that computers store information as ones and zeroes.  Yet, they don’t.

Computer storage devices like IBM cards, hard drives, tape, thumb drives and optical media store information as physical phenomena that can be reliably distinguished in either of two distinct states, e.g., punched holes, changes in magnetic polar orientation, minute electric potentials or deflection of laser beams.   We symbolize these two states as one or zero, but you could represent the status of binary data by, say, turning a light on or off.  Early computing systems did just that, hence all those flashing lights.

You can express any numeric value in any base without changing its value, just as it doesn’t change the numeric value of “five” to express it as Arabic “5” or Roman “V” or just by holding up five fingers. 

In positional notation systems, the order of numerals determines their contribution to the value of the number; that is, their contribution is the value of the digit multiplied by a factor determined by the position of the digit and the base.

The base/radix describes the number of unique digits, starting from zero, that a positional numeral system uses to represent numbers.  So, there are just two digits in base 2 (binary), ten in base 10 (decimal) and sixteen in base 16 (hexadecimal).  E-mail attachments are encoded using a whopping 64 digits in base 64.
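
For the curious, here is a minimal Python sketch of base 64 at work, using the standard library's `base64` module. The three sample bytes are arbitrary; the point is only that the alphabet really has 64 "digits" and that every three bytes of data become four base 64 characters.

```python
import base64
import string

# The standard Base64 alphabet: 26 upper-case letters, 26 lower-case letters,
# 10 numerals, plus "+" and "/" for a total of 64 "digits".
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
print(len(alphabet))

# Three bytes (24 bits) become four Base64 characters (6 bits each).
raw = b"Man"
encoded = base64.b64encode(raw).decode("ascii")
print(raw.hex(), "->", encoded)

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)
print(decoded == raw)
```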

We speak the decimal number 31,415 as “thirty-one thousand, four hundred and fifteen,” but were we faithfully adhering to its base 10 structure, we might say, “three ten thousands, one thousand, four hundreds, one ten and five ones.”  The “base” ten means that there are ten characters used in the notation (0-9) and the value of each position is ten times the value of the position to its right.

The same decimal number 31,415 can be written as a binary number this way: 111101010110111

In base 2, two characters are used in the notation (0 and 1) and each position is twice the value of the position to its right.  If you multiply each digit times its position value and add the products, you’ll get a total equal in value to the decimal number 31,415.

A value written as five characters in base 10 requires 15 characters in base 2.  That seems inefficient until you recall that computers count using on-off switches and thrive on binary numbers.

The decimal value 31,415 can be written as a base 16 or hexadecimal number this way: 7AB7

In base 16, sixteen characters are used in the notation (0-9 and A-F) and each position is sixteen times the value of the position to its right.  If you multiply each digit times its position value and add the products, you’ll get a total equal in value to the decimal number 31,415.  But how do you multiply letters like A, B, C, D, E and F?  You do it by knowing the letters are used to denote values greater than 9, so A=10, B=11, C=12, D=13, E=14 and F=15.  Zero through nine plus the six values represented as letters comprise the sixteen characters needed to express numeric values in hexadecimal.

Once more, if you multiply each digit/character times its position value and add the products, you’ll get a total equal in value to the decimal number 31,415: (7 × 4,096) + (10 × 256) + (11 × 16) + (7 × 1) = 28,672 + 2,560 + 176 + 7 = 31,415.
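
If you'd rather let a machine do the multiplying, the following Python sketch writes 31,415 in bases 2, 10 and 16 and then converts each back by the multiply-and-add method described above. The function names `to_base` and `from_base` are my own for illustration; Python's built-in `int(text, base)` does the same job as the second one.

```python
DIGITS = "0123456789ABCDEF"

def to_base(value: int, base: int) -> str:
    """Write a non-negative integer in the given base (2 through 16)."""
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, remainder = divmod(value, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

def from_base(text: str, base: int) -> int:
    """Multiply each digit by its position value and add the products."""
    total = 0
    for position, char in enumerate(reversed(text)):
        total += DIGITS.index(char) * base ** position
    return total

for base in (2, 10, 16):
    written = to_base(31415, base)
    print(f"base {base:>2}: {written} -> {from_base(written, base)}")
```

The numeric value never changes; only the notation does.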

Computers work with binary data in eight-character sequences called bytes.  A binary sequence of eight ones and zeros (“bits”) can be arranged in 256 unique ways.   Long sequences of ones and zeroes are hard for humans to follow, so happily, two hexadecimal characters can also be arranged in 256 unique ways, meaning that just two base-16 characters can replace the eight characters of a binary byte (i.e., a binary value of 11111111 can be written in hex as FF).  Using hexadecimal characters allows programmers to write data in just 25% of the space required to write the same data in binary, and it’s easier for humans to follow.

Let’s take a quick look at why this is so.  A single binary byte can range from 0 to 255 (being 00000000 to 11111111).  Computers count from zero, so that range spans 256 unique values. The following table demonstrates why the largest value of an eight character binary byte (11111111) equals the largest value of just two hexadecimal characters (FF):
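
The same comparison can be sketched in a few lines of Python. This is only an illustration of the arithmetic: it lists the position values for eight binary digits and for two hexadecimal digits, then totals the largest value each can hold.

```python
# Position values for the eight bits of a byte, left to right.
bit_places = [2 ** p for p in range(7, -1, -1)]   # 128, 64, 32, 16, 8, 4, 2, 1
# Position values for two hexadecimal characters, left to right.
hex_places = [16 ** p for p in range(1, -1, -1)]  # 16, 1

# Largest byte: every bit is 1.  Largest hex pair: both characters are F (15).
binary_max = sum(1 * place for place in bit_places)
hex_max = sum(15 * place for place in hex_places)

print(bit_places, "->", binary_max)
print(hex_places, "->", hex_max)

# Both notations describe the same number of unique values, counting from zero.
print(2 ** 8, 16 ** 2)
```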

Hexadecimal values are everywhere in computing.  Litigation professionals encounter hexadecimal values as MD5 hash values and may run into them as IP addresses, Globally Unique Identifiers (GUIDs) and even color references.

Encoding Text

So far, I’ve described ways to encode the same numeric value in different bases.  Now, let’s shift gears to describe how computers use those numeric values to signify intelligible alphanumeric information like the letters of an alphabet, punctuation marks and emoji.  Again, data are just numbers, and those numbers signify something in the context of the application using that data, just as gesturing with two fingers may signify the number two, a peace sign, the V for Victory or a request that a blackjack dealer split a pair.  What numbers mean depends upon the encoding scheme applied to the values in the application; that is, the encoding scheme supplies the essential context needed to make the data intelligible.  If the number is used to describe an RGB color, then the hex value 7F00FF means violet.  Why?  Because each of the three values that make up the number (7F 00 FF) denotes how much of the colors red, green and blue to mix to create the desired RGB color. In other contexts, the same hex value could mean the decimal number 8,323,327, the binary string 11111110000000011111111 or the characters 缀ÿ.
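
Here is a small Python sketch of the same three bytes read four ways. The color, integer and binary readings are straightforward. The text reading rests on an assumption of mine about how those two characters arise: the first two bytes are taken as one big-endian UTF-16 character and the last byte as one Windows-1252 character.

```python
data = bytes.fromhex("7F00FF")

# As an RGB color: one byte each for red, green and blue.
red, green, blue = data

# As a single unsigned integer, most significant byte first.
as_integer = int.from_bytes(data, "big")

# As a binary string (Python drops the leading zero of the first byte).
as_binary = bin(as_integer)[2:]

# As text (assumed reading): bytes 7F 00 as one UTF-16 big-endian character,
# byte FF as one Windows-1252 character.
as_text = data[:2].decode("utf-16-be") + data[2:].decode("cp1252")

print("RGB:    ", (red, green, blue))
print("Integer:", f"{as_integer:,}")
print("Binary: ", as_binary)
print("Text:   ", as_text)
```

Nothing about the bytes changed between the four print statements; only the decoding rule applied to them did.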

ASCII

When the context is text, there are a host of standard ways, called Character Encodings or Code Pages, in which the numbers denote letters, punctuation and symbols.  Now nearly sixty years old, the American Standard Code for Information Interchange (ASCII, “ask-key”) is the basis for most modern character encoding schemes (though both Morse code and Baudot code are older).  Born in an era of teletypes and 7-bit bytes, ASCII’s original 128 codes included 33 non-printable codes for controlling machines (e.g., carriage return, ring bell) and 95 printable characters.  The ASCII character set follows:
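
As a rough stand-in for a chart, this Python sketch tallies the control codes and printable characters in 7-bit ASCII and prints the printable ones in rows of sixteen, each row labeled with the hex value of its first code.

```python
# ASCII assigns the code values 0 through 127.
control_codes = [c for c in range(128) if c < 32 or c == 127]
printable = [c for c in range(128) if 32 <= c <= 126]

print(len(control_codes), "control codes;", len(printable), "printable characters")

# Printable characters, sixteen to a row; the label is the hex value
# of the first code in the row (hex 20 is the space character).
for start in range(32, 127, 16):
    row = "".join(chr(c) for c in range(start, min(start + 16, 127)))
    print(f"{start:02X}: {row}")
```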

Windows-1252

Later, when the byte standardized from seven to eight bits (recall a bit is a one or zero), 128 additional characters could be added to the character set, prompting the development of extended character encodings. Arguably the most used single-byte character set in the world is the Windows-1252 code page, the characters of which are set out in the following table (red dots signify unassigned values). 

Note that the first 128 control codes and characters (from NUL to DEL) match the ASCII encodings and the 128 characters that follow are the extended set.  Each character and control code has a corresponding fixed byte value, e.g., an upper-case B is hex 42 and the section sign, §, is hex A7.  To see the entire code page character set and the corresponding hexadecimal encodings on Wikipedia, click here.  Again, ASCII and the Windows-1252 code page are single byte encodings so they are limited to a maximum of 256 characters.
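
You can check byte values like these yourself. The Python sketch below uses the language's built-in `cp1252` codec, which is its name for Windows-1252. The three sample characters are my own picks: one from the ASCII range and two from the extended set.

```python
# Look up the single-byte Windows-1252 value of a few characters.
values = {}
for char in ("A", "§", "€"):
    values[char] = char.encode("cp1252").hex().upper()
    print(char, "is hex", values[char])

# The first 128 byte values decode to the same characters under
# ASCII and Windows-1252.
low_half = bytes(range(128))
same_low_half = low_half.decode("ascii") == low_half.decode("cp1252")
print("First 128 values match ASCII:", same_low_half)
```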

Unicode

The Windows-1252 code page works reasonably well so long as you’re writing in English and most European languages; but sporting only 256 characters, it won’t suffice if you’re writing in, say, Greek, Cyrillic, Arabic or Hebrew, and it’s wholly unsuited to Asian languages like Chinese, Japanese and Korean. 

Though programmers developed various ad hoc approaches to foreign language encodings, an increasingly interconnected world needed universal, systematic encoding mechanisms.  These methods would use more than one byte to represent each character, and the most widely adopted such system is Unicode.  In its latest incarnation (version 14.0, effective 9/14/21), Unicode standardizes the encoding of 159 written character sets called “scripts” comprising 144,697 characters, plus multiple symbol sets and emoji characters.

The Unicode Consortium crafted Unicode to co-exist with the longstanding ASCII and ANSI character sets by emulating the ASCII character set in corresponding byte values within the more extensible Unicode counterpart, UTF-8.  UTF-8 can represent all 128 ASCII characters using a single byte and all other Unicode characters using two, three or four bytes.  Because of its backward compatibility and multilingual adaptability, UTF-8 has become the most popular text encoding standard, especially on the Internet and within e-mail systems. 
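
A quick Python sketch shows the variable width in action. The four sample characters are arbitrary picks of mine, chosen because they need one, two, three and four bytes respectively in UTF-8.

```python
samples = ["A", "é", "€", "😀"]

byte_counts = []
for char in samples:
    encoded = char.encode("utf-8")
    byte_counts.append(len(encoded))
    print(f"U+{ord(char):04X} {char} -> {len(encoded)} byte(s): {encoded.hex(' ').upper()}")

# An ASCII character has the identical single-byte value in UTF-8.
print("A".encode("ascii") == "A".encode("utf-8"))
```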

Exploding Heads and Encoding Challenges

As tempting as it is to regard encoding as a binary backwater never touching lawyers’ lives, encoding issues routinely lie at the root of e-discovery disputes, even when the term “encoding” isn’t mentioned.  “Load file problems” are often encoding issues, as may be “search difficulties,” “processing exceptions” and “corrupted data.”  If an e-discovery processing tool reads Windows-1252 encoded text expecting UTF-8 encoded text or vice-versa, text and load files may be corrupted to the point that data will need to be re-processed and new production sets generated.  That’s costly, time-consuming and might be wholly avoidable, perhaps with just the smattering of knowledge of encoding gained here.
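
To see what that kind of mismatch looks like, here is a minimal Python sketch that deliberately reads text with the wrong encoding in both directions. The sample phrase is made up, and real processing tools may fail differently (some substitute characters, some throw exceptions), so treat this as an illustration of the mechanism only.

```python
text = "Señor résumé"  # made-up sample containing non-ASCII characters

# Written as UTF-8 but read as Windows-1252: each accented letter
# turns into two wrong characters.
misread_as_cp1252 = text.encode("utf-8").decode("cp1252")

# Written as Windows-1252 but read as UTF-8: the accented bytes are
# invalid UTF-8, so each becomes the replacement character U+FFFD.
misread_as_utf8 = text.encode("cp1252").decode("utf-8", errors="replace")

print(misread_as_cp1252)
print(misread_as_utf8)

# In the first case the damage is reversible if you know what went wrong.
repaired = misread_as_cp1252.encode("cp1252").decode("utf-8")
print(repaired == text)
```

A keyword search for "résumé" would miss both damaged versions, which is how an encoding mix-up ends up being reported as a "search difficulty."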

Ten Tips for Better ESI Expert Reports

24 Monday May 2021

Posted by craigball in General Technology Posts, E-Discovery, Computer Forensics

≈ 5 Comments

A lawyer I admire asked me to talk to her colleague about expert reports.  I haven’t had that conversation yet, but the request got me thinking about the elements of a competent expert report, especially reports in my areas of computer forensics and digital evidence.  I dashed off ten things I thought contribute to the quality of the best expert reports.  If these were rules, I’d have to concede I’ve learned their value by breaking a few of them.  I’ve left out basic writing tips like “use conversational language and simple declarative sentences.” There are lists of rules for good writing elsewhere and you should seek them out.  Instead, here’s my impromptu list of ten tips for crafting better expert reports on technical issues in electronic discovery and computer forensics:

  1. Answer the questions you were engaged to resolve.
  2. Don’t overreach your expertise.
  3. Define jargon, and share supporting data in useful, accessible ways.
  4. Distinguish factual findings from opinions.
  5. Include language addressing the applicable evidentiary standard.
  6. Eschew advocacy; let your expertise advocate for you.
  7. Challenge yourself and be fair.
  8. Proofread.  Edit.  Proofread again. Sleep on it. Edit again.
  9. Avoid assuming the fact finder’s role in terms of ultimate issues.
  10. Listen to your inner voice.

Most of these are self-explanatory but please permit me a few clarifying comments.

Answer the questions you were engaged to resolve.

My pet peeve with expert reports is that they don’t always address the questions important to the court and counsel.  I’ve seen reports spew hundreds of pages of tables and screenshots without conveying what any of it means to the issues in the case.  Sometimes you can’t answer the questions.  Fine.  Say so.  Other times you must break down or reframe the questions to conform to the evidence.  That’s okay, too, IF it’s not an abdication of the task you were brought in to accomplish.  But, the best, most useful and intelligible expert reports pose and answer specific questions.

Don’t overreach your expertise.

The standard to qualify as an expert witness is undemanding: do you possess specialized knowledge that would assist the trier of fact in understanding the evidence or resolving issues of fact? See, e.g., Federal Rule of Evidence 702.  With the bar so low, it can be tempting to overreach your expertise, particularly when pushed by a client to opine on something you aren’t fully qualified to address.  For example, I’m a certified computer forensic examiner and I studied accounting in college, but I’m not a forensic accountant.  I know a lot about digital forgery, but I’m not a trained questioned document examiner.  These are specialties.  I try to stay in my own lane and commend it to other experts.

Define jargon, and share supporting data in useful, accessible ways.

Can someone with an eighth-grade education and no technical expertise beyond that of the average computer user understand your report?  If not, you’re writing for the wrong audience.  We should write to express, not impress.  I love two-dollar words and the bon mot phrase, but they don’t serve me well when writing reports.  Never assume that a technical term will be universally understood.  If your grandparents wouldn’t know what it means, define it.

Computer forensic tools are prone to generate lengthy “reports” rife with incomprehensible data.  It’s tempting to tack them on as appendices to add heft and underscore how smart one must be to understand it all.  But it’s the expert’s responsibility to act as a guide to the data and ensure its import is clear.  I rarely testify—even by affidavit–without developing annotated demonstrative examples of the supporting data.  Don’t wait for the deposition or hearing to use demonstrative evidence; make points clear in the report.

Too, I’m fond of executive summaries; that is, an up-front, cut-to-the-chase paragraph relating the upshot of the report.

Distinguish factual findings from opinions.

The key distinction between expert and fact witnesses is that expert witnesses are permitted to express opinions that go beyond their personal observation.  A lay witness to a crash may testify to speeds based only upon what they saw with their own eyes.  An accident reconstructionist can express an opinion of how fast the cars were going based upon evidence that customarily informs expert opinions like skid marks and vehicle deformation.  Each type of testimony must satisfy different standards of proof in court; so, to make a clear and defensible record, it’s good practice to distinguish factual findings (“things you saw”) from opinions (“things you’ve concluded based upon what you saw AND your specialized knowledge, training and experience”).  This  naturally begets the next tip:

Include language addressing the applicable evidentiary standard.

Modern jurisprudence deploys safeguards like the Daubert standard to combat so-called “junk science.”  Technical expert opinions must be based upon a sound scientific methodology, viz., sufficient facts or data and the product of reliable principles and methods.  While a court acting as gatekeeper can infer the necessary underpinnings from an expert’s report and C.V., expressly stating that opinions are based upon proper and accepted standards makes for a better record.

Eschew advocacy; let your expertise advocate for you.

Mea culpa here.  Because I was a trial lawyer for three+ decades, I labor to restrain myself in my reporting to ensure that I’m not intruding into the lawyer’s realm of advocacy.  I don’t always succeed.  Even if you’re working for a side, be as scrupulously neutral as possible in your reporting.  Strive to act and sound like you don’t care who prevails even if you’re rooting for the home team.  If you do your job well, the facts will advocate the right outcome.

Challenge yourself and be fair.

My worst nightmare as an expert witness is that I will mistakenly opine that someone committed a bad act when they didn’t.  So, I’m always trying to punch holes in my own theories and asking myself, “how would I approach this if I were working for the other side?”  Nowhere is this more important than when working as a court-appointed neutral expert.  Even if you’d enjoy seeing a terrible person fry, be fair.  You stand in the shoes of the Court.

Proofread.  Edit.  Proofread again. Sleep on it. Edit again.

Who has that kind of time, right?  Still, try to find the time.  Few things undermine the credibility of an expert report like a bunch of spelling and grammatical errors.  Stress and fatigue make for poor first drafts.  It often takes a good night’s sleep (or at least a few hours away from the work) to catch the inartful phrase, typo or other careless error.

Avoid assuming the fact finder’s role in terms of ultimate issues.

Serving as a court Special Master a few years back, I opined that the evidence of a certain act was so overwhelming that the Court should only reach one result.  Accordingly, I ceased investigating the loss of certain data that I regarded as out-of-scope.  I was right…but I was also wrong.  The Court has a job to do and, by my eliding over an issue the Court was obliged to address, the Court had to rule without benefit of what a further inquiry into the missing evidence would have revealed. The outcome was the same, but by assuming the factfinder’s role on an ultimate issue, I made the Court’s job harder.  Don’t do that.

Listen to your inner voice.

In expressing expert opinions, too much certainty—a/k/a arrogance–is as perilous as too much doubt.  Perfect is not the standard, but you should be reasonably confident of  your opinion based on a careful and competent review of the evidence.  If something “feels” off, it may be your inner voice telling you to look again. 

Final Exam Review: How Would You Fare?

28 Wednesday Apr 2021

Posted by craigball in E-Discovery, Computer Forensics

≈ 6 Comments

It’s nearly finals time for the students in my E-Discovery and Digital Evidence course at the University of Texas School of Law. I just completed the Final Exam Study Guide for the class and thought readers who wonder what a tech-centric law school e-discovery curriculum looks like might enjoy seeing what’s asked of the students in a demanding 3-credit law school course. Whether you’re ACEDS certified, head of your e-discovery practice group or just an e-discovery groupie like me, consider how you’d fare preparing for an exam with this scope and depth. I’m proud of my bright students. You’d be really lucky to hire one of my stars.

E-Discovery – Spring 2021 Final Exam Study Guide

The final exam will cover all readings, lectures, exercises and discussions on the syllabus.
(Syllabus ver. 21.0224 in conjunction with Workbook ver. 21.0214 and Announcements).

  1. We spent a month on meeting the preservation duty and proportionality.  You undertook a two-part legal hold drafting exercise.  Be prepared to bring skills acquired from that effort to bear on a hypothetical scenario.  Be prepared to demonstrate your understanding of the requisites of fashioning a defensible legal hold and sensibly targeting a preservation demand to an opponent.  As well, your data mapping skills should prove helpful in addressing the varied sources of potentially relevant ESI that exist, starting at the enterprise level with The Big Six (e-mail, network shares, mobile devices, local storage, social networking and databases).  Of course, we must also consider Cloud repositories and scanned paper documents as potential sources.
  2. An essential capability of an e-discovery lawyer is to assess a case for potentially relevant ESI, fashion and implement a plan to identify accessible and inaccessible sources, determine their fragility and persistence, scope and deploy a litigation hold and take other appropriate first steps to counsel clients and be prepared to propound and respond to e-discovery, especially those steps needed to make effective use of the FRCP Rule 26(f) meet-and-confer process.  Often, you must act without having all the facts you’d like and rely upon your general understanding of ESI and information systems to put forward a plan to acquire the facts and do so with sensitivity to the cost and disruption your actions may engender.  Everything we’ve studied was geared to instilling those capabilities in you.
  3. CASES: You are responsible for all cases covered during the semester.  When you read each case, you should ask yourself, “What proposition might I cite this case to support in the context of e-discovery?”  That’s likely to be the way I will have you distinguish the cases and use them in the exam.  I refer to cases by their style (plaintiff versus defendant), so you should be prepared to employ a mnemonic to remember the most salient principles of each, e.g., Columbia Pictures is the ephemeral data/RAM case; Rambus is the Shred Day case; In re NTL is the right of control case; In re: Weekley Homes is the Texas case about accessing the other side’s hard drives; Wms v. Sprint is the spreadsheet metadata case (you get the idea).  I won’t test your memory of jurists, but it’s helpful-not-crucial to recall the authors of the decisions (especially when they spoke to our class like Judges Peck and Grimm). 

Case Review Hints:

  • Green v. Blitz: (Judge Ward, Texas) This case speaks to the need for competence in those responsible for preservation and collection and what constitutes a defensible eDiscovery strategy. What went wrong here? What should have been done differently?
  • In re: Weekley Homes: (Texas Supreme Court) This is one of the three most important Texas cases on ESI. You should understand the elements of proof which the Court imposes for access to an opponent’s storage devices and know the terms of TRCP Rule 196.4, especially the key areas where the state and Federal ESI rules diverge.
  • Zubulake: (Judge Scheindlin, New York) The Zubulake series of decisions are seminal to the study of e-discovery in the U.S.  Zubulake remains the most cited of all EDD cases, so is still a potent weapon even after the Rules amendments codified many of its lessons. Know what the case is about, how the plaintiff persuaded the court that documents were missing and what the defendant did or didn’t do in failing to meet its discovery obligations. Know what an adverse inference instruction is and how it was applied in Zubulake versus what must be established under FRCP Rule 37(e) after 2015. Know what Judge Scheindlin found to be a litigant’s and counsel’s duties with respect to preservation. Seven-point analytical frameworks (as for cost-shifting) make good test fodder.
  • Williams v. Sprint: (Judge Waxse, Kansas). Williams is a seminal decision respecting metadata. In Williams v. Sprint, the matter concerned purging of metadata and the locking of cells in spreadsheets in the context of an age discrimination action after a reduction-in-force. Judge Waxse applied Sedona Principle 12 in its earliest (and now twice revised) form. What should Sprint have done?  Did the Court sanction any party? Why or why not?
  • Rodman v. Safeway: (Judge Tigar, ND California) This case, like Zubulake IV, looks at the duties and responsibilities of counsel when monitoring a client’s search for and production of potentially responsive ESI. What is Rule 26(g), and what does it require? What constitutes a reasonable search? To what extent and under what circumstances may counsel rely upon a client’s actions and representations in preserving or collecting responsive ESI?
  • Columbia Pictures v. Bunnell: (Judge Chooljian, California) What prompted the Court to require the preservation of such fleeting, ephemeral information? Why were the defendants deemed to have control of the ephemeral data? Unique to its facts?
  • In re NTL, Inc. Securities Litigation: (Judge Peck, New York) Be prepared to discuss what constitutes control for purposes of imposing a duty to preserve and produce ESI in discovery and how it played out in this case. I want you to appreciate that, while a party may not be obliged to succeed in compelling the preservation or production of relevant information beyond its care, custody or control, a party is obliged to exercise all such control as the party actually possesses, whether as a matter of right or by course of dealing. What does The Sedona Conference think about that?
  • William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co.: (Judge Peck, New York) What was the “wake up call,” who were expected to awaken and on what topics?
  • Adams v. Dell: (Judge Nuffer, Utah) What data was claimed to have been lost? What was supposed to have triggered the duty to preserve? What did the Court say about a responding party’s duty, particularly in designing its information systems? Outlier?
  • RAMBUS: (Judge Whyte, California) I expect you to know what happened and to appreciate that the mere reasonable anticipation of litigation–especially by the party who brings the action–triggers the common law duty to preserve. Be prepared to address the sorts of situations that might or might not trigger a duty to initiate a legal hold.
  • United States v. O’Keefe (Judge Facciola, DC): I like this case for its artful language (Where do angels fear to tread?) and consideration of the limits and challenges of keyword search.  The last being a topic that bears scrutiny wherever it has been addressed in the material.  That is, does keyword search work as well as lawyers think, and how can we improve upon it and compensate for its shortcomings? 
  • Victor Stanley v. Creative Pipe I & II (Judge Grimm, Maryland):  Read VS I with an eye toward understanding the circumstances when inadvertent production triggers waiver (pre-FRE 502).  What are the three standards applied to claims of waiver?  What needs to be in the record to secure relief?

    Don’t get caught up in the prolonged factual minutiae of VS II.  Read VS II to appreciate the varying standards that once existed across the Circuits for imposition of spoliation sanctions and that pre-date the latest FRCP Rules amendments, i.e., Rule 37(e).
  • Anderson Living Trust v. WPX Energy Production, LLC (Judge Browning, New Mexico): This case looks at the application and intricacies of FRCP Rule 34 when it comes to ESI versus documents.  My views about the case were set out in the article you read called “Breaking Badly.”
  • In re: State Farm Lloyds (Texas Supreme Court):  Proportionality is the buzzword here; but does the Court elevate proportionality to the point of being a costly hurdle serving to complicate a simple issue?  What does this case portend for Texas litigants in terms of new hoops to jump over issues as straightforward as forms of production?  What role did the Court’s confusion about forms (and a scanty record) play in the outcome?
  • Monique Da Silva Moore, et al. v. Publicis Groupe & MSL Group and Rio Tinto Plc v. Vale S.A., (Judge Peck, New York): DaSilva Moore is the first federal decision to approve the use of the form of Technology Assisted Review (TAR) called Predictive Coding as an alternative to linear, manual review of potentially responsive ESI.  Rio Tinto is Judge Peck’s follow up, re-affirming the viability of the technology without establishing an “approved” methodology.
  • Brookshire Bros. v. Aldridge (Texas Supreme Court): This case sets out the Texas law respecting spoliation of ESI…or does it?  Is the outcome and “analysis” here consistent with the other preservation and sanctions cases we’ve covered?
  • VanZant v. Pyle (Judge Sweet, New York): Issues of control and spoliation drive this decision.  Does the Court correctly apply Rule 37(e)?
  • CAT3 v. Black Lineage (Judge Francis, New York): This trademark infringement dispute concerned an apparently altered email.  Judge Francis found the alteration sufficient to support sanctions under Rule 37(e).  How did he get there?  Judge Francis also addressed the continuing viability of discretionary sanctions despite 37(e).  What did he say about that?
  • EPAC v. Thos. Nelson, Inc.: Read this report closely to appreciate how the amended Rules, case law and good practice serve to guide the court in fashioning remedial measures and punitive sanctions.  Consider the matter from the standpoint of the preservation obligation (triggers and measures) and from the standpoint of proportionate remedial measures and sanctions.  What did the Special Master do wrong here?
  • Mancia v. Mayflower (Judge Grimm, Maryland): Don’t overlook this little gem in terms of its emphasis on counsel’s duties under FRCP Rule 26(g).  What are those duties?  What do they signify for e-discovery? What is the role of cooperation in an adversarial system?
  • Race Tires America, Inc. v. Hoosier Racing Tire Corp. (Judge Vanaskie, Pennsylvania): This opinion cogently defines the language and limits of 28 U.S.C. §1920 as it relates to the assessment of e-discovery expenses as “taxable costs.”  What common e-discovery expenses might you seek to characterize as costs recoverable under §1920, and how would you make your case?
  • Zoch v. Daimler (Judge Mazzant, Texas):  Did the Court correctly resolve the cross-border and blocking statute issues?  Would the Court’s analysis withstand appellate scrutiny post-GDPR? 

Remember: bits, bytes, sectors, clusters (allocated and unallocated), tracks, slack space, file systems and file tables, why deleted doesn’t mean gone, forensic imaging, forensic recovery techniques like file carving, EXIF data, geolocation, file headers/binary signatures, hashing, normalization, de-NISTing, deduplication and file shares.  For example: you should know that an old 3.5” floppy disk typically held no more than 1.44MB of data, whereas the capacity of a new hard drive or modern backup tape would be measured in terabytes. You should also know the relative capacities indicated by kilobytes, megabytes, gigabytes, terabytes and petabytes of data (i.e., their order of ascendancy, and the fact that each is 1,000 times more or less than the next or previous tier).  Naturally, I don’t expect you to know the tape chronology/capacities, ASCII/hex equivalencies or other ridiculous-to-remember stuff.

4. TERMINOLOGY: Lawyers, more than most, should appreciate the power of precise language.  When dealing with professionals in technical disciplines, it’s important to call things by their right name and recognize that terms of art in one context don’t necessarily mean the same thing in another.  When terms have been defined in the readings or lectures, I expect you to know what those terms mean.  For example, you should know what ESI, EDRM, RAID, system and application metadata (definitely get your arms firmly around application vs. system metadata), retention, purge and rotation mean (e.g., grandfather-father-son rotation); as well as Exchange, O365, 26(f), 502(d), normalization, recursion, native, near-native, TIFF+, load file, horizontal, global and vertical deduplication, IP addressing, data biopsy, forensically sound, productivity files, binary signatures and file carving, double deletion, load files, delimiters, slack space, unallocated clusters, UTC offset, proportionality, taxable costs, sampling, testing, iteration, TAR, predictive coding, recall, precision, UTC, VTL, SQL, etc.

5. ELECTRONIC DISCOVERY REFERENCE MODEL:  We’ve returned to the EDRM many times as we’ve moved from left to right across the iconic schematic.  Know its stages, their order and what those stages and triangles signify.

6. ENCODING: You should have a firm grasp of the concept of encoded information, appreciating that all digital data is stored as numbers notated as an unbroken sequence of 1s and 0s. How is that miracle possible? You should be comfortable with the concepts described in pp. 132-148 of the Workbook (and our class discussions of the fact that the various bases are just ways to express numbers of identical values in different notations). You should be old friends with the nature and purpose of, e.g., base 2 (binary), base 10 (decimal) base 16 (hexadecimal), base 64 (attachment encoding), ASCII and UNICODE.
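
If it helps to see it rather than read it, here's a small Python sketch (not taken from the Workbook) that writes the same bytes in four notations. The sample text and the function name notations are assumptions for illustration.

```python
import base64

def notations(data: bytes) -> dict:
    """Express the same bytes in binary, decimal, hexadecimal and base 64."""
    return {
        "base 2": " ".join(f"{b:08b}" for b in data),
        "base 10": " ".join(str(b) for b in data),
        "base 16": data.hex(" "),
        "base 64": base64.b64encode(data).decode("ascii"),
    }

sample = "ESI".encode("ascii")  # three ASCII characters, three bytes
for base, text in notations(sample).items():
    print(f"{base:>8}: {text}")

# Every notation decodes back to the identical values.
result = notations(sample)
assert bytes(int(bits, 2) for bits in result["base 2"].split()) == sample
assert bytes.fromhex(result["base 16"]) == sample
assert base64.b64decode(result["base 64"]) == sample
```

The values never change; only the notation does, and the larger the base, the fewer symbols it takes to write them down.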

7. STORAGE: You should have a working knowledge of the principal types and capacities of common electromagnetic and solid-state storage devices and media (because data volume has a direct relationship to cost of processing and time to review in e-discovery). You should be able to recognize and differentiate between, e.g., floppy disks, thumb drives, optical media, hard drives, solid state storage devices, RAID arrays and backup tape, including a general awareness of how much data they hold. Much of this is in pp. 22-48 of the Workbook (Introduction to Data Storage Media).  For ready reference and review, I’ve added an appendix to this study guide called, “Twenty-One Key Concepts for Electronically Stored Information.”

8. E-MAIL: E-mail remains the epicenter of corporate e-discovery; so, understanding e-mail systems, forms and the underlying structure of a message is important.  The e-mail chapter should be reviewed carefully.  I wouldn’t expect you to know file paths to messages or e-mail forensics, but the anatomy of an e-mail is something we’ve covered in detail through readings and exercises.  Likewise, the messaging protocols (POP, MAPI, IMAP, WEB, MIME, etc.), mail single message and container formats (PST, OST, EDB, NSF, EML, MSG, DBX, MHTML, MBOX) and leading enterprise mail client-server pairings (Exchange/Outlook, Domino/Notes, O365/browser) are worth remembering.  Don’t worry, you won’t be expected to extract epoch times from boundaries again. 😉

9. FORMS: Forms of production loom large in our curriculum.  Being that everything boils down to just an unbroken string of ones-and-zeroes, the native forms and the forms in which we elect to request and produce them (native, near-native, images (TIFF+ and PDF), paper) play a crucial role in all the “itys” of e-discovery: affordability, utility, intelligibility, searchability and authenticability.  What are the purposes and common structures of load files?  What are the pros and cons of the various forms of production?  Does one size fit all?  How does the selection of forms play out procedurally in federal and Texas state practice?  How do we deal with Bates numbering and redaction?  Is native and near-native production better and, if so, how do we argue the merits of native production to someone wedded to TIFF images?  This is HUGE in my book!  There WILL be at least one essay question on this and likely several other test questions.

10. SEARCH AND REVIEW: We spent a fair amount of time talking about and doing exercises on search and review.  You should understand the various established and emerging approaches to search: e.g., keyword search, Boolean search, fuzzy search, stemming, clustering, predictive coding and Technology Assisted Review (TAR).  Why is an iterative approach to search useful, and what difference does it make?  What are the roles of testing, sampling and cooperation in fashioning search protocols?  How do we measure the efficacy of search?  Hint: You should know how to calculate recall and precision and know the ‘splendid steps’ to take to improve the effectiveness and efficiency of keyword search (i.e., better F1 scores). 

You should know what a review tool does and customary features of a review platform.  You should know the high points of the Blair and Maron study (you read and heard about it multiple times, so you need not read the study itself).  Please also take care to understand the limitations on search highlighted in your readings and those termed The Streetlight Effect.
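
As a refresher on the arithmetic behind the hint above, here's a minimal Python sketch of recall, precision and F1. The document numbers are invented for illustration, and the function name search_metrics is my own; it isn't drawn from any review platform.

```python
def search_metrics(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall, F1) for one search against a known answer key."""
    hits = len(retrieved & relevant)  # relevant documents the search found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical collection: documents 1-10 are truly relevant,
# and a keyword search returned documents 6-13.
relevant = set(range(1, 11))
retrieved = set(range(6, 14))

precision, recall, f1 = search_metrics(retrieved, relevant)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Here the search returned eight documents, five of them relevant, so precision is 5/8; it found five of the ten relevant documents, so recall is 5/10. F1 is the harmonic mean of the two.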

11. ACCESSIBILITY AND GOOD CAUSE: Understand the two-tiered analysis required by FRCP Rule 26(b)(2)(B).  When does the burden of proof shift, and what shifts it?  What tools (a/k/a conditions) are available to the Court to protect competing interests of the parties?

12. FRE RULE 502: It’s your friend!  Learn it, love it, live it (or at least know when and how to use it).  What protection does it afford against subject matter waiver?  Is there anything like it in state practice?  Does it apply to all legally cognized privileges?

13. 2006 AND 2015 RULES AMENDMENTS: You should understand what they changed with respect to e-discovery.  Concentrate on proportionality and scope of discovery under Rule 26, along with standards for sanctions under new Rule 37(e).  What are the Rule 26 proportionality factors?  What are the findings required to obtain remedial action versus serious sanctions for spoliation of ESI under 37(e)?  Remember “intent to deprive.”

14. MULTIPLE CHOICE: When I craft multiple choice questions, there will typically be two answers you can quickly discard, then two you can’t distinguish without knowing the material. So, if you don’t know an answer, you increase your odds of doing well by eliminating the clunkers and guessing. I don’t deduct for wrong answers.  Read carefully to note whether the question seeks the exception or the rule. READ ALL ANSWERS before selecting the best one(s) as I often include an “all of the above” or “none of the above” option.

15. All lectures and reviews of exercises are recorded and online for your review, if desired.

16. In past exams, I used the following essay questions.  These will not be essay questions on your final exam; however, I furnish them here as examples of the scope and nature of prior essay questions:

EXAMPLE QUESTION A: On behalf of a class of homeowners, you sue a large bank for alleged misconduct in connection with mortgage lending and foreclosures. You and the bank’s counsel agree upon a set of twenty Boolean and proximity queries including:

  • fnma AND deed-in-lieu
  • 1/1/2009 W/4 foreclos!
  • Resumé AND loan officer
  • LTV AND NOT ARM
  • (Problem W/2 years) AND HARP

These are to be run against an index of ten loan officers’ e-mail (with attached spreadsheets, scanned loan applications, faxed appraisals and common productivity files) comprising approximately 540,000 messages and attachments.  Considering the index search problems discussed in class and in your reading called “The Streetlight Effect in E-Discovery,” identify at least three capabilities or limitations of the index and search tool that should be determined to gauge the likely effectiveness of the contemplated searches.  Be sure to explain why each matters. 

I am not asking you to assess or amend the agreed-upon queries.  I am asking what needs to be known about the index and search tool to ascertain if the queries will work as expected.

EXAMPLE QUESTION B: The article, A Bill of Rights for E-Discovery, included the following passage:

I am a requesting party in discovery.

I have duties.

I am obliged to: …

Work cooperatively with the producing party to identify reasonable and effective means to reduce the cost and burden of discovery, including, as appropriate, the use of tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search.

Describe how “tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search” serve to reduce the cost and burden of e-discovery.  Be sure to make clear what each term means.

It’s been an excellent semester and a pleasure for me to have had the chance to work with a bright bunch.  Thank you for your effort!  I’ve greatly enjoyed getting to know you notwithstanding the limits imposed by the pandemic and Mother Nature’s icy wrath.  I wish you the absolute best on the exam and in your splendid careers to come.  Count me as a future resource to call on if I can be of help to you.  Best of Luck!   Craig Ball

APPENDIX

Twenty-One Key Concepts for Electronically Stored Information

  1. Common law imposes a duty to preserve potentially relevant information in anticipation of litigation.
  2. Most information is electronically stored information (ESI).
  3. Understanding ESI entails knowledge of information storage media, encodings and formats.
  4. There are many types of e-storage media of differing capacities, form factors and formats:
    a) analog (phonograph record) or digital (hard drive, thumb drive, optical media).
    b) mechanical (electromagnetic hard drive, tape, etc.) or solid-state (thumb drive, SIM card, etc.).
  5. Computers don’t store “text,” “documents,” “pictures,” “sounds.” They only store bits (ones or zeroes).
  6. Digital information is encoded as numbers by applying various encoding schemes:
    a) ASCII or Unicode for alphanumeric characters.
    b) JPG for photos, DOCX for Word files, MP3 for sound files, etc.
  7. We express these numbers in a base or radix (base 2 binary, 10 decimal, 16 hexadecimal, 60 sexagesimal). E-mail messages encode attachments in base 64.
  8. The bigger the base, the smaller the space required to notate and convey the information.
  9. Digitally encoded information is stored (written):
    a) physically as bytes (8-bit blocks) in sectors and partitions.
    b) logically as clusters, files, folders and volumes.
  10. Files use binary header signatures to identify file formats (type and structure) of data.
  11. Operating systems use file systems to group information as files and manage filenames and metadata.
  12. Windows file systems employ filename extensions (e.g., .txt, .jpg, .exe) to flag formats.
  13. All ESI includes a component of metadata (data about data) even if no more than needed to locate it.
  14. A file’s metadata may be greater in volume or utility than the contents of the file it describes.
  15. File tables hold system metadata about the file (e.g., name, locations on disk, MAC dates): it’s CONTEXT.
  16. Files hold application metadata (e.g., EXIF geolocation data in photos, comments in docs): it’s CONTENT.
  17. File systems allocate clusters for file storage; deleting a file releases its cluster allocations for reuse.
  18. If unallocated clusters aren’t reused, deleted files may be recovered (“carved”) via computer forensics.
  19. Forensic (“bitstream”) imaging is a method to preserve both allocated and unallocated clusters.
  20. Data are numbers, so data can be digitally “fingerprinted” using one-way hash algorithms (MD5, SHA1).
  21. Hashing facilitates identification, deduplication and de-NISTing of ESI in e-discovery.
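
Concepts 20 and 21 lend themselves to a quick illustration. What follows is only a sketch using Python's standard hashlib module. The file names, the file contents and the known_hashes set (a stand-in for a reference list of known system files, such as the NIST list used in de-NISTing) are invented for the example.

```python
import hashlib


def fingerprint(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of the bytes using a hashlib algorithm name."""
    return hashlib.new(algorithm, data).hexdigest()


def deduplicate(files, known_hashes=frozenset()):
    """Group file names by SHA-1 digest of their content.

    Files whose digest appears in known_hashes are dropped, which mimics
    de-NISTing. Names sharing a digest are byte-for-byte duplicates.
    """
    groups = {}
    for name, content in files.items():
        digest = fingerprint(content, "sha1")
        if digest in known_hashes:
            continue  # known system file, not user-generated content
        groups.setdefault(digest, []).append(name)
    return groups


# Hypothetical collection: two identical memos under different names,
# one near-duplicate differing by a single capital letter, one system file.
collection = {
    "memo.docx": b"quarterly numbers",
    "memo copy.docx": b"quarterly numbers",
    "notes.txt": b"Quarterly numbers",
    "notepad.exe": b"stock system file",
}
known_hashes = frozenset({fingerprint(b"stock system file", "sha1")})

for digest, names in deduplicate(collection, known_hashes).items():
    print(digest, names)
```

Note that the hash is computed over the file's content only. The name lives in the file table (Concept 15), so the two memos hash identically despite their different names, while a one-character change to the content yields an entirely different fingerprint.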

The Great Pandemic Leap

22 Thursday Apr 2021

Posted by craigball in General Technology Posts, E-Discovery, Computer Forensics

≈ 4 Comments

Much has been made of the “Great Pandemic Leap” by law firms and courts. Pandemic proved to be, if not the mother of invention, at least the mother****** who FINALLY got techno tardy lawyers to shuffle forward. The alleged leap had nothing to do with new technology. Zoom and other collaboration tools have been around a long time. In fact, April 21, 2021 was Zoom’s 10th Birthday! Happy Birthday, Zoom! Thanks for being there for us.

No, it wasn’t new technology. The ‘Ten Years in Ten Weeks’ great leap was enabled by compulsion, adoption and support.

“Compulsion” because we couldn’t meet face-to-face, and seeing faces (and slides and white boards) is important.
“Adoption” because so many embraced Zoom and its ilk that we suddenly enjoyed a common meeting place.
“Support” because getting firms and families up and running on Zoom et al. became a transcendent priority.

It didn’t hurt that schools moving to Zoom served to put a support scion in many lawyers’ homes and, let’s face it Atticus, the learning curve wasn’t all that steep. Everyone already had a device with camera and microphone. Zoom made it one-click easy to join a meeting, even if eye-level camera positioning and unmuting of microphones has proven more confounding to lawyers than the Rule Against Perpetuities.

For me, the Great Leap manifested as the near-universal ability to convene on a platform where screen sharing and remote control were simple. I’ve long depended on remote control and screen sharing tools to access machines by Remote Desktop Protocol (RDP) or TeamViewer (not to mention PCAnywhere and legacy applications that made WFH possible in the 90s and aughts). But, that was on my own machines. Linking to somebody else’s machine without a tech-savvy soul on the opposite end was a nightmare. If you’ve ever tried to remotely support a parent, you understand. “No, Mom, please don’t click anything until I tell you. Oh, you already did? What did the error message say? Next time, don’t hit ‘Okay’ until you read the message, please, Mom.”

E-discovery and digital forensics require defensible data identification, preservation and collection. The pandemic made deskside reviews and onsite collection virtually impossible, or more accurately, those tasks became possible only virtually. Suddenly, miraculously, everyone knew how to join a Zoom call, so custodians could share screens and hand over remote control of keyboard and mouse. I could record the sessions to document the work and remotely load software (like iMazing or CoolMuster) to preserve and access mobile devices. Remote control and screen sharing let me target collection efforts based on my judgment and not be left at the mercy of a custodian’s self-interested actions. Custodians could observe, assist and intervene in my work or they could opt to walk away and leave me to do my thing. I was “there,” but less intrusively and spared the expense and hassle of travel. I could meet FRCP 26(g) obligations and make a record to return to if an unforeseen issue arose.

In my role as investigator, there are advantages attendant to being onsite; e.g., I sometimes spot evidence of undisclosed data sources. But, weighed against the convenience and economy of remote identification and collection, I can confidently say I’m never going back to the old normal when I can do the work as well via Zoom.

Working remotely as I’ve described requires a passing familiarity with Zoom screen sharing, if only to be able to talk others through unseen menus. As Zoom host, you will need to extend screen sharing privileges to the remote user. Do this on the fly by making the remote user a meeting co-host (click “More” alongside their name in the Participants screen). Alternatively, you can select Advanced Sharing Options from the Share Screen menu. Under “Who can Share?” choose “All Participants.”

To acquire control of the remote user’s mouse and keyboard, have the remote user initiate a screen share then open the View Options dropdown menu alongside the green bar indicating you’re viewing a shared screen. Select “Request Remote Control,” then click “Request” to confirm. The remote user will see a message box seeking authorization to control their screen. Once authorized, click inside the shared screen window to take control of the remote machine.

If you need to inspect a remote user’s iPhone or iPad, Zoom supports sharing those devices using a free plugin that links the mobile device over the same WiFi connection as the Zoom session. To initiate an iPhone/iPad screen share, instruct the remote user to click Screen Share and then select the iPhone/iPad icon at right for further instructions. Simpler still, have the remote user install Zoom on the phone or pad under scrutiny and join the Zoom session from the mobile device. Once in the meeting, the remote user screen shares from the session on the mobile device. Easy-peasy AND it works for Android phones, too!

So Counselor, go ahead and take that victory lap. Whether you made a great leap or were dragged kicking and screaming to a soupçon of technical proficiency, it’s great to see you! Hang onto those gains, and seek new ways to leverage technology in your practice. Your life may no longer depend on it, but your future certainly does.

Can a Producing Party Refuse to Produce Linked Attachments to E-Mail?

25 Thursday Mar 2021

Posted by craigball in Computer Forensics, E-Discovery

≈ 5 Comments

A fellow professor of e-discovery started my morning with a question. He wrote, “In companies using Google business, internal email ‘attachments’ are often linked with a URL to documents on a Google drive rather than actually ‘attached’ to the email…. Can the producing party legally refuse to produce the document as an attachment to the email showing the family? Other links in the email to, for example, a website need not be produced.”

I replied that I didn’t have the definitive answer, but I had a considered opinion. First, I challenged the assertion, “Other links in the email to, for example, a website need not be produced.”

Typically, the link must be produced because it’s part of the relevant and responsive, non-privileged message.  But, the link is just a pointer, a path to an item, and the discoverability of the link’s target hinges upon whether the (non-privileged) target is responsive AND within the care, custody or subject to the control of the producing party.

For the hypothetical case, I assume that the transmittal is deemed relevant and the linked targets are either relevant by virtue of their being linked to the transmittal or independently relevant and responsive.  I also assume that the linked target remains in the care, custody or subject to the control of the producing party because it has a legal and practical right of access to the repository where the linked target resides; that is, the producing party CAN access the linked item, even if they would rather not retrieve the relevant, responsive and non-privileged content to which the custodian has linked the transmittal.

If the link is not broken and the custodian of the message could click the link and access the linked target, where is the undue burden and cost?  Certainly, I well know that collection is often delegated to persons other than the custodian, but shouldn’t we measure undue burden and cost from the standpoint of the custodian under the legal duty to preserve and produce, NOT from the perspective of a proxy engaged to collect the linked target but lacking the custodian’s ability to do so? Viewed in this light, I don’t see where the law excuses the producing party from collecting and producing the linked target.

The difficulty in collection cited results from the producing party contracting to delegate storage to a third-party Cloud Provider, linking to information relegated to the Cloud Provider’s custody.  In certain respects, it’s like the defendant in Columbia Pictures v. Bunnell, who put a contractor (Panther) in control of the IP addresses of the persons trading pirated movies via the defendant’s platform.  Just because you enlist someone to keep your data on your behalf doesn’t defeat your ultimate right of control or your duty of production. 

Having addressed duty, let’s turn to feasibility, which is really what the fight’s about. 

Two key issues I see are: 

1. What if the link is broken by the passage of time?  If the target cannot be collected after reasonable efforts to do so, then it may be infeasible to effect the collection via the link or via pairing the link address to the addresses of the contents of the repository (as by using the Download-Link-Generator tool I highlighted here).  If there is simply no way a link created before a legal hold duty attached can be tied to its target, then you can’t do it, and the Court can’t order the impossible.  But, you can’t just label something “impossible” because you’d rather not do it.  You must make reasonable efforts and you must prove infeasibility. Courts should look askance at claims of infeasibility asserted by producing parties who have created the very situations that make it harder to obtain discovery.

2. What if the content of the target has changed since the time it was linked?  This is where the debate gets stickier, and where I have little empathy for a producing party who expects to be excused from production on the basis that it altered the evidence.  If the evidence has changed to the point where its relevance is in question because it may have been materially changed after linking, then the burden to prove the material change (and diminished relevance) falls on the producing party, not the requesting party.  Else, you take your evidence as you find it, and you produce it as it exists at the time of preservation and collection.  The possibility that it changed goes to its admissibility and weight, not to its discoverability.

I hope you agree my analysis is sound. To paraphrase Abraham Lincoln, you cannot murder your parents and then seek leniency because you’re an orphan. The problem is solvable, but it will be resolved only when Courts supply the necessary incentive by ordering collection and production. Integrating a hash value of the target within the link might go a long way to curing this Humpty-Dumpty dilemma; then, the target can be readily identified AND proven to be in the same state as when the link was created.
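
To make the hash-in-the-link idea concrete, here is a minimal sketch in Python. It is an assumption offered for illustration, not an existing feature of Google Drive or any other platform: the sha256 query parameter, the example URL and the function names are all hypothetical. The digest of the target is recorded in the link when the link is created, then compared against the digest of whatever is collected later.

```python
import hashlib
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def link_with_hash(url: str, target: bytes) -> str:
    """Return the URL with a sha256=<digest> query parameter appended.

    The parameter name is hypothetical; the digest is computed over the
    target's content as it exists when the link is created.
    """
    digest = hashlib.sha256(target).hexdigest()
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("sha256", digest))
    return urlunsplit(parts._replace(query=urlencode(query)))


def target_unchanged(link: str, collected: bytes) -> bool:
    """Compare the digest recorded in the link with the collected content."""
    recorded = parse_qs(urlsplit(link).query).get("sha256")
    if not recorded:
        raise ValueError("link carries no recorded hash")
    return recorded[-1] == hashlib.sha256(collected).hexdigest()


# Hypothetical link and document versions.
original = b"Draft agreement, version as linked"
edited = b"Draft agreement, revised after linking"
link = link_with_hash("https://drive.example.com/file/d/abc123/view?usp=sharing", original)

print(link)
print(target_unchanged(link, original))  # content matches the recorded digest
print(target_unchanged(link, edited))    # content was altered after linking
```

A matching digest both identifies the target among the contents of a repository and shows it is in the same state as when the link was created. A mismatch shows only that the content changed, not what changed, so the earlier version would still have to be recovered by other means.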

While we are at it, embedded links should be addressed from the standpoint of security and ethics. If a producing party supplies a message or document with a live link and opposing counsel clicks on the link, exposing information not meant to be produced, whose head should roll? If a party produces a live link in an email, is it reasonable to assume that the target was delivered, too? To my mind, the link is fair game, just as the attachment would be had it been embedded in the message. Electronic delivery is delivery. We have rules governing inadvertent production of privileged content, but not for the scenario described.
