Ten Tips for Better ESI Expert Reports

A lawyer I admire asked me to talk to her colleague about expert reports.  I haven’t had that conversation yet, but the request got me thinking about the elements of a competent expert report, especially reports in my areas of computer forensics and digital evidence.  I dashed off ten things I thought contribute to the quality of the best expert reports.  If these were rules, I’d have to concede I’ve learned their value by breaking a few of them.  I’ve left out basic writing tips like “use conversational language and simple declarative sentences.” There are lists of rules for good writing elsewhere and you should seek them out.  Instead, here’s my impromptu list of ten tips for crafting better expert reports on technical issues in electronic discovery and computer forensics:

  1. Answer the questions you were engaged to resolve.
  2. Don’t overreach your expertise.
  3. Define jargon, and share supporting data in useful, accessible ways.
  4. Distinguish factual findings from opinions.
  5. Include language addressing the applicable evidentiary standard.
  6. Eschew advocacy; let your expertise advocate for you.
  7. Challenge yourself and be fair.
  8. Proofread.  Edit.  Proofread again. Sleep on it. Edit again.
  9. Avoid assuming the fact finder’s role in terms of ultimate issues.
  10. Listen to your inner voice.

Most of these are self-explanatory but please permit me a few clarifying comments.

Answer the questions you were engaged to resolve.

My pet peeve with expert reports is that they don’t always address the questions important to the court and counsel.  I’ve seen reports spew hundreds of pages of tables and screenshots without conveying what any of it means to the issues in the case.  Sometimes you can’t answer the questions.  Fine.  Say so.  Other times you must break down or reframe the questions to conform to the evidence.  That’s okay, too, IF it’s not an abdication of the task you were brought in to accomplish.  But, the best, most useful and intelligible expert reports pose and answer specific questions.

Don’t overreach your expertise.

The standard to qualify as an expert witness is undemanding: do you possess specialized knowledge that would assist the trier of fact in understanding the evidence or resolving issues of fact? See, e.g., Federal Rule of Evidence 702.  With the bar so low, it can be tempting to overreach your expertise, particularly when pushed by a client to opine on something you aren’t fully qualified to address.  For example, I’m a certified computer forensic examiner and I studied accounting in college, but I’m not a forensic accountant.  I know a lot about digital forgery, but I’m not a trained questioned document examiner.  These are specialties.  I try to stay in my own lane and commend it to other experts.

Define jargon, and share supporting data in useful, accessible ways.

Can someone with an eighth-grade education and no technical expertise beyond that of the average computer user understand your report?  If not, you’re writing for the wrong audience.  We should write to express, not impress.  I love two-dollar words and the bon mot phrase, but they don’t serve me well when writing reports.  Never assume that a technical term will be universally understood.  If your grandparents wouldn’t know what it means, define it.

Computer forensic tools are prone to generate lengthy “reports” rife with incomprehensible data.  It’s tempting to tack them on as appendices to add heft and underscore how smart one must be to understand it all.  But it’s the expert’s responsibility to act as a guide to the data and ensure its import is clear.  I rarely testify—even by affidavit–without developing annotated demonstrative examples of the supporting data.  Don’t wait for the deposition or hearing to use demonstrative evidence; make points clear in the report.

Too, I’m fond of executive summaries; that is, an up-front, cut-to-the-chase paragraph relating the upshot of the report.

Distinguish factual findings from opinions.

The key distinction between expert and fact witnesses is that expert witnesses are permitted to express opinions that go beyond their personal observation.  A lay witness to a crash may testify to speeds based only upon what they saw with their own eyes.  An accident reconstructionist can express an opinion of how fast the cars were going based upon evidence that customarily informs expert opinions like skid marks and vehicle deformation.  Each type of testimony must satisfy different standards of proof in court; so, to make a clear and defensible record, it’s good practice to distinguish factual findings (“things you saw”) from opinions (“things you’ve concluded based upon what you saw AND your specialized knowledge, training and experience”).  This  naturally begets the next tip:

Include language addressing the applicable evidentiary standard.

Modern jurisprudence deploys safeguards like the Daubert standard to combat so-called “junk science.”  Technical expert opinions must be based upon a sound scientific methodology, viz., sufficient facts or data and the product of reliable principles and methods.  While a court acting as gatekeeper can infer the necessary underpinnings from an expert’s report and C.V., expressly stating that opinions are based upon proper and accepted standards makes for a better record.

Eschew advocacy; let your expertise advocate for you.

Mea culpa here.  Because I was a trial lawyer for three+ decades, I labor to restrain myself in my reporting to ensure that I’m not intruding into the lawyer’s realm of advocacy.  I don’t always succeed.  Even if you’re working for a side, be as scrupulously neutral as possible in your reporting.  Strive to act and sound like you don’t care who prevails even if you’re rooting for the home team.  If you do your job well, the facts will advocate the right outcome.

Challenge yourself and be fair.

My worst nightmare as an expert witness is that I will mistakenly opine that someone committed a bad act when they didn’t.  So, I’m always trying to punch holes in my own theories and asking myself, “how would I approach this if I were working for the other side?”  Nowhere is this more important than when working as a court-appointed neutral expert.  Even if you’d enjoying seeing a terrible person fry, be fair.  You stand in the shoes of the Court.

Proofread.  Edit.  Proofread again. Sleep on it. Edit again.

Who has that kind of time, right?  Still, try to find the time.  Few things undermine the credibility of an expert report like a bunch of spelling and grammatical errors.  Stress and fatigue make for poor first drafts.  It often takes a good night’s sleep (or at least a few hours away from the work) to catch the inartful phrase, typo or other careless error.

Avoid assuming the fact finder’s role in terms of ultimate issues.

Serving as a court Special Master a few years back, I opined that the evidence of a certain act was so overwhelming that the Court should only reach one result.  Accordingly, I ceased investigating the loss of certain data that I regarded as out-of-scope.  I was right…but I was also wrong.  The Court has a job to do and, by my eliding over an issue the Court was obliged to address, the Court had to rule without benefit of what a further inquiry into the missing evidence would have revealed. The outcome was the same, but by assuming the factfinder’s role on an ultimate issue, I made the Court’s job harder.  Don’t do that.

Listen to your inner voice.

In expressing expert opinions, too much certainty—a/k/a arrogance–is as perilous as too much doubt.  Perfect is not the standard, but you should be reasonably confident of  your opinion based on a careful and competent review of the evidence.  If something “feels” off, it may be your inner voice telling you to look again. 

Final Exam Review: How Would You Fare?

It’s nearly finals time for the students in my E-Discovery and Digital Evidence course at the University of Texas School of Law. I just completed the Final Exam Study Guide for the class and thought readers who wonder what a tech-centric law school e-discovery curriculum looks like might enjoy seeing what’s asked of the students in a demanding 3-credit law school course. Whether you’re ACEDS certified, head of your e-discovery practice group or just an e-discovery groupie like me, consider how you’d fare preparing for an exam with this scope and depth. I’m proud of my bright students. You’d be really lucky to hire one of my stars.

E-Discovery – Spring 2021 Final Exam Study Guide

The final exam will cover all readings, lectures, exercises and discussions on the syllabus.
(Syllabus ver. 21.0224 in conjunction with Workbook ver. 21.0214 and Announcements).

  1. We spent a month on meeting the preservation duty and proportionality.  You undertook a two-part legal hold drafting exercise.  Be prepared to bring skills acquired from that effort to bear on a hypothetical scenario.  Be prepared to demonstrate your understanding of the requisites of fashioning a defensible legal hold and sensibly targeting a preservation demand to an opponent.  As well, your data mapping skills should prove helpful in addressing the varied sources of potentially relevant ESI that exist, starting at the enterprise level with The Big Six (e-mail, network shares, mobile devices, local storage, social networking and databases).  Of course, we must also consider Cloud repositories and scanned paper documents as potential sources.
  2. An essential capability of an e-discovery lawyer is to assess a case for potentially relevant ESI, fashion and implement a plan to identify accessible and inaccessible sources, determine their fragility and persistence, scope and deploy a litigation hold and take other appropriate first steps to counsel clients and be prepared to propound and respond to e-discovery, especially those steps needed to make effective use of the FRCP Rule 26(f) meet-and-confer process.  Often, you must act without having all the facts you’d like and rely upon your general understanding of ESI and information systems to put forward a plan to acquire the facts and do so with sensitivity to the cost and disruption your actions may engender.  Everything we’ve studied was geared to instilling those capabilities in you.
  3. CASES: You are responsible for all cases covered during the semester.  When you read each case, you should ask yourself, “What proposition might I cite this case to support in the context of e-discovery?”  That’s likely to be the way I will have you distinguish the cases and use them in the exam.  I refer to cases by their style (plaintiff versus defendant), so you should be prepared to employ a mnemonic to remember their most salient principles of each, e.g., Columbia Pictures is the ephemeral data/RAM case; Rambus is the Shred Day case; In re NTL is the right of control case; In re: Weekley Homes is the Texas case about accessing the other side’s hard drives, Wms v. Sprint is the spreadsheet metadata case (you get the idea).  I won’t test your memory of jurists, but it’s helpful-not-crucial to recall the authors of the decisions (especially when they spoke to our class like Judges Peck and Grimm). 

Case Review Hints:

  • Green v. Blitz: (Judge Ward, Texas) This case speaks to the need for competence in those responsible for preservation and collection and what constitutes a defensible eDiscovery strategy. What went wrong here? What should have been done differently?
  • In re: Weekly Homes: (Texas Supreme Court) This is one of the three most important Texas cases on ESI. You should understand the elements of proof which the Court imposes for access to an opponent’s storage devices and know terms of TRCP Rule 196.4, especially the key areas where the state and Federal ESI rules diverge.
  • Zubulake: (Judge Scheindlin, New York) The Zubulake series of decisions are seminal to the study of e-discovery in the U.S.  Zubulake remains the most cited of all EDD cases, so is still a potent weapon even after the Rules amendments codified much of its lessons. Know what the case is about, how the plaintiff persuaded the court that documents were missing and what the defendant did or didn’t do in failing to meet its discovery obligations. Know what an adverse inference instruction is and how it was applied in Zubulake versus what must be established under FRCP Rule 37€ after 2015. Know what Judge Scheindlin found to be a litigant’s and counsel’s duties with respect to preservation. Seven-point analytical frameworks (as for cost-shifting) make good test fodder.
  • Williams v. Sprint: (Judge Waxse, Kansas). Williams is a seminal decision respecting metadata. In Williams v. Sprint, the matter concerned purging of metadata and the locking of cells in spreadsheets in the context of an age discrimination action after a reduction-in-force. Judge Waxse applied Sedona Principle 12 in its earliest (and now twice revised) form. What should Sprint have done?  Did the Court sanction any party? Why or why not?
  • Rodman v. Safeway: (Judge Tigar, ND California) This case, like Zubulake IV, looks at the duties and responsibilities of counsel when monitoring a client’s search for and production of potentially responsive ESI? What is Rule 26(g), and what does it require? What constitutes a reasonable search? To what extent and under what circumstances may counsel rely upon a client’s actions and representations in preserving or collecting responsive ESI?
  • Columbia Pictures v. Bunnell: (Judge Chooljian, California) What prompted the Court to require the preservation of such fleeting, ephemeral information? Why were the defendants deemed to have control of the ephemeral data? Unique to its facts?
  • In re NTL, Inc. Securities Litigation: (Judge Peck, New York) Be prepared to discuss what constitutes control for purposes of imposing a duty to preserve and produce ESI in discovery and how it played out in this case. I want you to appreciate that, while a party may not be obliged to succeed in compelling the preservation or production of relevant information beyond its care, custody or control, a party is obliged to exercise all such control as the party actually possesses, whether as a matter of right or by course of dealing. What’s does The Sedona Conference think about that?
  • William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co.: (Judge Peck, New York) What was the “wake up call,” who were expected to awaken and on what topics?
  • Adams v. Dell: (Judge Nuffer, Utah) What data was claimed to have been lost? What was supposed to have triggered the duty to preserve? What did the Court say about a responding party’s duty, particularly in designing its information systems? Outlier?
  • RAMBUS: (Judge Whyte, California) I expect you to know what happened and to appreciate that the mere reasonable anticipation of litigation–especially by the party who brings the action–triggers the common law duty to preserve. Be prepared to address the sorts of situations that might or might not trigger a duty to initiate a legal hold.
  • United States v. O’Keefe (Judge Facciola, DC): I like this case for its artful language (Where do angels fear to tread?) and consideration of the limits and challenges of keyword search.  The last being a topic that bears scrutiny wherever it has been addressed in the material.  That is, does keyword search work as well as lawyers think, and how can we improve upon it and compensate for its shortcomings? 
  • Victor Stanley v. Creative Pipe I & II (Judge Grimm, Maryland):  Read VS I with an eye toward understanding the circumstances when inadvertent production triggers waiver (pre-FRE 502).  What are the three standards applied to claims of waiver?  What needs to be in the record to secure relief?

    Don’t get caught up in the prolonged factual minutiae of VS II.  Read VS II to appreciate the varying standards that once existed across the Circuits for imposition of spoliation sanctions and that pre-date the latest FRCP Rules amendments, i.e., Rule 37(e).
  • Anderson Living Trust v. WPX Energy Production, LLC (Judge Browning, New Mexico): This case looks at the application and intricacies of FRCP Rule 34 when it comes to ESI versus documents.  My views about the case were set out in the article you read called “Breaking Badly.”
  • In re: State Farm Lloyds (Texas Supreme Court):  Proportionality is the buzzword here; but does the Court elevate proportionality to the point of being a costly hurdle serving to complicate a simple issue?  What does this case portend for Texas litigants in terms of new hoops to jump over issues as straightforward as forms of production?  What role did the Court’s confusion about forms (and a scanty record) play in the outcome?
  • Monique Da Silva Moore, et al. v. Publicis Groupe & MSL Group and Rio Tinto Plc v. Vale S.A., (Judge Peck, New York): DaSilva Moore is the first federal decision to approve the use of the form of Technology Assisted Review (TAR) called Predictive Coding as an alternative to linear, manual review of potentially responsive ESI.  Rio Tinto is Judge Peck’s follow up, re-affirming the viability of the technology without establishing an “approved” methodology.
  • Brookshire Bros. v. Aldridge (Texas Supreme Court): This case sets out the Texas law respecting spoliation of ESI…or does it?  Is the outcome and “analysis” here consistent with the other preservation and sanctions cases we’ve covered?
  • VanZant v. Pyle (Judge Sweet, New York): Issues of control and spoliation drive this decision.  Does the Court correctly apply Rule 37(e)?
  • CAT3 v. Black Lineage (Judge Francis, New York):This trademark infringement dispute concerned an apparently altered email.  Judge Francis found the alteration sufficient to support sanctions under Rule 37(e).  How did he get there?  Judge Francis also addressed the continuing viability of discretionary sanctions despite 37(e).  What did he say about that?
  • EPAC v. Thos. Nelson, Inc.: Read this report closely to appreciate how the amended Rules, case law and good practice serve to guide the court in fashioning remedial measures and punitive sanctions.  Consider the matter from the standpoint of the preservation obligation (triggers and measures) and from the standpoint of proportionate remedial measures and sanctions.  What did the Special Master do wrong here?
  • Mancia v. Mayflower (Judge Grimm, Maryland): Don’t overlook this little gem in terms of its emphasis on counsel’s duties under FRCP Rule 26(g).  What are those duties?  What do they signify for e-discovery? What is the role of cooperation in an adversarial system?
  • Race Tires America, Inc. v. Hoosier Racing Tire Corp. (Judge Vanaskie, Pennsylvania): This opinion cogently defines the language and limits of 28 U.S.C. §1920 as it relates to the assessment of e-discovery expenses as “taxable costs.”  What common e-discovery expenses might you seek to characterize as costs recoverable under §1920, and how would you make your case?
  • Zoch v. Daimler (Judge Mazzant, Texas):  Did the Court correctly resolve the cross-border and blocking statute issues?  Would the Court’s analysis withstand appellate scrutiny once post-GDPR? 

Remember: bits, bytes, sectors, clusters (allocated and unallocated), tracks, slack space, file systems and file tables, why deleted doesn’t mean gone, forensic imaging, forensic recovery techniques like file carving, EXIF data, geolocation, file headers/binary signatures, hashing, normalization, de-NISTing, deduplication and file shares.  For example: you should know that an old 3.5” floppy disk typically held no more than 1.44MB of data, whereas the capacity of a new hard drive or modern backup tape would be measured in terabytes. You should also know the relative capacities indicated by kilobytes, megabytes, gigabytes, terabytes and petabytes of data (i.e., their order of ascendancy, and the fact that each is 1,000 times more or less than the next or previous tier).  Naturally, I don’t expect you to know the tape chronology/capacities, ASCII/hex equivalencies or other ridiculous-to-remember stuff.

4. TERMINOLOGY: Lawyers, more than most, should appreciate the power of precise language.  When dealing with professionals in technical disciplines, it’s important to call things by their right name and recognize that terms of art in one context don’t necessarily mean the same thing in another.  When terms have been defined in the readings or lectures, I expect you to know what those terms mean.  For example, you should know what ESI, EDRM, RAID, system and application metadata (definitely get your arms firmly around application vs. system metadata), retention, purge and rotation mean (e.g., grandfather-father-son rotation); as well as Exchange, O365, 26(f), 502(d), normalization, recursion, native, near-native, TIFF+, load file, horizontal, global and vertical deduplication, IP addressing, data biopsy, forensically sound, productivity files, binary signatures and file carving, double deletion, load files, delimiters, slack space, unallocated clusters, UTC offset, proportionality, taxable costs, sampling, testing, iteration, TAR, predictive coding, recall, precision, UTC, VTL, SQL, etc.

5. ELECTRONIC DISCOVERY REFERENCE MODEL:  We’ve returned to the EDRM many times as we’ve moved from left to right across the iconic schematic.  Know it’s stages, their order and what those stages and triangles signify.

6. ENCODING: You should have a firm grasp of the concept of encoded information, appreciating that all digital data is stored as numbers notated as an unbroken sequence of 1s and 0s. How is that miracle possible? You should be comfortable with the concepts described in pp. 132-148 of the Workbook (and our class discussions of the fact that the various bases are just ways to express numbers of identical values in different notations). You should be old friends with the nature and purpose of, e.g., base 2 (binary), base 10 (decimal) base 16 (hexadecimal), base 64 (attachment encoding), ASCII and UNICODE.

7 STORAGE: You should have a working knowledge of the principal types and capacities of common electromagnetic and solid-state storage devices and media (because data volume has a direct relationship to cost of processing and time to review in e-discovery). You should be able to recognize and differentiate between, e.g., floppy disks, thumb drives, optical media, hard drives, solid state storage devices, RAID arrays and backup tape, including a general awareness of how much data they hold. Much of this is in pp. 22-48 of the Workbook (Introduction to Data Storage Media).  For ready reference and review, I’ve added an appendix to this study guide called, “Twenty-One Key Concepts for Electronically Stored Information.”

8. E-MAIL: E-mail remains the epicenter of corporate e-discovery; so, understanding e-mail systems, forms and the underlying structure of a message is important.  The e-mail chapter should be reviewed carefully.  I wouldn’t expect you to know file paths to messages or e-mail forensics, but the anatomy of an e-mail is something we’ve covered in detail through readings and exercises.  Likewise, the messaging protocols (POP, MAPI, IMAP, WEB, MIME, etc.), mail single message and container formats (PST, OST, EDB, NSF, EML, MSG, DBX, MHTML, MBOX) and leading enterprise mail client-server pairings (Exchange/Outlook, Domino/Notes, O365/browser) are worth remembering.  Don’t worry, you won’t be expected to extract epoch times from boundaries again. 😉

9. FORMS: Forms of production loom large in our curriculum.  Being that everything boils down to just an unbroken string of ones-and-zeroes, the native forms and the forms in which we elect to request and produce them (native, near-native, images (TIFF+ and PDF), paper) play a crucial role in all the “itys” of e-discovery: affordability, utility, intelligibility, searchability and authenticability.  What are the purposes and common structures of load files?  What are the pros and cons of the various forms of production?  Does one size fit all?  How does the selection of forms play out procedurally in federal and Texas state practice?  How do we deal with Bates numbering and redaction?  Is native and near-native production better and, if so, how do we argue the merits of native production to someone wedded to TIFF images?  This is HUGE in my book!  There WILL be at least one essay question on this and likely several other test questions.

10. SEARCH AND REVIEW: We spent a fair amount of time talking about and doing exercises on search and review.  You should understand the various established and emerging approaches to search: e.g., keyword search, Boolean search, fuzzy search, stemming, clustering, predictive coding and Technology Assisted Review (TAR).  Why is an iterative approach to search useful, and what difference does it make?  What are the roles of testing, sampling and cooperation in fashioning search protocols?  How do we measure the efficacy of search?  Hint: You should know how to calculate recall and precision and know the ‘splendid steps’ to take to improve the effectiveness and efficiency of keyword search (i.e., better F1 scores). 

You should know what a review tool does and customary features of a review platform.  You should know the high points of the Blair and Maron study (you read and heard about it multiple times, so you need not read the study itself).  Please also take care to understand the limitations on search highlighted in your readings and those termed The Streetlight Effect.

11.ACCESSIBILITY AND GOOD CAUSE: Understand the two-tiered analysis required by FRCP Rule 26(b)(2)(B).  When does the burden of proof shift, and what shifts it?  What tools (a/k/a conditions) are available to the Court to protect competing interests of the parties

12. FRE RULE 502: It’s your friend!  Learn it, love it, live it (or at least know when and how to use it).  What protection does it afford against subject matter waiver?  Is there anything like it in state practice?  Does it apply to all legally cognized privileges?

13. 2006 AND 2015 RULES AMENDMENTS: You should understand what they changed with respect to e-discovery.  Concentrate on proportionality and scope of discovery under Rule 26, along with standards for sanctions under new Rule 37(e).  What are the Rule 26 proportionality factors?  What are the findings required to obtain remedial action versus serious sanctions for spoliation of ESI under 37(e)?  Remember “intent to deprive.”

14. MULTIPLE CHOICE: When I craft multiple choice questions, there will typically be two answers you can quickly discard, then two you can’t distinguish without knowing the material. So, if you don’t know an answer, you increase your odds of doing well by eliminating the clunkers and guessing. I don’t deduct for wrong answers.  Read carefully to not whether the question seeks the exception or the rule. READ ALL ANSWERS before selecting the best one(s) as I often include an “all of the above” or “none of the above” option.

15. All lectures and reviews of exercises are recorded and online for your review, if desired.

16. In past exams, I used the following essay questions.  These will not be essay questions on your final exam; however, I furnish them here as examples of the scope and nature of prior essay questions:

EXAMPLE QUESTION A: On behalf of a class of homeowners, you sue a large bank for alleged misconduct in connection with mortgage lending and foreclosures. You and the bank’s counsel agree upon a set of twenty Boolean and proximity queries including:

  • fnma AND deed-in-lieu
  • 1/1/2009 W/4 foreclos!
  • Resumé AND loan officer
  • (Problem W/2 years) AND HARP

These are to be run against an index of ten loan officers’ e-mail (with attached spreadsheets, scanned loan applications, faxed appraisals and common productivity files) comprising approximately 540,000 messages and attachments).  Considering the index search problems discussed in class and in your reading called “The Streetlight Effect in E-Discovery,” identify at least three capabilities or limitations of the index and search tool that should be determined to gauge the likely effectiveness of the contemplated searches.  Be sure to explain why each matter. 

I am not asking you to assess or amend the agreed-upon queries.  I am asking what needs to be known about the index and search tool to ascertain if the queries will work as expected.

EXAMPLE QUESTION B: The article, A Bill of Rights for E-Discovery included the following passage:

I am a requesting party in discovery.

I have duties.

I am obliged to: …

Work cooperatively with the producing party to identify reasonable and effective means to reduce the cost and burden of discovery, including, as appropriate, the use of tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search.

Describe how “tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search” serve to reduce the cost and burden of e-discovery.  Be sure to make clear what each term means.

It’s been an excellent semester and a pleasure for me to have had the chance to work with a bright bunch.  Thank you for your effort!  I’ve greatly enjoyed getting to know you notwithstanding the limits imposed by the pandemic and Mother Nature’s icy wrath.  I wish you the absolute best on the exam and in your splendid careers to come.  Count me as a future resource to call on if I can be of help to you.  Best of Luck!   Craig Ball


Twenty-One Key Concepts for Electronically Stored Information

  1. Common law imposes a duty to preserve potentially relevant information in anticipation of litigation.
  2. Most information is electronically stored information (ESI).
  3. Understanding ESI entails knowledge of information storage media, encodings and formats.
  4. There are many types of e-storage media of differing capacities, form factors and formats:
    a) analog (phonograph record) or digital (hard drive, thumb drive, optical media).
    b) mechanical (electromagnetic hard drive, tape, etc.) or solid-state (thumb drive, SIM card, etc.).
  5. Computers don’t store “text,” “documents,” “pictures,” “sounds.” They only store bits (ones or zeroes).
  6. Digital information is encoded as numbers by applying various encoding schemes:
    a) ASCII or Unicode for alphanumeric characters.
    b) JPG for photos, DOCX for Word files, MP3 for sound files, etc.
  7. We express these numbers in a base or radix (base 2 binary, 10 decimal, 16 hexadecimal, 60 sexagesimal). E-mail messages encode attachments in base 64.
  8. The bigger the base, the smaller the space required to notate and convey the information.
  9. Digitally encoded information is stored (written):
    a) physically as bytes (8-bit blocks) in sectors and partitions.
    b) logically as clusters, files, folders and volumes.
  10. Files use binary header signatures to identify file formats (type and structure) of data.
  11. Operating systems use file systems to group information as files and manage filenames and metadata.
  12. Windows file systems employ filename extensions (e.g., .txt, .jpg, .exe) to flag formats.
  13. All ESI includes a component of metadata (data about data) even if no more than needed to locate it.
  14. A file’s metadata may be greater in volume or utility than the contents of the file it describes.
  15. File tables hold system metadata about the file (e.g., name, locations on disk, MAC dates): it’s CONTEXT.
  16. Files hold application metadata (e.g., EXIF geolocation data in photos, comments in docs): it’s CONTENT.
  17. File systems allocate clusters for file storage, deleting files releases cluster allocations for reuse.
  18. If unallocated clusters aren’t reused, deleted files may be recovered (“carved”) via computer forensics.
  19. Forensic (“bitstream”) imaging is a method to preserve both allocated and unallocated clusters.
  20. Data are numbers, so data can be digitally “fingerprinted” using one-way hash algorithms (MD5, SHA1).
  21. Hashing facilitates identification, deduplication and de-NISTing of ESI in e-discovery.

The Great Pandemic Leap

Much has been made of the “Great Pandemic Leap” by law firms and courts. Pandemic proved to be, if not the mother of invention, at least the mother****** who FINALLY got techno tardy lawyers to shuffle forward. The alleged leap had nothing to do with new technology. Zoom and other collaboration tools have been around a long time. In fact, April 21, 2021 was Zoom’s 10th Birthday! Happy Birthday, Zoom! Thanks for being there for us.

No, it wasn’t new technology. The ‘Ten Years in Ten Weeks’ great leap was enabled by compulsion, adoption and support.

“Compulsion” because we couldn’t meet face-to-face, and seeing faces (and slides and white boards) is important.
“Adoption” because so many embraced Zoom and its ilk that we suddenly enjoyed a common meeting place.
“Support” because getting firms and families up and running on Zoom et al. became a transcendent priority.

It didn’t hurt that schools moving to Zoom served to put a support scion in many lawyers’ homes and, let’s face it Atticus, the learning curve wasn’t all that steep. Everyone already had a device with camera and microphone. Zoom made it one-click easy to join a meeting, even if eye-level camera positioning and unmuting of microphones has proven more confounding to lawyers than the Rule Against Perpetuities.

For me, the Great Leap manifested as the near-universal ability to convene on a platform where screen sharing and remote control were simple. I’ve long depended on remote control and screen sharing tools to access machines by Remote Desktop Protocol (RDP) or TeamViewer (not to mention PCAnywhere and legacy applications that made WFH possible in the 90s and aughts). But, that was on my own machines. Linking to somebody else’s machine without a tech-savvy soul on the opposite end was a nightmare. If you’ve ever tried to remotely support a parent, you understand. “No, Mom, please don’t click anything until I tell you. Oh, you already did? What did the error message say? Next time, don’t hit ‘Okay” until you read the message, please Mom.

E-discovery and digital forensics require defensible data identification, preservation and collection. The pandemic made deskside reviews and onsite collection virtually impossible, or more accurately, those tasks became possible only virtually. Suddenly, miraculously, everyone knew how to join a Zoom call, so custodians could share screens and hand over remote control of keyboard and mouse. I could record the sessions to document the work and remotely load software (like iMazing or CoolMuster) to preserve and access mobile devices. Remote control and screen sharing let me target collection efforts based on my judgment and not be left at the mercy of a custodian’s self-interested actions. Custodians could observe, assist and intervene in my work or they could opt to walk away and leave me to do my thing. I was “there,” but less intrusively and spared the expense and hassle of travel. I could meet FRCP 26(g) obligations and make a record to return to if an unforeseen issue arose.

In my role as investigator, there’s are advantages attendant to being onsite; e.g., I sometimes spot evidence of undisclosed data sources. But, weighed against the convenience and economy of remote identification and collection, I can confidently say I’m never going back to the old normal when I can do the work as well via Zoom.

Working remotely as I’ve described requires a passing familiarity with Zoom screen sharing, if only to be able to talk others through unseen menus. As Zoom host, you will need to extend screen sharing privileges to the remote user. Do this on-the-fly by making the remote user a meeting co-host, (click “More” alongside their name in the Participants screen). Alternatively, you can select Advanced Sharing Options from the Share Screen menu. Under “Who can Share?” choose “All Participants.”

To acquire control of the remote user’s mouse and keyboard, have the remote user initiate a screen share then open the View Options dropdown menu alongside the green bar indicating you’re viewing a shared screen. Select “Request Remote Control,” then click “Request” to confirm. The remote user will see a message box seeking authorization to control their screen. Once authorized, click inside the shared screen window to take control of the remote machine.

If you need to inspect a remote user’s iPhone or iPad, Zoom supports sharing those devices using a free plugin that links the mobile device over the same WiFi connection as the Zoom session. To initiate an iPhone/iPad screen share, instruct the remote user to click Screen Share and then select the iPhone/iPad icon at right for further instructions. Simpler still, have the remote user install Zoom on the phone or pad under scrutiny and join the Zoom session from the mobile device. Once in the meeting, the remote user screen shares from the session on the mobile device. Easy-peasy AND it works for Android phones, too!

So Counselor, go ahead and take that victory lap. Whether you made a great leap or were dragged kicking and screaming to a soupçon of technical proficiency, it’s great to see you! Hang onto those gains, and seek new ways to leverage technology in your practice. Your life may no longer depend on it, but your future certainly does.

Life Lessons from E-Discovery

Eight years ago, my old friends and Über-thought leaders Bill Hamilton and George Socha created an e-discovery conference targeting an underserved constituency: lawyers without the luxury of an e-discovery practice group or litigation support staff. Regular folks. The always enlightening and enjoyable University of Florida E-Discovery Conference has been a fixture on my speaking calendar for years. This year, the pandemic foreclosed the customary face-to-face confab in central Florida, so we convened virtually– just Bill, George, me and 3,000 of our closest friends. Seriously, the turnout was astounding: 3,058 unique attendees! BRAVO!

My contribution was modest–fifteen minutes chatting about Life Lessons from E-Discovery. Here’s what I shared:

Thirty years ago, Robert Fulgham published a bestseller called, “All I Really Need To Know I Learned In Kindergarten.”  It posited that the simple lessons we gained as children can guide us all our lives.

The lessons were things like:

  • Share everything.
  • Play fair.
  • Don’t hit people.
  • Put things back where you found them.
  • Clean up your own mess.
  • Don’t take things that aren’t yours.
  • Say you’re sorry when you hurt somebody.
  • Flush and wash your hands.

That last one proved especially useful of late!

Fulgham’s point was that childish precepts extrapolate well to our adult lives, to relationships, business, government, really to everything.

I’ve been a student and teacher of electronic evidence for forty years, so when Professor Hamilton asked me to say a few words today, I wondered what I’d gleaned from electronic discovery that might yield life lessons like those kindergarten rules.  Many things came to mind.  Things like:

They all say basically the same thing: treat others with respect and courtesy.  I commend them all to you, but the shameful truth is I’ve violated enough of those precepts that I feel unworthy to preach their indisputable value.

Instead, I sought five precepts uniquely suited to e-discovery, five lessons I’ve acquired and come to believe in through hard experience.

I should confess that my point of view is a jaundiced and cynical one.  As a special master, Courts bring me in when discovery’s gone off the rails, often when sanctions are in the offing.  In my world, incompetence and deceit are the norm.  So, if my lessons strike you as too obvious or too simple, I’m thrilled to hear it.

The first rule, and really the most fundamental is:

Tell the truth based on fact. 

Albert Einstein said, “Imagination is more important than information.”  Sorry, Al, not in e-discovery.

When it comes to e-discovery, information is more important than imagination.  In e-discovery, information is everythingMeasurement trumps opinion.  Your gut sense that the other side is withholding evidence is fascinating, but it’s not proof.  Your certainty that the client has no responsive data is just baloney without a competent search. 

If we are to be credible professionals, We must concede what we don’t know, share what we do know and recognize that cooperation isn’t a hallmark of weakness but a harbinger of strength.  Bluffing is fine at the poker table, but it will kill you in Court.  Your word—your credibility—your reputation for honesty is worth more than all your education and skill.

And a variant on number one is:

Tell the truth, no matter the consequences.

I’ve written hundreds of articles about e-discovery and forensics.  Colleagues ask, “Aren’t you afraid something you wrote will be used against you in cross-examination?”  I tell them I’ve never worried about that because I’ve told the truth as I knew it in everything I wrote.  Sometimes I was mistaken, but I was never false.  So, I don’t have to remember what I said.  I just hold on to what I know to be true.

If someone wants to cite me to impeach me, bring it on!  I’ll take them from punched cards to magnetic media to solid state storage, from big iron mainframes to client-server to the Cloud.  I’ll share my conviction that learning never ends, and, yes, mistakes happen along the way.  The measure that matters is how we own our errors.  If we stick to the truth, we can gain more from failure than success.

My second precept is just one word.  A century ago, IBM’s founder Thomas J. Watson put a word on an easel at a business meeting.  It read “THINK.”  That’s still IBM’s slogan, and it’s what I want to shout at lawyers who serve ridiculous requests for production or file boilerplate objections.

THINK!  I want to stamp it on the foreheads of lawyers who just don’t think about where evidence is likely to be found or sensible ways to find it.  I know lawyers to be first-class thinkers; so, it’s maddening when good lawyers take off their thinking caps in e-discovery.  Any lawyer can learn enough tech to master the “E” in e-discovery.  Anyone.  All of us on this conference faculty are convinced of it.  It’s what gets us out of bed each day.

But to do it, lawyers must cast aside doubt and turn off the parts of their brains that tell them they’re too old, too busy or just too much a lawyer to learn something new.  Conferences like this one help—thank you for being here–but it takes more than a few hours on Zoom or a big litigation budget to become competent to serve your clients in the realm of electronic evidence.  It requires a willingness to fearlessly embrace an unfamiliar discipline–to learn a second thing. 

It takes a commitment to study, question, pursue and explore information about information and a commitment to THINK, THINK, THINK about how people communicate, what tools and software they use, their language, what metadata matters and where data lives. So, please don’t think you can’t learn it, or worse, that you need not learn it.  You can and you must.  Nothing less than the future of the civil justice system depends on it.  Of course, you’re here pursuing greater expertise, so I’m preaching to the choir.

My third lesson is: Have a plan.

In my thirties, I read Robert Caro’s epic biography of Robert Moses.  Moses was an urban planner who reshaped New York.  Robert Moses’ massive projects got built.  The secret to his success was that, where others came to planning meetings with ideas, Robert Moses arrived with blueprints and budgets.  He was a man with a plan.

In e-discovery, lawyers are brilliant at articulating objections, at saying what clients won’t do; but what’s often missing is a well-reasoned plan for what clients will do.  When you come with a plan, it’s clear you thought about what must be done.  A practical plan demonstrates a commitment to progress.  A reasonable plan forces the other side to work within your framework.  Judges love it when lawyers have a plan.  The Rules of discovery are written to better serve litigants with a plan. 

The e-discovery plan is a protocol. E-discovery demands a good protocol and success in e-discovery requires that lawyers know which features of a protocol are crucial and which are negotiable.  So, always show up with a plan.

Number four is: Never attribute to guile that which can be explained by incompetence.

I borrowed and adapted this one from my late friend, Browning Marean, who had a huge store of wise sayings.  In fairness, Browning borrowed it from Robert Heinlein, whose “Heinlein’s Razor” reads, “never attribute to malice that which is adequately explained by stupidity.”  Because we know lawyers aren’t stupid, I prefer to term it a shortfall in competence.

When a party messes up in e-discovery, the victims of failure often cry “foul” and suspect an intent to deprive them of the evidence.  In my experience, intent,–I’m calling it “guile,”– when it’s genuine, tends to manifest as efforts to conceal the screw up—it’s the cover up that kills you, not the failure itself.  Most screw ups are just… screw ups.  Always avoidable, sometimes reprehensible, but more often the result of apathy than antipathy. Maybe that’s why the last set of Rules amendments shielded parties from serious sanctions for mere incompetence.  In my mind, the decision to tie judges’ hands when disciplining incompetence and spoliation was a poor one.  A mistakenly political one.  Fear of sanctions was the prime driver of the e-discovery revolution.  It was the reason lawyers and companies came around and started preserving and producing ESI.  Sanctions were the driver of competence.  Sadly, we never had much of a carrot, and now they’ve taken away the stick.

And my final lesson is one of human nature:

Remember that Courts guard their authority more scrupulously than your client’s rights.

What I mean by this, with no disrespect to the judges on this faculty or listening, is a recognition born of experience, that a party is considerably more likely to be disciplined for violating a court order than for failing to fulfill obligations to an opponent. The takeaway is that an effort to secure sanctions is a marathon, not a sprint. 

You must take the time and make the effort to mature motions to compel or for protection into explicit orders of the Court.  A court cannot function if its orders are ignored with impunity.  So, if sanctions are your objective, position the failure to produce to be more than simply a transgression of your client’s rights, put it in the posture of something that threatens the court’s sovereignty.

And while we’re talking sanctions, never forget that sanctions are exceptional remedies.  Courts hate to sanction parties or counsel.  Though the threat of sanctions carried along in a case can be a useful tactic, seeking sanctions to patch a weak case is a fool’s errand.  Discovery is a mechanism to gather evidence to make your case, nothing more or less than that.

Those are my five.  I expect you have some great ones of your own.  Next year, I’d like to hear you sharing yours right here.  Even better, let’s all meet in Gainesville. Share our ideas.  Break bread and toast a return to normalcy.   That’s an invitation to join the e-discovery community.  Being part of it has been one of the great delights of my professional life. I’ve made wonderful friends that way.  You will, too.

Be well and thank you.

Can a Producing Party Refuse to Produce Linked Attachments to E-Mail?

A fellow professor of e-discovery started my morning with a question. He wrote, “In companies using Google business, internal email ‘attachments’ are often linked with a URL to documents on a Google drive rather than actually ‘attached’ to the email…. Can the producing party legally refuse to produce the document as an attachment to the email showing the family? Other links in the email to, for example, a website need not be produced.

I replied that I didn’t have the definitive answer, but I had a considered opinion. First, I challenged the assertion, “Other links in the email to, for example, a website need not be produced.”

Typically, the link must be produced because it’s part of the relevant and responsive, non-privileged message.  But, the link is just a pointer, a path to an item, and the discoverability of the link’s target hinges upon whether the (non-privileged) target is responsive AND within the care, custody or subject to the control of the producing party.

For the hypothetical case, I assume that the transmittal is deemed relevant and the linked targets are either relevant by virtue of their being linked to the transmittal or independently relevant and responsive.  I also assume that the linked target remains in the care, custody or subject to the control of the producing party because it has a legal and practical right of access to the repository where the linked target resides; that is, the producing party CAN access the linked item, even if they would rather not retrieve the relevant, responsive and non-privileged content to which the custodian has linked the transmittal.

If the link is not broken and the custodian of the message could click the link and access the linked target, where is the undue burden and cost?  Certainly I well know that collection is often delegated to persons other than the custodian, but shouldn’t we measure undue burden and cost from the standpoint of the custodian under the legal duty to preserve and produce, NOT from the perspective of a proxy engaged to collect, but lacking the custodian’s ability to collect, the linked target? Viewed in this light, I don’t see where the law excuses the producing party from collecting and producing the linked target

The difficulty in collection cited results from the producing party contracting to delegate storage to a third-party Cloud Provider, linking to information relegated to the Cloud Provider’s custody.  In certain respects, it’s like the defendant in Columbia Pictures v. Bunnell, who put a contractor (Panther) in control of the IP addresses of the persons trading pirated movies via the defendant’s platform.  Just because you enlist someone to keep your data on your behalf doesn’t defeat your ultimate right of control or your duty of production. 

Having addressed duty, let’s turn to feasibility, which is really what the fight’s about. 

Two key issues I see are: 

1. What if the link is broken by the passage of time?  If the target cannot be collected after reasonable efforts to do so, then it may be infeasible to effect the collection via the link or via pairing the link address to the addresses of the contents of the repository (as by using the Download-Link-Generator tool I highlighted here).  If there is simply no way a link created before a legal hold duty attached can be tied to its target, then you can’t do it, and the Court can’t order the impossible.  But, you can’t just label something “impossible” because you’d rather not do it.  You must make reasonable efforts and you must prove infeasibility. Courts should look askance at claims of infeasibility asserted by producing parties who have created the very situations that make it harder to obtain discovery.

2. What if the content of the target has changed since the time it was linked?  This is where the debate gets stickier, and where I have little empathy for a producing party who expects to be excused from production on the basis that it altered the evidence.  If the evidence has changed to the point where its relevance is in question because it may have been materially changed after linking, then the burden to prove the material change (and diminished relevance) falls on the producing party, not the requesting party.  Else, you take your evidence as you find it, and you produce it as it exists at the time of preservation and collection.  The possibility that it changed goes to its admissibility and weight, not to its discoverability.

I hope you agree my analysis is sound. To paraphrase Abraham Lincoln, you cannot murder your parents and then seek leniency because you’re an orphan. The problem is solvable, but it will be resolved only when Courts supply the necessary incentive by ordering collection and production. Integrating a hash value of the target within the link might go a long way to curing this Humpty-Dumpty dilemma; then, the target can be readily identified AND proven to be in the same state as when the link was created.

While we are at it, embedded links should be addressed from the standpoint of security and ethics. If a producing party supplies a message or document with a live link and opposing counsel’s clicks on the link exposing information not meant to be produced, whose head should roll there? If a party produces a live link in an email, is it reasonable to assume that the target was delivered, too? To my mind, the link is fair game, just as the attachment would be had it been embedded in the message. Electronic delivery is delivery. We have rules governing inadvertent production of privileged content, but not for the scenario described.

Don’t BE a Tool, GET a Tool!

Considering the billions of dollars spent on e-discovery every year, wouldn’t you think every trial lawyer would have some sort of e-discovery platform?  Granted, the largest firms have tools; in fact, e-discovery software provider Relativity (lately valued at $3.6 billion) claims 198 of the 200 largest U.S. law firms as its customers.  But, for the smaller firms and solo practitioners who account for 80% or more of lawyers in private practice, access to e-discovery tools falls off.  Off a cliff, that is. 

When law firms or solos seek my help obtaining native production, my first question is often, “what platform are you using?”  Their answer is usually “PC” or simply a blank stare.  When I add, “your e-discovery platform–the software tool you’ll use to review and search electronically stored information,” the dead air makes clear they haven’t a clue.  I might as well ask a dog where it will drive if it catches the car.    

Let’s be clear: no lawyer should expect to complete an ESI review of native forms using native applications

Don’t do it.

I don’t care how many regale me with tales of their triumphs using Outlook or Microsoft Word as ‘review tools.’  That’s not how it’s done.  It’s reckless.  The integrity of electronic evidence will be compromised by that workflow.  You will change hash values.  You will alter metadata.  Your searches will be spotty.  Worst case scenario: your copy of Outlook could start spewing read receipts and calendar reminders.  I dare you to dig your way out of that with a smile.  Apart from the risks, review will be slow.  You won’t be able to tag or categorize data.   When you print messages, they’ll bear your name instead of the custodian’s name. Doh!

None of this is an argument against native production. 
It’s an argument against incompetence. 

I am as dedicated a proponent of native production as you’ll find; but to reap the benefits and huge cost savings of native production, you must use purpose-built review tools.  Notwithstanding your best efforts to air gap computers and use working copies, something will fail.  Just don’t do it.

You’ll also want to use an e-discovery review tool because nothing else will serve to graft the contents of load files onto native evidence.  For the uninitiated, load files are ancillary, delimited text files supplied with a production and used to carry information about the items produced and the layout of the production.

I know some claim that native productions do away with the need for load files, and I concede there are ways to structure native productions to convey some of the data we now exchange via load files.  But why bother?  After years in the trenches, I’ve given up cursing the use of load files in native, hybrid and TIFF+ productions.  Load files are clunky, but they’re a proven way to transmit filenames and paths, supply Bates numbers, track duplicates, share hash values, flag family relationships, identify custodians and convey system metadata (that’s the kind not stored in files but residing in the host system’s file table).   Until there’s a better mousetrap, we’re stuck with load files.

The takeaway is get a tool. If you’re new to e-discovery, you need to decide what e-discovery tool you will use to review ESI and integrate load files.  Certainly, no producing party can expect to get by without proper tools to process, cull, index, deduplicate, search, review, tag and export electronic evidence—and to generate load files.  But requesting parties, too, are well-served to settle on an e-discovery platform before they serve their first Request for Production.  Knowing the review tool you’ll use informs the whole process, particularly when specifying the forms of production and the composition of load files.  Knowing the tool also impacts the keywords used in and structure of search queries.

There are a ton of tools out there, and one or two might not skin you alive on price. Kick some tires. Ask for a test drive. Shop around. Do the math. But, figure out what you’re going to do before you catch that car. Oh, and don’t even THINK about using Outlook and Word. I mean it. I’ve got my eye on you, McFly.

Understanding the UPC: Because You Can

Where does the average person encounter binary data?  Though we daily confront a deluge of digital information, it’s all slickly packaged to spare us the bare binary bones of modern information technology.  All, that is, save the humble Universal Product Code, the bar code symbology on every packaged product we purchase from a 70-inch TV to a box of Pop Tarts.  Bar codes and their smarter Japanese cousins, QR Codes, are perhaps the most unvarnished example of binary encoding in our lives. 

Barcodes have an ancient tie to e-discovery as they were once used to Bates label hard copy documents, linking them to “objective coding” databases. A lawyer using barcoded documents was pretty hot stuff back in the day.

Just a dozen numeric characters are encoded by the ninety-five stripes of a UPC-A barcode, but those digits are encoded so ingeniously as to make them error resistant and virtually tamperproof. The black and white stripes of a UPC are the ones and zeroes of binary encoding.  Each number is encoded as seven bars and spaces (12×7=84 bars and spaces) and an additional eleven bars and spaces denote start, middle and end of the UPC.  The start and end markers are each encoded as bar-space-bar and the middle is always space-bar-space-bar-space.  Numbers in a bar code are encoded by the width of the bar or space, from one to four units. 

This image has an empty alt attribute; its file name is barcode-water.png

The bottle of Great Value purified water beside me sports the bar code at right.

Humans can read the numbers along the bottom, but the checkout scanner cannot; the scanner reads the bars. Before we delve into what the numbers signify in the transaction, let’s probe how the barcode embodies the numbers.  Here, I describe a bar code format called UPC-A.  It’s a one-dimensional code because it’s read across.  Other bar codes (e.g., QR codes) are two-dimensional codes and store more information because they use a matrix that’s read side-to-side and top-to-bottom.

The first two black bars on each end of the barcode signal the start and end of the sequence (bar-space-bar).  They also serve to establish the baseline width of a single bar to serve as a touchstone for measurement.  Bar codes must be scalable for different packaging, so the ability to change the size of the codes hinges on the ability to establish the scale of a single bar before reading the code.

Each of the ten decimal digits of the UPC are encoded using seven “bar width” units per the schema in the table at right.

To convey the decimal string 078742, the encoded sequence is 3211 1312 1213 1312 1132 2122 where each number in the encoding is the width of the bars or spaces.  So, for the leading value “zero,” the number is encoded as seven consecutive units divided into bars of varying widths: a bar three units wide, then (denoted by the change in color from white to black or vice-versa), a bar two units wide, then one then one.  Do you see it? Once more, left-to-right, a white band, three units wide, a dark band two units wide , then a single white band and a single dark band (3-2-1-1 encoding the decimal value zero).

You could recast the encoding in ones and zeroes, where a black bar is a one and a white bar a zero. If you did, the first digit would be 0001101, the number seven would be 0111011 and so on; but there’s no need for that, because the bands of light and dark are far easier to read with a beam of light than a string of printed characters.

Taking a closer look at the first six digits of my water bottle’s UPC, I’ve superimposed the widths and corresponding decimal value for each group of seven units. The top is my idealized representation of the encoding and the bottom is taken from a photograph of the label:

Now that you know how the bars encode the numbers, let’s turn to what the twelve digits mean.  The first six digits generally denote the product manufacturer. 078742 is Walmart. 038000 is assigned to Kellogg’s.  Apple is 885909 and Starbucks is 099555.  The first digit can define the operation of the code.  For example, when the first digit is a 5, it signifies a coupon and ties the coupon to the purchase required for its use.  If the first digit is a 2, then the item is something sold by weight, like meats, fruit or vegetables, and the last six digits reflect the weight or price per pound.  If the first digit is a 3, the item is a pharmaceutical.

Following the leftmost six-digit manufacturer code is the middle marker (1111, as space-bar-space-bar-space) followed by five digits identifying the product.  Every size, color and combo demands a unique identifier to obtain accurate pricing and an up-to-date inventory.

The last digit in the UPC serves as an error-correcting check digit to ensure the code has been read correctly.  The check digit derives from a calculation performed on the other digits, such that if any digit is altered the check digit won’t match the changed sequence. Forget about altering a UPC with a black marker: the change wouldn’t work out to the same check digit, so the scanner will reject it.

In case you’re wondering, the first product to be scanned at a checkout counter using a bar code was a fifty stick pack of Juicy Fruit gum in Troy, Ohio on June 26, 1974.  It rang up for sixty-seven cents.  Today, 45 sticks will set you back $2.48 (UPC 22000109989).

Steganography: Because Who Doesn’t Love Bacon?

I’m updating my E-Discovery Workbook to begin a new semester at the University of Texas School of Law next week, and I can’t help working in historical tidbits celebrating the antecedents of modern information technology.  The following is new material for my discussion of digital encoding: a topic I regard as essential to a good grasp of digital forensics and electronic evidence.  When you understand encoding, you understand why varying sources of electronically stored information are more alike than different and why forms of production matter.

We record information every day using 26 symbols called “the alphabet,” abetted by helpful signals called “punctuation.”  So, you could say that we write in hexavigesimal (Base26) encoding.

“Binary” or Base2 encoding is notating information using nothing but two symbols: conventionally, the numbers one and zero.  It’s often said that “computer data is stored as ones and zeroes;” but that’s a fiction.  In fact, binary data is stored physically, electronically, magnetically or optically using mechanisms that permit the detection of two clearly distinguishable “states,” whether manifested as faint voltage potentials (e.g., thumb drives), polar magnetic reversals (e.g., spinning hard drives) or pits on a reflective disc deflecting a laser beam (e.g., DVDs).  Ones and zeroes are simply a useful way to notate those states. You could use any two symbols as binary characters, or even two discrete characteristics of the “same” symbol. For now, just ponder how you might record or communicate two “different” characteristics, as by two different shapes, colors, sizes, orientations, markings, etc.

I free you from the trope of ones and zeroes to plumb the evolution of binary communication and explore an obscure coding cul-de-sac called Steganography, from the Greek, meaning “concealed writing.”  But first, we need an aside of Bacon.

I mean, of course, lawyer and statesman Sir Francis Bacon (1561-1626).  Among his many accomplishments, Bacon conceived a bilateral cipher (a “code” in modern parlance) enabling the hiding of messages omnia per omnia, or “anything by anything.”

Bacon’s cipher used the letters “A” and “B” to denote binary values; but if we use ones and zeros instead, we see the straight line from Bacon’s clever cipher to modern ASCII and Unicode encoding.

As with modern computer encoding, we need multiple binary digits (“bits”) to encode or “stand in for” the letters of the alphabet.  Bacon chose the five-bit sets at right:

If we substitute ones and zeroes (right), Bacon’s cipher starts to look uncannily like contemporary binary encodings.

Why five bits and not three or four? The answer lies in binary math (“Oh no! Not MATH!!”). Wait, wait; it won’t hurt. I promise!

If you have one binary digit (21), you have only two unique states (one or zero), so you can only encode two letters, say A and B.  If you have two binary digits (22 or 2×2), you can encode four letters, say A, B, C and D.  With three binary digits (23 or 2x2x2), you can encode eight letters.  Finally, with four binary digits (24 or 2x2x2x2), you can encode just sixteen letters.  So, do you see the problem in trying to encode the letters of a 26-letter alphabet?  You must use at least five binary digits (25 or 32) unless you are content to forgo ten letters.

Sir Francis Bacon wasn’t especially interested in encoding text as bits. His goal was to hide messages in any medium, permitting a clued-in reader to distinguish between differences lurking in plain sight.  Those differences—whatever they might be—serve to denote the “A” or “B” in Bacon’s steganographic technique. For example:

That last one is quite subtle, right?  Here’s how it’s done:

To conceal my name in each of the respective examples, every unbolded/unitalicized/serif character signifies an “A” in Bacon’s cipher and every boldface/italicized/sans serif character signifies a “B” (ignore the spaces and punctuation).  The bold and italic approaches look wonky and could arouse suspicion, but if the fonts are chosen carefully, the absence of serifs should go unnoticed.  Take a closer look to see how it works:

In my examples, I’ve used Bacon’s cipher to hide text within text, but it can as easily hide messages in almost anything.  My favorite example is the class photo of World War I cryptographers trained in Aurora, Illinois by famed cryptographers, William and Elizabeth Friedman.[1]  Before they headed for France, the newly minted codebreakers lined up for the cameraman; but there’s more going on here than meets the eye.

Taking to heart omnia per omnia, the Friedmans ingenuously encoded Sir Francis Bacon’s maxim “knowledge is power” within the photograph using Bacon’s cipher.  The 71 soldiers and their instructors convey the cipher text by facing or looking away from the camera.  Those facing denote an “A.”  Those looking away denote a “B.”  There weren’t quite enough present to encode the entire maxim, so the decoded message actually reads, “KNOWLEDGE IS POWE.”  Here’s the decoding:

A closer look:

Isn’t that mind blowing?!?!

Steganography is something most computer forensic examiners study but rarely use in practice.  Still, it’s a fascinating discipline with a history reaching back to ancient Greece, where masters tattooed secret messages on servants’ shaved scalps and hit “Send” once the hair grew back.  Digital technology brought new and difficult-to-decipher steganographic techniques enabling images, sound and messages to hitch a hidden ride on a wide range of electronic media.

[1] For this material, I’m indebted to “How to Make Anything Signify Anything” by William H. Sherman in Cabinet no. 40 (Winter 2010-2011).

What’s in a Name (or Hash Value)?

A question common to investigation of alleged data theft is, “Are any of our stolen files on our competitor’s systems?”  Forensic examiners track purloined IP using several strategies: among them, searching for matching filenames, hash values, metadata and content.  Any of these can be altered by data thieves seeking to cover their tracks, but most are too confident or too dim to bother.

A current matter underscored the pitfalls of filename and hash searches, prompting me to reflect on a long-ago case where hash searches caused headaches.  The old case stemmed from a settlement of a data theft event requiring a periodic audit of hashes of the defendant’s data to ensure that stolen data hasn’t re-emerged.  The plaintiff sought sanctions because its expert found hash values in the audit that matched hashes tied to stolen PowerPoint presentations.  The defendants were dumbfounded, certain they’d adhered to the settlement and not used any purloined PowerPoints. 

When I stepped in, I confirmed there were matching hash values, but none matched the PowerPoint PPT and PPTX files of interest.  Instead, the hashes matched only benign component image data within the presentations.  The components hashed were standard slide backgrounds (e.g., “woodgrain”) found in any copy of PowerPoint.  Both parties possessed PowerPoints using some of the same generic design elements, but none were the same presentations.  The hashing tool so thoroughly explored the files that embedded images were hashed separately from the files in which they were used and matched other generic elements in other presentations.  No threat at all!

Still other matching files turned out to be articles freely distributed at an industry trade show and zero-byte “null” files that would match any similarly empty files on any machine.  When every hash match was scrutinized, none proved to be stolen data.  Away went the sanctions motion.

The moral of the story is, although it’s extremely unlikely that two different files will share the same hash value, matching hash values don’t always signify the “same” file in practical terms.  Matching files may derive from independent sources, could be benign components of compilations or might match because they hold little or no content.  The math is powerful, but it mustn’t displace common sense.

In the ongoing matter, a simple method used to identify contraband data was filename matching.  The requesting party sought to identify instances of a file called “Book3.xlsx;” and the search turned up hundreds of instances of identically named files in the producing party’s data–though not a single one hash-matched the file of interest.

Why so many false positives?  It turns out Microsoft Excel assigns an incremented name to any new spreadsheet (despite earlier-opened sheets having been closed) so long as even one prior sheet remains open.  So, if you’ve created eight Excel spreadsheets, renamed them and closed all but one, the next new sheet will be named Book9.xlsx by default.  The name “Book3.xlsx” signified only that two prior spreadsheets had been opened.  The takeaway is that, in any large collection, expect to turn up instances of various Book(n).xlsx files created when a user exited and saved a sheet without renaming it from its default name.

Electronic search—by hash, filename, metadata or keyword–is an invaluable tool in investigation and e-discovery; but one best used with a modicum of common sense by those who appreciate its limitations.

C’mon! Bates Numbering Native Production is Easy!

Sometimes, the other side balks at a proposed e-discovery protocol, arguing it’s unduly burdensome to rename native files to their Bates numbers. I find that odd because parties have always named files for Bates numbers whilst doing clunky TIFF productions.  Where did they think the names of all those TIFF images came from?  The truth is, litigants have been naming files to match Bates numbers for as long as we’ve done e-discovery!  It’s easy!

It’s one thing to say something is easy and another to prove its simplicity.  Certainly, if you use an e-discovery vendor, it’s as easy as saying, “Bates number the native files.”  They know what to do. But anyone doing electronic production in-house can add Bates numbers to filenames simply, quickly and cheaply. 

There are various ways to do it.  You can prepend Bates number (Bates##_filename.ext), append Bates number (filename_Bates##.ext) or replace the filename with the Bates number, storing the original name in a load file.  You can even add protective language like “PRODUCED SUBJECT TO PROTECTIVE ORDER.”

Multiple free and low-cost bulk renaming tools are available.  I’ve long praised a powerful, flexible too called Bulk Renaming Utility. It’s free for personal use and $93 for commercial purposes; a powerful tool, but overwhelming to some.  Seeking a simpler tool and one free to use commercially, I found two: File Renamer Basic and Ant Renamer.  Both impressed me with their flexibility and ease of use. 

For Mac users, there’s a nice free tool called File Renamer for MacOS 64 bit, which I’ll also touch on below.

Let’s look at how to configure both Windows tools to Bates number a production.

Suppose the production protocol reads:

Bates Numbers. All Bates numbers will consist of a three-digit Alpha Prefix, followed immediately by an 8-digit numeric: AAA########. There must be no spaces in the Bates number. Any numbers with less than 8 digits will be front padded with zeros to reach the required 8 digits. ESI will be Bates numbered by substituting, prepending or appending the Bates number for/to the file name.

Assuming there have been ten other items produced earlier,, we must begin Bates numbering at DEF00000011.  For this tutorial, I’ll use just six photos of American coins, but it could as easily be thousands of files of any sort.  Here are thumbnails of the exemplar photos:

The table below lists the filenames and MD5 hash values of the files, allowing us to confirm that a renaming tool won’t otherwise alter the evidence.

Original NameTypeSizeMD5 Hash

To demonstrate, I placed working copies of all the files needing Bates numbers in a Desktop folder named Production photos 11-21-20.  Inside this folder, I made an empty subfolder called BATES NUMBERED PHOTOS.  You don’t have follow suit, but however you approach it, don’t work on the source evidence; instead, create and produce renamed working copies.

File Renamer Basic

After installing and kicking off the program, I set the following parameters:

  1. Configure the “Folder” and “Copy to” paths.
  2. Set the three-digit Alpha Prefix required by the Protocol (I used “DEF” for Defendants).
  3. Set Unique Parameter to “Numbers,” “Increment” by 1, mask with eight zeroes and “Start at 11” (the next unassigned Bates number).
  4. Set Separator to a single underscore.  [While the protocol neither requires nor prohibits adding a separator between the Bates number and filename, I like to add it for clarity]
  5. In the Filename settings box, check “Place Unique Parameter before Filename.”
  6. Click “Preview,” and if you’re happy with the preview, click “Apply.”

Running hash values against the renamed files, we see that renaming the files has not altered their hash values.

NameTypeSizeMD5 Hash

Ant Renamer

After installing and kicking off the program, I set the following parameters:

  1. Using “Add Folders,” navigate to and select the folder with the files to be renamed.
  2. Click F10 to launch the Options menu and, under the >Processing tab, check the box “Copy instead of Rename,” then click “OK.”
  3. Under “Actions,” select “Enumeration” and configure the mask as: DEF%num%_%name%%ext%
  4. Set “Start at:” to 11 and “Number of Digits” to 8.
  5. Click “Preview of Selected Files” and, if all seems well, click GO on the menu.

Note that these settings will create a Bates numbered set of duplicate files in the same folder as the source files, NOT in the subfolder.

Frankly, it’s harder to describe the task than to complete it. After a few minutes playing with the settings, you’ll easily figure out how to prepend a Bates number, append it or swap it for the original name. Once you’ve gotten the settings where you’d like them, File Renamer Basic allows you to save your custom settings as a profile and apply it to future productions.

I spent only a short time investigating The Mac application FileRenamer, but it was intuitive enough to use without any unmanly reading of directions and took just seconds to configure numbering and set a mask to finish the task. I configured numbering in Settings>Numbering (Initial value: 11, Increment: 1 and Fixed Length with Leading Zeroes: 8) then the mask to include the three-digit alpha prefix, padded numbering and underscore separator to precede the filename (DEF%num%_%name%).

Easy as pie! And while we’re on the subject of pie, HAPPY THANKSGIVING!