It’s nearly finals time for the students in my E-Discovery and Digital Evidence course at the University of Texas School of Law. I just completed the Final Exam Study Guide for the class and thought readers who wonder what a tech-centric law school e-discovery curriculum looks like might enjoy seeing what’s asked of the students in a demanding 3-credit law school course. Whether you’re ACEDS certified, head of your e-discovery practice group or just an e-discovery groupie like me, consider how you’d fare preparing for an exam with this scope and depth. I’m proud of my bright students. You’d be really lucky to hire one of my stars.
E-Discovery – Spring 2021 Final Exam Study Guide
The final exam will cover all readings, lectures, exercises and discussions on the syllabus.
(Syllabus ver. 21.0224 in conjunction with Workbook ver. 21.0214 and Announcements).
- We spent a month on meeting the preservation duty and proportionality. You undertook a two-part legal hold drafting exercise. Be prepared to bring skills acquired from that effort to bear on a hypothetical scenario. Be prepared to demonstrate your understanding of the requisites of fashioning a defensible legal hold and sensibly targeting a preservation demand to an opponent. As well, your data mapping skills should prove helpful in addressing the varied sources of potentially relevant ESI that exist, starting at the enterprise level with The Big Six (e-mail, network shares, mobile devices, local storage, social networking and databases). Of course, we must also consider Cloud repositories and scanned paper documents as potential sources.
- An essential capability of an e-discovery lawyer is to assess a case for potentially relevant ESI, fashion and implement a plan to identify accessible and inaccessible sources, determine their fragility and persistence, scope and deploy a litigation hold and take other appropriate first steps to counsel clients and be prepared to propound and respond to e-discovery, especially those steps needed to make effective use of the FRCP Rule 26(f) meet-and-confer process. Often, you must act without having all the facts you’d like and rely upon your general understanding of ESI and information systems to put forward a plan to acquire the facts and do so with sensitivity to the cost and disruption your actions may engender. Everything we’ve studied was geared to instilling those capabilities in you.
- CASES: You are responsible for all cases covered during the semester. When you read each case, you should ask yourself, “What proposition might I cite this case to support in the context of e-discovery?” That’s likely to be the way I will have you distinguish the cases and use them in the exam. I refer to cases by their style (plaintiff versus defendant), so you should be prepared to employ a mnemonic to remember the most salient principle of each, e.g., Columbia Pictures is the ephemeral data/RAM case; Rambus is the Shred Day case; In re NTL is the right of control case; In re: Weekley Homes is the Texas case about accessing the other side’s hard drives; Williams v. Sprint is the spreadsheet metadata case (you get the idea). I won’t test your memory of jurists, but it’s helpful-not-crucial to recall the authors of the decisions (especially when they spoke to our class like Judges Peck and Grimm).
Case Review Hints:
- Green v. Blitz: (Judge Ward, Texas) This case speaks to the need for competence in those responsible for preservation and collection and what constitutes a defensible eDiscovery strategy. What went wrong here? What should have been done differently?
- In re: Weekley Homes: (Texas Supreme Court) This is one of the three most important Texas cases on ESI. You should understand the elements of proof which the Court imposes for access to an opponent’s storage devices and know the terms of TRCP Rule 196.4, especially the key areas where the state and Federal ESI rules diverge.
- Zubulake: (Judge Scheindlin, New York) The Zubulake series of decisions is seminal to the study of e-discovery in the U.S. Zubulake remains the most cited of all EDD cases, so it is still a potent weapon even after the Rules amendments codified many of its lessons. Know what the case is about, how the plaintiff persuaded the court that documents were missing and what the defendant did or didn’t do in failing to meet its discovery obligations. Know what an adverse inference instruction is and how it was applied in Zubulake versus what must be established under FRCP Rule 37(e) after 2015. Know what Judge Scheindlin found to be a litigant’s and counsel’s duties with respect to preservation. Seven-point analytical frameworks (as for cost-shifting) make good test fodder.
- Williams v. Sprint: (Judge Waxse, Kansas). Williams is a seminal decision respecting metadata. In Williams v. Sprint, the matter concerned purging of metadata and the locking of cells in spreadsheets in the context of an age discrimination action after a reduction-in-force. Judge Waxse applied Sedona Principle 12 in its earliest (and now twice revised) form. What should Sprint have done? Did the Court sanction any party? Why or why not?
- Rodman v. Safeway: (Judge Tigar, ND California) This case, like Zubulake IV, looks at the duties and responsibilities of counsel when monitoring a client’s search for and production of potentially responsive ESI. What is Rule 26(g), and what does it require? What constitutes a reasonable search? To what extent and under what circumstances may counsel rely upon a client’s actions and representations in preserving or collecting responsive ESI?
- Columbia Pictures v. Bunnell: (Judge Chooljian, California) What prompted the Court to require the preservation of such fleeting, ephemeral information? Why were the defendants deemed to have control of the ephemeral data? Unique to its facts?
- In re NTL, Inc. Securities Litigation: (Judge Peck, New York) Be prepared to discuss what constitutes control for purposes of imposing a duty to preserve and produce ESI in discovery and how it played out in this case. I want you to appreciate that, while a party may not be obliged to succeed in compelling the preservation or production of relevant information beyond its care, custody or control, a party is obliged to exercise all such control as the party actually possesses, whether as a matter of right or by course of dealing. What does The Sedona Conference think about that?
- William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co.: (Judge Peck, New York) What was the “wake up call,” who were expected to awaken and on what topics?
- Adams v. Dell: (Judge Nuffer, Utah) What data was claimed to have been lost? What was supposed to have triggered the duty to preserve? What did the Court say about a responding party’s duty, particularly in designing its information systems? Outlier?
- RAMBUS: (Judge Whyte, California) I expect you to know what happened and to appreciate that the mere reasonable anticipation of litigation–especially by the party who brings the action–triggers the common law duty to preserve. Be prepared to address the sorts of situations that might or might not trigger a duty to initiate a legal hold.
- United States v. O’Keefe (Judge Facciola, DC): I like this case for its artful language (Where do angels fear to tread?) and consideration of the limits and challenges of keyword search, a topic that bears scrutiny wherever it has been addressed in the material. That is, does keyword search work as well as lawyers think, and how can we improve upon it and compensate for its shortcomings?
- Victor Stanley v. Creative Pipe I & II (Judge Grimm, Maryland): Read VS I with an eye toward understanding the circumstances when inadvertent production triggers waiver (pre-FRE 502). What are the three standards applied to claims of waiver? What needs to be in the record to secure relief?
Don’t get caught up in the prolonged factual minutiae of VS II. Read VS II to appreciate the varying standards that once existed across the Circuits for imposition of spoliation sanctions and that pre-date the latest FRCP Rules amendments, i.e., Rule 37(e).
- Anderson Living Trust v. WPX Energy Production, LLC (Judge Browning, New Mexico): This case looks at the application and intricacies of FRCP Rule 34 when it comes to ESI versus documents. My views about the case were set out in the article you read called “Breaking Badly.”
- In re: State Farm Lloyds (Texas Supreme Court): Proportionality is the buzzword here; but does the Court elevate proportionality to the point of being a costly hurdle serving to complicate a simple issue? What does this case portend for Texas litigants in terms of new hoops to jump through over issues as straightforward as forms of production? What role did the Court’s confusion about forms (and a scanty record) play in the outcome?
- Monique Da Silva Moore, et al. v. Publicis Groupe & MSL Group and Rio Tinto Plc v. Vale S.A., (Judge Peck, New York): DaSilva Moore is the first federal decision to approve the use of the form of Technology Assisted Review (TAR) called Predictive Coding as an alternative to linear, manual review of potentially responsive ESI. Rio Tinto is Judge Peck’s follow up, re-affirming the viability of the technology without establishing an “approved” methodology.
- Brookshire Bros. v. Aldridge (Texas Supreme Court): This case sets out the Texas law respecting spoliation of ESI…or does it? Is the outcome and “analysis” here consistent with the other preservation and sanctions cases we’ve covered?
- VanZant v. Pyle (Judge Sweet, New York): Issues of control and spoliation drive this decision. Does the Court correctly apply Rule 37(e)?
- CAT3 v. Black Lineage (Judge Francis, New York): This trademark infringement dispute concerned an apparently altered email. Judge Francis found the alteration sufficient to support sanctions under Rule 37(e). How did he get there? Judge Francis also addressed the continuing viability of discretionary sanctions despite 37(e). What did he say about that?
- EPAC v. Thos. Nelson, Inc.: Read this report closely to appreciate how the amended Rules, case law and good practice serve to guide the court in fashioning remedial measures and punitive sanctions. Consider the matter from the standpoint of the preservation obligation (triggers and measures) and from the standpoint of proportionate remedial measures and sanctions. What did the Special Master do wrong here?
- Mancia v. Mayflower (Judge Grimm, Maryland): Don’t overlook this little gem in terms of its emphasis on counsel’s duties under FRCP Rule 26(g). What are those duties? What do they signify for e-discovery? What is the role of cooperation in an adversarial system?
- Race Tires America, Inc. v. Hoosier Racing Tire Corp. (Judge Vanaskie, Pennsylvania): This opinion cogently defines the language and limits of 28 U.S.C. §1920 as it relates to the assessment of e-discovery expenses as “taxable costs.” What common e-discovery expenses might you seek to characterize as costs recoverable under §1920, and how would you make your case?
- Zoch v. Daimler (Judge Mazzant, Texas): Did the Court correctly resolve the cross-border and blocking statute issues? Would the Court’s analysis withstand appellate scrutiny post-GDPR?
Remember: bits, bytes, sectors, clusters (allocated and unallocated), tracks, slack space, file systems and file tables, why deleted doesn’t mean gone, forensic imaging, forensic recovery techniques like file carving, EXIF data, geolocation, file headers/binary signatures, hashing, normalization, de-NISTing, deduplication and file shares. For example: you should know that an old 3.5” floppy disk typically held no more than 1.44MB of data, whereas the capacity of a new hard drive or modern backup tape would be measured in terabytes. You should also know the relative capacities indicated by kilobytes, megabytes, gigabytes, terabytes and petabytes of data (i.e., their order of ascendancy, and the fact that each is 1,000 times more or less than the next or previous tier). Naturally, I don’t expect you to know the tape chronology/capacities, ASCII/hex equivalencies or other ridiculous-to-remember stuff.
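To make the order of ascendancy concrete, here is a minimal sketch that converts between tiers using the decimal convention described above (each tier 1,000 times the last). The helper names and the 4 TB drive size are illustrative assumptions, not figures from the Workbook, and the floppy is treated at its nominal 1.44 MB.

```python
# Decimal storage tiers, each 1,000 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to bytes using powers of 1,000."""
    return value * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest tier that keeps the value at 1 or more."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:,.2f} {unit}"
        value /= 1000


floppy = to_bytes(1.44, "MB")   # nominal capacity of a 3.5" floppy
drive = to_bytes(4, "TB")       # hypothetical modern hard drive

print(humanize(drive))
print(f"Floppies needed to hold the drive: {drive / floppy:,.0f}")
```

The point of the exercise is scale: a single modern drive holds the equivalent of millions of floppies, which is why volume drives processing cost and review time.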
4. TERMINOLOGY: Lawyers, more than most, should appreciate the power of precise language. When dealing with professionals in technical disciplines, it’s important to call things by their right name and recognize that terms of art in one context don’t necessarily mean the same thing in another. When terms have been defined in the readings or lectures, I expect you to know what those terms mean. For example, you should know what ESI, EDRM, RAID, system and application metadata (definitely get your arms firmly around application vs. system metadata), retention, purge and rotation mean (e.g., grandfather-father-son rotation); as well as Exchange, O365, 26(f), 502(d), normalization, recursion, native, near-native, TIFF+, load file, horizontal, global and vertical deduplication, IP addressing, data biopsy, forensically sound, productivity files, binary signatures and file carving, double deletion, load files, delimiters, slack space, unallocated clusters, UTC offset, proportionality, taxable costs, sampling, testing, iteration, TAR, predictive coding, recall, precision, UTC, VTL, SQL, etc.
5. ELECTRONIC DISCOVERY REFERENCE MODEL: We’ve returned to the EDRM many times as we’ve moved from left to right across the iconic schematic. Know its stages, their order and what those stages and triangles signify.
6. ENCODING: You should have a firm grasp of the concept of encoded information, appreciating that all digital data is stored as numbers notated as an unbroken sequence of 1s and 0s. How is that miracle possible? You should be comfortable with the concepts described in pp. 132-148 of the Workbook (and our class discussions of the fact that the various bases are just ways to express numbers of identical values in different notations). You should be old friends with the nature and purpose of, e.g., base 2 (binary), base 10 (decimal), base 16 (hexadecimal), base 64 (attachment encoding), ASCII and UNICODE.
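As a quick illustration of the idea that the bases are just different notations for the same numbers, the sketch below takes the three characters "ESI" and expresses the same underlying bytes in decimal, binary, hexadecimal and base 64. The sample strings are my own choices for illustration; the Workbook's examples may differ.

```python
import base64

text = "ESI"
raw = text.encode("ascii")   # ASCII maps each character to one byte

decimal = list(raw)                                # base 10
binary = " ".join(f"{b:08b}" for b in raw)         # base 2, eight bits per byte
hexadecimal = raw.hex()                            # base 16
b64 = base64.b64encode(raw).decode("ascii")        # base 64, as used for attachments

print(decimal)       # [69, 83, 73]
print(binary)        # 01000101 01010011 01001001
print(hexadecimal)   # 455349
print(b64)           # RVNJ

# ASCII has no code for an accented character; Unicode encodings do.
accented = "\u00e9"  # the letter e with an acute accent
try:
    accented.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")
print(accented.encode("utf-8").hex())   # c3a9 (two bytes in UTF-8)
```

Note how the notation shrinks as the base grows: 24 binary digits, 6 hexadecimal digits and 4 base 64 characters all describe the same three bytes.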
7. STORAGE: You should have a working knowledge of the principal types and capacities of common electromagnetic and solid-state storage devices and media (because data volume has a direct relationship to cost of processing and time to review in e-discovery). You should be able to recognize and differentiate between, e.g., floppy disks, thumb drives, optical media, hard drives, solid state storage devices, RAID arrays and backup tape, including a general awareness of how much data they hold. Much of this is in pp. 22-48 of the Workbook (Introduction to Data Storage Media). For ready reference and review, I’ve added an appendix to this study guide called, “Twenty-One Key Concepts for Electronically Stored Information.”
8. E-MAIL: E-mail remains the epicenter of corporate e-discovery; so, understanding e-mail systems, forms and the underlying structure of a message is important. The e-mail chapter should be reviewed carefully. I wouldn’t expect you to know file paths to messages or e-mail forensics, but the anatomy of an e-mail is something we’ve covered in detail through readings and exercises. Likewise, the messaging protocols (POP, MAPI, IMAP, WEB, MIME, etc.), single-message and container mail formats (PST, OST, EDB, NSF, EML, MSG, DBX, MHTML, MBOX) and leading enterprise mail client-server pairings (Exchange/Outlook, Domino/Notes, O365/browser) are worth remembering. Don’t worry, you won’t be expected to extract epoch times from boundaries again. 😉
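For those who want to revisit the anatomy of a message hands-on, here is a minimal sketch using Python's standard `email` package. It builds a small MIME message with one attachment, serializes it as it would appear in an EML file, then parses it back to expose the headers, the multipart boundary and the base 64 transfer encoding of the attachment. The addresses, subject, date and attachment bytes are all invented for illustration.

```python
from datetime import timezone
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build a sample message; every value here is hypothetical.
payload = b"PK\x03\x04 sample spreadsheet bytes"
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Quarterly figures"
msg["Date"] = "Mon, 03 May 2021 09:30:00 -0500"
msg.set_content("Please see the attached spreadsheet.")
msg.add_attachment(
    payload,
    maintype="application",
    subtype="octet-stream",
    filename="figures.xlsx",
)

# Serialize to the on-disk form, then parse it back as a tool would.
wire = msg.as_bytes()
parsed = message_from_bytes(wire, policy=policy.default)

print(parsed["Subject"])
print(parsed.get_content_type())   # the container type for body plus attachment
print(parsed.get_boundary())       # the string separating the MIME parts

# Walk every MIME part and show how each is encoded for transport.
for part in parsed.walk():
    print(part.get_content_type(), part.get("Content-Transfer-Encoding"))

# The Date header carries a UTC offset; normalize it to UTC.
sent = parsed["Date"].datetime
print(sent.isoformat(), "->", sent.astimezone(timezone.utc).isoformat())

attachment = next(parsed.iter_attachments())
print(attachment.get_filename(), len(attachment.get_content()), "bytes")
```

Printing `wire.decode()` shows the raw message, where the attachment appears as a block of base 64 text between boundary lines.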
9. FORMS: Forms of production loom large in our curriculum. Being that everything boils down to just an unbroken string of ones-and-zeroes, the native forms and the forms in which we elect to request and produce them (native, near-native, images (TIFF+ and PDF), paper) play a crucial role in all the “itys” of e-discovery: affordability, utility, intelligibility, searchability and authenticability. What are the purposes and common structures of load files? What are the pros and cons of the various forms of production? Does one size fit all? How does the selection of forms play out procedurally in federal and Texas state practice? How do we deal with Bates numbering and redaction? Is native and near-native production better and, if so, how do we argue the merits of native production to someone wedded to TIFF images? This is HUGE in my book! There WILL be at least one essay question on this and likely several other test questions.
10. SEARCH AND REVIEW: We spent a fair amount of time talking about and doing exercises on search and review. You should understand the various established and emerging approaches to search: e.g., keyword search, Boolean search, fuzzy search, stemming, clustering, predictive coding and Technology Assisted Review (TAR). Why is an iterative approach to search useful, and what difference does it make? What are the roles of testing, sampling and cooperation in fashioning search protocols? How do we measure the efficacy of search? Hint: You should know how to calculate recall and precision and know the ‘splendid steps’ to take to improve the effectiveness and efficiency of keyword search (i.e., better F1 scores).
You should know what a review tool does and customary features of a review platform. You should know the high points of the Blair and Maron study (you read and heard about it multiple times, so you need not read the study itself). Please also take care to understand the limitations on search highlighted in your readings and those termed The Streetlight Effect.
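As a refresher on the arithmetic, the sketch below computes precision, recall and F1 from counts of true positives (relevant items retrieved), false positives (irrelevant items retrieved) and false negatives (relevant items missed). The function name and the sample counts are hypothetical and chosen only to keep the numbers easy to follow.

```python
def search_metrics(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f1) for a search result set.

    precision = share of retrieved items that are relevant
    recall    = share of relevant items that were retrieved
    f1        = harmonic mean of precision and recall
    """
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical collection: 1,000 relevant documents exist.
# A search returns 800 documents, of which 600 are relevant.
precision, recall, f1 = search_metrics(
    true_positives=600, false_positives=200, false_negatives=400
)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In practice the false negative count is unknown and must be estimated by sampling the documents the search did not return, which is one reason sampling and testing figure so prominently in search protocols.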
11. ACCESSIBILITY AND GOOD CAUSE: Understand the two-tiered analysis required by FRCP Rule 26(b)(2)(B). When does the burden of proof shift, and what shifts it? What tools (a/k/a conditions) are available to the Court to protect competing interests of the parties?
12. FRE RULE 502: It’s your friend! Learn it, love it, live it (or at least know when and how to use it). What protection does it afford against subject matter waiver? Is there anything like it in state practice? Does it apply to all legally cognized privileges?
13. 2006 AND 2015 RULES AMENDMENTS: You should understand what they changed with respect to e-discovery. Concentrate on proportionality and scope of discovery under Rule 26, along with standards for sanctions under new Rule 37(e). What are the Rule 26 proportionality factors? What are the findings required to obtain remedial action versus serious sanctions for spoliation of ESI under 37(e)? Remember “intent to deprive.”
14. MULTIPLE CHOICE: When I craft multiple choice questions, there will typically be two answers you can quickly discard, then two you can’t distinguish without knowing the material. So, if you don’t know an answer, you increase your odds of doing well by eliminating the clunkers and guessing. I don’t deduct for wrong answers. Read carefully to note whether the question seeks the exception or the rule. READ ALL ANSWERS before selecting the best one(s) as I often include an “all of the above” or “none of the above” option.
15. All lectures and reviews of exercises are recorded and online for your review, if desired.
16. In past exams, I used the following essay questions. These will not be essay questions on your final exam; however, I furnish them here as examples of the scope and nature of prior essay questions:
EXAMPLE QUESTION A: On behalf of a class of homeowners, you sue a large bank for alleged misconduct in connection with mortgage lending and foreclosures. You and the bank’s counsel agree upon a set of twenty Boolean and proximity queries including:
- fnma AND deed-in-lieu
- 1/1/2009 W/4 foreclos!
- Resumé AND loan officer
- LTV AND NOT ARM
- (Problem W/2 years) AND HARP
These are to be run against an index of ten loan officers’ e-mail (with attached spreadsheets, scanned loan applications, faxed appraisals and common productivity files) comprising approximately 540,000 messages and attachments. Considering the index search problems discussed in class and in your reading called “The Streetlight Effect in E-Discovery,” identify at least three capabilities or limitations of the index and search tool that should be determined to gauge the likely effectiveness of the contemplated searches. Be sure to explain why each matters.
I am not asking you to assess or amend the agreed-upon queries. I am asking what needs to be known about the index and search tool to ascertain if the queries will work as expected.
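To see why the same query can succeed against one index and fail against another, consider this toy sketch of how an indexing engine might decide which terms to store. It is not modeled on any particular product; the function, its options, the noise-word list and the sample sentence are all assumptions made for illustration.

```python
import re
import unicodedata


def index_terms(text, split_hyphens=True, fold_diacritics=False,
                stop_words=frozenset(), min_length=1):
    """Return the set of terms a hypothetical index would store for `text`."""
    text = text.lower()
    if fold_diacritics:
        # Decompose accented letters, then drop the combining accent marks.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Either break on every non-alphanumeric character, or keep hyphenated words whole.
    pattern = r"[^\W_]+" if split_hyphens else r"[^\W_]+(?:-[^\W_]+)*"
    tokens = re.findall(pattern, text)
    return {t for t in tokens if len(t) >= min_length and t not in stop_words}


sample = "Deed-in-lieu offered; resum\u00e9 attached; LTV on ARM as of 1/1/2009"

# Hypothetical tool A: splits on hyphens, drops noise words and short terms.
strict = index_terms(
    sample,
    stop_words=frozenset({"in", "on", "as", "of", "and", "not"}),
    min_length=4,
)

# Hypothetical tool B: keeps hyphenated terms, folds accents, indexes everything.
lenient = index_terms(sample, split_hyphens=False, fold_diacritics=True)

for term in ["deed-in-lieu", "resume", "ltv", "arm", "2009"]:
    print(f"{term!r}: tool A={term in strict}, tool B={term in lenient}")
```

Under these assumptions, tool A never indexes the three-letter terms or the hyphenated phrase, and neither tool stores the date as a single searchable term. None of this reaches scanned or faxed attachments, which hold no searchable text at all unless they have been run through optical character recognition.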
EXAMPLE QUESTION B: The article, A Bill of Rights for E-Discovery included the following passage:
I am a requesting party in discovery.
I have duties.
I am obliged to: …
Work cooperatively with the producing party to identify reasonable and effective means to reduce the cost and burden of discovery, including, as appropriate, the use of tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search.
Describe how “tiering, sampling, testing and iterative techniques, along with alternatives to manual review and keyword search” serve to reduce the cost and burden of e-discovery. Be sure to make clear what each term means.
It’s been an excellent semester and a pleasure for me to have had the chance to work with a bright bunch. Thank you for your effort! I’ve greatly enjoyed getting to know you notwithstanding the limits imposed by the pandemic and Mother Nature’s icy wrath. I wish you the absolute best on the exam and in your splendid careers to come. Count me as a future resource to call on if I can be of help to you. Best of Luck! Craig Ball
Twenty-One Key Concepts for Electronically Stored Information
- Common law imposes a duty to preserve potentially relevant information in anticipation of litigation.
- Most information is electronically stored information (ESI).
- Understanding ESI entails knowledge of information storage media, encodings and formats.
- There are many types of e-storage media of differing capacities, form factors and formats:
a) analog (phonograph record) or digital (hard drive, thumb drive, optical media).
b) mechanical (electromagnetic hard drive, tape, etc.) or solid-state (thumb drive, SIM card, etc.).
- Computers don’t store “text,” “documents,” “pictures,” “sounds.” They only store bits (ones or zeroes).
- Digital information is encoded as numbers by applying various encoding schemes:
a) ASCII or Unicode for alphanumeric characters.
b) JPG for photos, DOCX for Word files, MP3 for sound files, etc.
- We express these numbers in a base or radix (base 2 binary, 10 decimal, 16 hexadecimal, 60 sexagesimal). E-mail messages encode attachments in base 64.
- The bigger the base, the smaller the space required to notate and convey the information.
- Digitally encoded information is stored (written):
a) physically as bytes (8-bit blocks) in sectors and partitions.
b) logically as clusters, files, folders and volumes.
- Files use binary header signatures to identify file formats (type and structure) of data.
- Operating systems use file systems to group information as files and manage filenames and metadata.
- Windows file systems employ filename extensions (e.g., .txt, .jpg, .exe) to flag formats.
- All ESI includes a component of metadata (data about data) even if no more than needed to locate it.
- A file’s metadata may be greater in volume or utility than the contents of the file it describes.
- File tables hold system metadata about the file (e.g., name, locations on disk, MAC dates): it’s CONTEXT.
- Files hold application metadata (e.g., EXIF geolocation data in photos, comments in docs): it’s CONTENT.
- File systems allocate clusters for file storage; deleting files releases cluster allocations for reuse.
- If unallocated clusters aren’t reused, deleted files may be recovered (“carved”) via computer forensics.
- Forensic (“bitstream”) imaging is a method to preserve both allocated and unallocated clusters.
- Data are numbers, so data can be digitally “fingerprinted” using one-way hash algorithms (MD5, SHA1).
- Hashing facilitates identification, deduplication and de-NISTing of ESI in e-discovery.
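Several of the concepts above (binary header signatures, hashing, deduplication and de-NISTing) can be seen working together in a short sketch. The file names and contents below are invented, the signature table covers only a few well-known formats, and the one-entry set of "known system hashes" is a stand-in for a real reference hash list.

```python
import hashlib

# A few well-known header signatures ("magic bytes") and the formats they mark.
SIGNATURES = {
    b"%PDF": "PDF",
    b"PK\x03\x04": "ZIP container (e.g., DOCX, XLSX)",
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
}


def identify(data):
    """Identify a format from its leading bytes rather than its file extension."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def fingerprint(data):
    """Return the SHA-1 hash of the data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical collection: file name -> file contents.
files = {
    "report.pdf": b"%PDF-1.7 sample",
    "copy of report.pdf": b"%PDF-1.7 sample",
    "photo.jpg": b"\xff\xd8\xff\xe0 sample",
    "renamed.txt": b"PK\x03\x04 sample",
    "system.dll": b"MZ sample system file",
}

# Stand-in for a reference list of hashes of known system files.
known_system_hashes = {fingerprint(files["system.dll"])}

unique = {}
for name, data in files.items():
    digest = fingerprint(data)
    if digest in known_system_hashes:
        continue                      # de-NISTing: drop known system files
    unique.setdefault(digest, name)   # deduplication: keep first copy per hash

for digest, name in unique.items():
    print(name, identify(files[name]), digest[:12])
```

Two points worth noticing: the duplicate is caught even though its file name differs, because only the content is hashed; and the file named `renamed.txt` is still recognized as a ZIP container, because the signature lives in the data rather than in the extension.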