I’ve just completed the E-Discovery Workbook for the 2019 Georgetown E-Discovery Training Academy. The Workbook readings and exercises plot the path that evidence follows from the documents lawyers use in court back to the featureless stream of binary electrical impulses common to all information stored electronically. The Workbook runs nearly 500 pages, with the technology of e-discovery as its centerpiece, and I’ve lately added a 21-point synopsis of the storage concepts, technical takeaways and vocabulary it covers. Here is that in-a-nutshell synopsis:
- Common law imposes a duty to preserve potentially relevant information in anticipation of litigation
- Most information is electronically stored information (ESI)
- Understanding ESI entails knowledge of information storage media, encodings and formats
- There are many types of e-storage media of differing capacities, form factors and formats:
a) analog (phonograph record) or digital (hard drive, thumb drive, optical media)
b) mechanical (electromagnetic hard drive, tape, etc.) or solid-state (thumb drive, SIM card, etc.)
- Computers don’t store “text,” “documents,” “pictures” or “sounds.” They only store bits (ones or zeroes)
- Digital information is encoded as numbers by applying various encoding schemes:
a) ASCII or Unicode for alphanumeric characters;
b) JPG for photos, DOCX for Word files, MP3 for sound files, etc.
- We express these numbers in a base or radix (base 2 binary, 10 decimal, 16 hexadecimal, 60 sexagesimal). E-mail messages encode attachments in base 64.
- The bigger the base, the fewer characters required to notate and convey the same information
- Digitally encoded information is stored (written):
a) physically as bytes (8-bit blocks) in sectors and partitions
b) logically as clusters, files, folders and volumes
- Files use binary header signatures to identify file formats (type and structure) of data
- Operating systems use file systems to group information as files and manage filenames and metadata
- File systems employ filename extensions (e.g., .txt, .jpg, .exe) to flag formats
- All ESI includes a component of metadata (data about data) even if no more than needed to locate it
- A file’s metadata may be greater in volume or utility than the contents of the file it describes
- File tables hold system metadata about the file (e.g., name, locations on disk, and MAC dates: modified, accessed, created): it’s CONTEXT
- Files hold application metadata (e.g., EXIF geolocation data in photos, comments in docs): it’s CONTENT
- File systems allocate clusters for file storage; deleting files releases cluster allocations for reuse
- If unallocated clusters aren’t reused, deleted files may be recovered (“carved”) via computer forensics
- Forensic (“bitstream”) imaging is a method to preserve both allocated and unallocated clusters
- Because data are numbers, data can be digitally “fingerprinted” using one-way hash algorithms (MD5, SHA-1)
- Hashing facilitates identification, deduplication and de-NISTing of ESI in e-discovery
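Several of the points above (encoding text as numbers, notating those numbers in different bases, header signatures and hashing) can be made concrete in a few lines of Python. What follows is a minimal sketch for illustration only, not an excerpt from the Workbook. The function names `identify`, `fingerprint`, `deduplicate` and `de_nist` are hypothetical, the signature table is a tiny sample of well-known headers, and the "known hashes" set merely stands in for a reference list such as the one NIST publishes.

```python
import base64
import hashlib

# Text is stored as numbers: each ASCII character becomes one byte.
text = "ESI"
raw = text.encode("ascii")
decimal = list(raw)                           # [69, 83, 73]
binary = [format(b, "08b") for b in raw]      # ['01000101', '01010011', '01001001']
hexadecimal = raw.hex()                       # '455349'

# The same three bytes in Base64, the notation e-mail uses for attachments.
b64 = base64.b64encode(raw).decode("ascii")   # 'RVNJ'

# A small sample of binary header signatures (assumed table, far from complete).
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP container (DOCX files are ZIP archives)",
}


def identify(data):
    """Guess a format from the leading bytes, ignoring any filename extension."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def fingerprint(data, algorithm="md5"):
    """Return the hex digest of the data; identical bytes give identical digests."""
    return hashlib.new(algorithm, data).hexdigest()


def deduplicate(items):
    """Group item names by SHA-1 digest; groups with several names are duplicates."""
    groups = {}
    for name, data in items.items():
        groups.setdefault(fingerprint(data, "sha1"), []).append(name)
    return groups


def de_nist(items, known_hashes):
    """Drop items whose SHA-1 digest appears in a set of known, irrelevant files."""
    return {
        name: data
        for name, data in items.items()
        if fingerprint(data, "sha1") not in known_hashes
    }
```

For example, `deduplicate({"a.txt": b"same", "b.txt": b"same", "c.txt": b"other"})` puts `a.txt` and `b.txt` in one group because their contents hash identically, whatever the files are named. Real tools differ in which hash they use and in what they hash (for e-mail, often selected fields rather than the whole file), so treat the choices here as assumptions.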
All of these topics and more are covered in depth at the Academy, punctuated by substantive and substantial hands-on exercises. We ask more of the students than most seasoned e-discovery professionals can deliver. It’s hours of effort before you arrive and a full week of day and night endeavor once you’re here. Over a thousand pages of written material covered in toto. Really, no picnic. A true boot camp. It exhausts and overwhelms those anticipating conventional professional education; but those who do the work emerge transformed. They leave competent, confident and equipped with new eyes for ESI. Think you can hack it? We can help. Hope to see you there June 2-7.
P.S. No member of the Academy faculty is compensated. We are all volunteers, there because we believe the more you know about e-discovery, the more you can contribute to the just, speedy and inexpensive administration of justice.