This is the ninth in a series revisiting Ball in Your Court columns and posts from the primordial past of e-discovery, updating and critiquing in places, and hopefully restarting a few conversations. As always, your comments are gratefully solicited.
The Path to Production: Are We There Yet?
(Part IV of IV)
[Originally published in Law Technology News, January 2006]
The e-mail’s assembled and accessible. You could begin review immediately, but unless your client has money to burn, there’s more to do before diving in: de-duplication. When Marge e-mails Homer, Bart and Lisa, Homer’s “Reply to All” goes in both Homer’s Sent Items and Inbox folders, and in Marge’s, Bart’s and Lisa’s Inboxes. Reviewing Homer’s response five times is wasteful and sets the stage for conflicting relevance and privilege decisions.
Duplication problems compound when e-mail is restored from backup tape. Each tape is a snapshot of e-mail at a moment in time. Because few users purge mailboxes month-to-month, one month’s snapshot holds nearly the same e-mail as the next. Restore a year of e-mail from monthly backups, and identical messages multiply like rabbits.
De-duplication uses metadata, cryptographic hashing or both to exclude identical messages. De-duplication may be implemented vertically, within a single mailbox or custodian, and horizontally, across multiple mailboxes and custodians. When questioning or prepping a witness, you’ll want to see all relevant messages in the witness’ mailbox, not just unique messages; so track and log de-duplication to facilitate re-population of duplicated items. De-duplication works best when unique messages and de-duplication logs merge in a database, allowing a reviewer to reconstruct mailboxes.
Be wary of “horizontal” de-duplication when discovery strategies change. An e-mail sent to dozens of recipients, then de-duplicated from all but one custodian’s mailbox, may be lost forever if that one custodian’s e-mail ends up not being produced.
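To make the vertical and horizontal distinction concrete, here is a minimal Python sketch of hash-based de-duplication with a log that supports re-population. It is an illustration under stated assumptions, not how any particular review tool works: the message fields (sender, recipients, sent, subject, body, custodian, folder) and the function names message_hash, deduplicate and repopulate are hypothetical, and real tools apply their own rules for deciding which fields make two messages "identical."

```python
import hashlib
from collections import defaultdict


def message_hash(msg):
    """Hash the fields assumed (for this sketch) to define an identical message."""
    fields = (
        msg["sender"],
        ",".join(sorted(msg["recipients"])),
        msg["sent"],
        msg["subject"],
        msg["body"],
    )
    # Join with a separator unlikely to appear in the data, then hash.
    joined = "\x1f".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(messages, horizontal=True):
    """Return (unique_messages, log).

    horizontal=True  keeps one copy across all custodians.
    horizontal=False keeps one copy per custodian (vertical).
    The log maps each hash to every (custodian, folder) where a copy was found.
    """
    seen = set()
    log = defaultdict(list)
    unique = []
    for msg in messages:
        digest = message_hash(msg)
        log[digest].append((msg["custodian"], msg["folder"]))
        scope = digest if horizontal else (msg["custodian"], digest)
        if scope not in seen:
            seen.add(scope)
            unique.append(msg)
    return unique, dict(log)


def repopulate(unique, log, custodian):
    """Rebuild one custodian's mailbox view from unique messages plus the log."""
    by_hash = {message_hash(m): m for m in unique}
    return [
        (folder, by_hash[digest])
        for digest, locations in log.items()
        for owner, folder in locations
        if owner == custodian
    ]


# Homer's "Reply to All" as it appears in five places.
reply = {
    "sender": "homer",
    "recipients": ["marge", "bart", "lisa", "homer"],
    "sent": "2006-01-05T09:00",
    "subject": "RE: Dinner",
    "body": "Sounds good.",
}
locations = [
    ("homer", "Sent Items"),
    ("homer", "Inbox"),
    ("marge", "Inbox"),
    ("bart", "Inbox"),
    ("lisa", "Inbox"),
]
messages = [dict(reply, custodian=c, folder=f) for c, f in locations]

unique_h, log_h = deduplicate(messages, horizontal=True)
unique_v, log_v = deduplicate(messages, horizontal=False)
lisa_view = repopulate(unique_h, log_h, "lisa")
```

In this sketch, horizontal de-duplication leaves a single copy to review while vertical de-duplication leaves one per custodian, and the log is what lets Lisa’s mailbox be reconstructed even though the surviving copy came from Homer’s. Discard the log, or decline to produce the one custodian holding the surviving copy, and the other recipients’ copies cannot be restored.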
Rather than plow through zillions of e-mails for responsive and privileged items, reviewers often turn to keyword or concept search tools. Automated search tools make short work of objective requests for “all e-mail between Simpson and Burns,” but may choke on “all e-mail concerning plant safety.” To frame effective keyword searches, you have to know the lingo describing events and objects central to the case. Even then, crucial communiqués like, “My lips are sealed” or “Excellent” may be missed.
Are tireless black box tools an adequate substitute for human review? The jury’s still out. In a seminal study, keyword searching fared poorly, finding only about one-fifth of relevant items identified by human reviewers. However, litigation management consultant Anne Kershaw looked at an advanced search tool and found machines performed almost twice as well as humans. The safest course is to arm conscientious, well-trained reviewers with state-of-the-art search tools and work cooperatively with opposing counsel to frame searches. Even then, examine the mailboxes of key witnesses, message-by-message. [Please don’t fail to check out my 2015 take on this below].
Paper redaction was easy: We concealed privileged text using a black marker and photocopied. It’s trickier to eradicate privileged and confidential information at the data layer of document image files and within encoded attachments and metadata. Run your approach by an expert.
For production, should you re-populate to restore relevant, non-privileged items previously de-duplicated, or will the other side accept a de-duplication log? Never produce de-duplicated e-mail without memorializing that opposing counsel knows of the de-duplication and waives re-population.
There isn’t just one “right” medium or format for deliverables. Options for production media include network transmittal, external hard drives, optical disks, tape, online repositories and hard copies. Formats include native (.pst), exported (.eml), text (.txt), load files (Concordance, Summation), image files with or without data layers (.pdf, .tiff) and delimited files. Evidence ill-suited to .tiff production (databases, some spreadsheets, etc.) compels native production. I’ve come to understand that all but scanned paper documents are ill-suited to .tiff production, and .tiff adds expense in a big way.
Inevitably, something will be overlooked or lost, but sanctions need not follow every failure. Document diligence throughout the discovery effort and be prepared to demonstrate why bad decisions were sound at the time and under the circumstances. Note where the client looked for responsive information, what was found, how much time and money was expended, what was sidelined and why. Avoid sanctions by proving good faith.
Are We There Yet?
The path to production is a long and winding road, but it’s heading in the right direction. Knowing how to manage electronic evidence is as vital to trial practice as the ability to draft pleadings or question witnesses. Don’t forget what happened on Main Street when they built the Interstate. Paper discovery’s the old road. E-discovery’s the Interstate.
My skeptical comments about technology assisted review were typical of the time. Many still regard the use of well-trained reviewers armed with good review tools as the safest course, but I’m no longer among their number. What evolution we’ve seen in e-discovery in the last ten years has manifested most notably in the emergence of predictive coding and advanced analytics. TAR is here to stay, and it won’t be pricey and contentious forever. Also in the category of “what took us so long?” is the growing acceptance of production in native electronic forms. How is it that lawyers have stayed so smitten with paying more to get less? Tiff productions aren’t just degraded in terms of functionality and completeness; they are so much “fatter” than their native counterparts that it’s common to pay ten times as much to ingest and host .tiff productions over native.