The Great Pandemic Leap

Much has been made of the “Great Pandemic Leap” by law firms and courts. Pandemic proved to be, if not the mother of invention, at least the mother****** who FINALLY got techno tardy lawyers to shuffle forward. The alleged leap had nothing to do with new technology. Zoom and other collaboration tools have been around a long time. In fact, April 21, 2021 was Zoom’s 10th Birthday! Happy Birthday, Zoom! Thanks for being there for us.

No, it wasn’t new technology. The ‘Ten Years in Ten Weeks’ great leap was enabled by compulsion, adoption and support.

“Compulsion” because we couldn’t meet face-to-face, and seeing faces (and slides and white boards) is important.
“Adoption” because so many embraced Zoom and its ilk that we suddenly enjoyed a common meeting place.
“Support” because getting firms and families up and running on Zoom et al. became a transcendent priority.

It didn’t hurt that schools moving to Zoom served to put a support scion in many lawyers’ homes and, let’s face it Atticus, the learning curve wasn’t all that steep. Everyone already had a device with camera and microphone. Zoom made it one-click easy to join a meeting, even if eye-level camera positioning and unmuting of microphones have proven more confounding to lawyers than the Rule Against Perpetuities.

For me, the Great Leap manifested as the near-universal ability to convene on a platform where screen sharing and remote control were simple. I’ve long depended on remote control and screen sharing tools to access machines by Remote Desktop Protocol (RDP) or TeamViewer (not to mention PCAnywhere and legacy applications that made WFH possible in the 90s and aughts). But, that was on my own machines. Linking to somebody else’s machine without a tech-savvy soul on the opposite end was a nightmare. If you’ve ever tried to remotely support a parent, you understand. “No, Mom, please don’t click anything until I tell you. Oh, you already did? What did the error message say? Next time, don’t hit ‘Okay’ until you read the message, please Mom.”

E-discovery and digital forensics require defensible data identification, preservation and collection. The pandemic made deskside reviews and onsite collection virtually impossible, or more accurately, those tasks became possible only virtually. Suddenly, miraculously, everyone knew how to join a Zoom call, so custodians could share screens and hand over remote control of keyboard and mouse. I could record the sessions to document the work and remotely load software (like iMazing or CoolMuster) to preserve and access mobile devices. Remote control and screen sharing let me target collection efforts based on my judgment and not be left at the mercy of a custodian’s self-interested actions. Custodians could observe, assist and intervene in my work or they could opt to walk away and leave me to do my thing. I was “there,” but less intrusively and spared the expense and hassle of travel. I could meet FRCP 26(g) obligations and make a record to return to if an unforeseen issue arose.

In my role as investigator, there are advantages attendant to being onsite; e.g., I sometimes spot evidence of undisclosed data sources. But, weighed against the convenience and economy of remote identification and collection, I can confidently say I’m never going back to the old normal when I can do the work as well via Zoom.

Working remotely as I’ve described requires a passing familiarity with Zoom screen sharing, if only to be able to talk others through unseen menus. As Zoom host, you will need to extend screen sharing privileges to the remote user. Do this on-the-fly by making the remote user a meeting co-host (click “More” alongside their name in the Participants screen). Alternatively, you can select Advanced Sharing Options from the Share Screen menu. Under “Who can Share?” choose “All Participants.”

To acquire control of the remote user’s mouse and keyboard, have the remote user initiate a screen share then open the View Options dropdown menu alongside the green bar indicating you’re viewing a shared screen. Select “Request Remote Control,” then click “Request” to confirm. The remote user will see a message box seeking authorization to control their screen. Once authorized, click inside the shared screen window to take control of the remote machine.

If you need to inspect a remote user’s iPhone or iPad, Zoom supports sharing those devices using a free plugin that links the mobile device over the same WiFi connection as the Zoom session. To initiate an iPhone/iPad screen share, instruct the remote user to click Screen Share and then select the iPhone/iPad icon at right for further instructions. Simpler still, have the remote user install Zoom on the phone or pad under scrutiny and join the Zoom session from the mobile device. Once in the meeting, the remote user screen shares from the session on the mobile device. Easy-peasy AND it works for Android phones, too!

So Counselor, go ahead and take that victory lap. Whether you made a great leap or were dragged kicking and screaming to a soupçon of technical proficiency, it’s great to see you! Hang onto those gains, and seek new ways to leverage technology in your practice. Your life may no longer depend on it, but your future certainly does.

Life Lessons from E-Discovery

Eight years ago, my old friends and Über-thought leaders Bill Hamilton and George Socha created an e-discovery conference targeting an underserved constituency: lawyers without the luxury of an e-discovery practice group or litigation support staff. Regular folks. The always enlightening and enjoyable University of Florida E-Discovery Conference has been a fixture on my speaking calendar for years. This year, the pandemic foreclosed the customary face-to-face confab in central Florida, so we convened virtually– just Bill, George, me and 3,000 of our closest friends. Seriously, the turnout was astounding: 3,058 unique attendees! BRAVO!

My contribution was modest–fifteen minutes chatting about Life Lessons from E-Discovery. Here’s what I shared:

Thirty years ago, Robert Fulghum published a bestseller called, “All I Really Need To Know I Learned In Kindergarten.”  It posited that the simple lessons we gained as children can guide us all our lives.

The lessons were things like:

  • Share everything.
  • Play fair.
  • Don’t hit people.
  • Put things back where you found them.
  • Clean up your own mess.
  • Don’t take things that aren’t yours.
  • Say you’re sorry when you hurt somebody.
  • Flush and wash your hands.

That last one proved especially useful of late!

Fulghum’s point was that childish precepts extrapolate well to our adult lives, to relationships, business, government, really to everything.

I’ve been a student and teacher of electronic evidence for forty years, so when Professor Hamilton asked me to say a few words today, I wondered what I’d gleaned from electronic discovery that might yield life lessons like those kindergarten rules.  Many things came to mind.  Things like:

They all say basically the same thing: treat others with respect and courtesy.  I commend them all to you, but the shameful truth is I’ve violated enough of those precepts that I feel unworthy to preach their indisputable value.

Instead, I sought five precepts uniquely suited to e-discovery, five lessons I’ve acquired and come to believe in through hard experience.

I should confess that my point of view is a jaundiced and cynical one.  Courts bring me in as a special master when discovery’s gone off the rails, often when sanctions are in the offing.  In my world, incompetence and deceit are the norm.  So, if my lessons strike you as too obvious or too simple, I’m thrilled to hear it.

The first rule, and really the most fundamental is:

Tell the truth based on fact. 

Albert Einstein said, “Imagination is more important than information.”  Sorry, Al, not in e-discovery.

When it comes to e-discovery, information is more important than imagination.  In e-discovery, information is everything.  Measurement trumps opinion.  Your gut sense that the other side is withholding evidence is fascinating, but it’s not proof.  Your certainty that the client has no responsive data is just baloney without a competent search.

If we are to be credible professionals, we must concede what we don’t know, share what we do know and recognize that cooperation isn’t a hallmark of weakness but a harbinger of strength.  Bluffing is fine at the poker table, but it will kill you in Court.  Your word, your credibility, your reputation for honesty is worth more than all your education and skill.

And a variant on number one is:

Tell the truth, no matter the consequences.

I’ve written hundreds of articles about e-discovery and forensics.  Colleagues ask, “Aren’t you afraid something you wrote will be used against you in cross-examination?”  I tell them I’ve never worried about that because I’ve told the truth as I knew it in everything I wrote.  Sometimes I was mistaken, but I was never false.  So, I don’t have to remember what I said.  I just hold on to what I know to be true.

If someone wants to cite me to impeach me, bring it on!  I’ll take them from punched cards to magnetic media to solid state storage, from big iron mainframes to client-server to the Cloud.  I’ll share my conviction that learning never ends, and, yes, mistakes happen along the way.  The measure that matters is how we own our errors.  If we stick to the truth, we can gain more from failure than success.

My second precept is just one word.  A century ago, IBM’s founder Thomas J. Watson put a word on an easel at a business meeting.  It read “THINK.”  That’s still IBM’s slogan, and it’s what I want to shout at lawyers who serve ridiculous requests for production or file boilerplate objections.

THINK!  I want to stamp it on the foreheads of lawyers who just don’t think about where evidence is likely to be found or sensible ways to find it.  I know lawyers to be first-class thinkers; so, it’s maddening when good lawyers take off their thinking caps in e-discovery.  Any lawyer can learn enough tech to master the “E” in e-discovery.  Anyone.  All of us on this conference faculty are convinced of it.  It’s what gets us out of bed each day.

But to do it, lawyers must cast aside doubt and turn off the parts of their brains that tell them they’re too old, too busy or just too much a lawyer to learn something new.  Conferences like this one help—thank you for being here–but it takes more than a few hours on Zoom or a big litigation budget to become competent to serve your clients in the realm of electronic evidence.  It requires a willingness to fearlessly embrace an unfamiliar discipline–to learn a second thing. 

It takes a commitment to study, question, pursue and explore information about information and a commitment to THINK, THINK, THINK about how people communicate, what tools and software they use, their language, what metadata matters and where data lives. So, please don’t think you can’t learn it, or worse, that you need not learn it.  You can and you must.  Nothing less than the future of the civil justice system depends on it.  Of course, you’re here pursuing greater expertise, so I’m preaching to the choir.

My third lesson is: Have a plan.

In my thirties, I read Robert Caro’s epic biography of Robert Moses.  Moses was an urban planner who reshaped New York.  Robert Moses’ massive projects got built.  The secret to his success was that, where others came to planning meetings with ideas, Robert Moses arrived with blueprints and budgets.  He was a man with a plan.

In e-discovery, lawyers are brilliant at articulating objections, at saying what clients won’t do; but what’s often missing is a well-reasoned plan for what clients will do.  When you come with a plan, it’s clear you thought about what must be done.  A practical plan demonstrates a commitment to progress.  A reasonable plan forces the other side to work within your framework.  Judges love it when lawyers have a plan.  The Rules of discovery are written to better serve litigants with a plan. 

The e-discovery plan is a protocol. E-discovery demands a good protocol and success in e-discovery requires that lawyers know which features of a protocol are crucial and which are negotiable.  So, always show up with a plan.

Number four is: Never attribute to guile that which can be explained by incompetence.

I borrowed and adapted this one from my late friend, Browning Marean, who had a huge store of wise sayings.  In fairness, Browning borrowed it from Robert Heinlein, whose “Heinlein’s Razor” reads, “never attribute to malice that which is adequately explained by stupidity.”  Because we know lawyers aren’t stupid, I prefer to term it a shortfall in competence.

When a party messes up in e-discovery, the victims of failure often cry “foul” and suspect an intent to deprive them of the evidence.  In my experience, intent (I’m calling it “guile”), when it’s genuine, tends to manifest as efforts to conceal the screw up; it’s the cover up that kills you, not the failure itself.  Most screw ups are just… screw ups.  Always avoidable, sometimes reprehensible, but more often the result of apathy than antipathy. Maybe that’s why the last set of Rules amendments shielded parties from serious sanctions for mere incompetence.  In my mind, the decision to tie judges’ hands when disciplining incompetence and spoliation was a poor one.  A mistakenly political one.  Fear of sanctions was the prime driver of the e-discovery revolution.  It was the reason lawyers and companies came around and started preserving and producing ESI.  Sanctions were the driver of competence.  Sadly, we never had much of a carrot, and now they’ve taken away the stick.

And my final lesson is one of human nature:

Remember that Courts guard their authority more scrupulously than your client’s rights.

What I mean by this, with no disrespect to the judges on this faculty or listening, is a recognition born of experience, that a party is considerably more likely to be disciplined for violating a court order than for failing to fulfill obligations to an opponent. The takeaway is that an effort to secure sanctions is a marathon, not a sprint. 

You must take the time and make the effort to mature motions to compel or for protection into explicit orders of the Court.  A court cannot function if its orders are ignored with impunity.  So, if sanctions are your objective, position the failure to produce to be more than simply a transgression of your client’s rights; put it in the posture of something that threatens the court’s sovereignty.

And while we’re talking sanctions, never forget that sanctions are exceptional remedies.  Courts hate to sanction parties or counsel.  Though the threat of sanctions carried along in a case can be a useful tactic, seeking sanctions to patch a weak case is a fool’s errand.  Discovery is a mechanism to gather evidence to make your case, nothing more or less than that.

Those are my five.  I expect you have some great ones of your own.  Next year, I’d like to hear you sharing yours right here.  Even better, let’s all meet in Gainesville. Share our ideas.  Break bread and toast a return to normalcy.   That’s an invitation to join the e-discovery community.  Being part of it has been one of the great delights of my professional life. I’ve made wonderful friends that way.  You will, too.

Be well and thank you.

Can a Producing Party Refuse to Produce Linked Attachments to E-Mail?

A fellow professor of e-discovery started my morning with a question. He wrote, “In companies using Google business, internal email ‘attachments’ are often linked with a URL to documents on a Google drive rather than actually ‘attached’ to the email…. Can the producing party legally refuse to produce the document as an attachment to the email showing the family? Other links in the email to, for example, a website need not be produced.”

I replied that I didn’t have the definitive answer, but I had a considered opinion. First, I challenged the assertion, “Other links in the email to, for example, a website need not be produced.”

Typically, the link must be produced because it’s part of the relevant and responsive, non-privileged message.  But, the link is just a pointer, a path to an item, and the discoverability of the link’s target hinges upon whether the (non-privileged) target is responsive AND within the care, custody or subject to the control of the producing party.

For the hypothetical case, I assume that the transmittal is deemed relevant and the linked targets are either relevant by virtue of their being linked to the transmittal or independently relevant and responsive.  I also assume that the linked target remains in the care, custody or subject to the control of the producing party because it has a legal and practical right of access to the repository where the linked target resides; that is, the producing party CAN access the linked item, even if they would rather not retrieve the relevant, responsive and non-privileged content to which the custodian has linked the transmittal.

If the link is not broken and the custodian of the message could click the link and access the linked target, where is the undue burden and cost?  Certainly I well know that collection is often delegated to persons other than the custodian, but shouldn’t we measure undue burden and cost from the standpoint of the custodian under the legal duty to preserve and produce, NOT from the perspective of a proxy engaged to collect, but lacking the custodian’s ability to collect, the linked target? Viewed in this light, I don’t see where the law excuses the producing party from collecting and producing the linked target.

The difficulty in collection cited results from the producing party contracting to delegate storage to a third-party Cloud Provider, linking to information relegated to the Cloud Provider’s custody.  In certain respects, it’s like the defendant in Columbia Pictures v. Bunnell, who put a contractor (Panther) in control of the IP addresses of the persons trading pirated movies via the defendant’s platform.  Just because you enlist someone to keep your data on your behalf doesn’t defeat your ultimate right of control or your duty of production. 

Having addressed duty, let’s turn to feasibility, which is really what the fight’s about. 

Two key issues I see are: 

1. What if the link is broken by the passage of time?  If the target cannot be collected after reasonable efforts to do so, then it may be infeasible to effect the collection via the link or via pairing the link address to the addresses of the contents of the repository (as by using the Download-Link-Generator tool I highlighted here).  If there is simply no way a link created before a legal hold duty attached can be tied to its target, then you can’t do it, and the Court can’t order the impossible.  But, you can’t just label something “impossible” because you’d rather not do it.  You must make reasonable efforts and you must prove infeasibility. Courts should look askance at claims of infeasibility asserted by producing parties who have created the very situations that make it harder to obtain discovery.

2. What if the content of the target has changed since the time it was linked?  This is where the debate gets stickier, and where I have little empathy for a producing party who expects to be excused from production on the basis that it altered the evidence.  If the evidence has changed to the point where its relevance is in question because it may have been materially changed after linking, then the burden to prove the material change (and diminished relevance) falls on the producing party, not the requesting party.  Else, you take your evidence as you find it, and you produce it as it exists at the time of preservation and collection.  The possibility that it changed goes to its admissibility and weight, not to its discoverability.

I hope you agree my analysis is sound. To paraphrase Abraham Lincoln, you cannot murder your parents and then seek leniency because you’re an orphan. The problem is solvable, but it will be resolved only when Courts supply the necessary incentive by ordering collection and production. Integrating a hash value of the target within the link might go a long way to curing this Humpty-Dumpty dilemma; then, the target can be readily identified AND proven to be in the same state as when the link was created.
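The hash idea can be sketched in a few lines. This is only an illustration of the suggestion above, not a feature of any mail or cloud platform: the `make_link` and `verify_target` helpers, the `#sha256=` fragment convention and the example URL are all hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the target's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

def make_link(url: str, target: bytes) -> str:
    """Hypothetical convention: record the target's hash as a URL fragment."""
    return f"{url}#sha256={sha256_hex(target)}"

def verify_target(link: str, collected: bytes) -> bool:
    """True if the collected bytes still match the hash recorded in the link."""
    _, _, recorded = link.rpartition("#sha256=")
    return recorded == sha256_hex(collected)

link = make_link("https://example.com/doc/123", b"Q3 forecast, draft 1")
print(verify_target(link, b"Q3 forecast, draft 1"))  # True
print(verify_target(link, b"Q3 forecast, draft 2"))  # False
```

A match shows the collected target is byte-for-byte what the sender linked; a mismatch shows only that something changed, not what or when.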

While we are at it, embedded links should be addressed from the standpoint of security and ethics. If a producing party supplies a message or document with a live link and opposing counsel clicks on the link, exposing information not meant to be produced, whose head should roll there? If a party produces a live link in an email, is it reasonable to assume that the target was delivered, too? To my mind, the link is fair game, just as the attachment would be had it been embedded in the message. Electronic delivery is delivery. We have rules governing inadvertent production of privileged content, but not for the scenario described.

Don’t BE a Tool, GET a Tool!

Considering the billions of dollars spent on e-discovery every year, wouldn’t you think every trial lawyer would have some sort of e-discovery platform?  Granted, the largest firms have tools; in fact, e-discovery software provider Relativity (lately valued at $3.6 billion) claims 198 of the 200 largest U.S. law firms as its customers.  But, for the smaller firms and solo practitioners who account for 80% or more of lawyers in private practice, access to e-discovery tools falls off.  Off a cliff, that is. 

When law firms or solos seek my help obtaining native production, my first question is often, “what platform are you using?”  Their answer is usually “PC” or simply a blank stare.  When I add, “your e-discovery platform–the software tool you’ll use to review and search electronically stored information,” the dead air makes clear they haven’t a clue.  I might as well ask a dog where it will drive if it catches the car.    

Let’s be clear: no lawyer should expect to complete an ESI review of native forms using native applications.

Don’t do it.

I don’t care how many regale me with tales of their triumphs using Outlook or Microsoft Word as ‘review tools.’  That’s not how it’s done.  It’s reckless.  The integrity of electronic evidence will be compromised by that workflow.  You will change hash values.  You will alter metadata.  Your searches will be spotty.  Worst case scenario: your copy of Outlook could start spewing read receipts and calendar reminders.  I dare you to dig your way out of that with a smile.  Apart from the risks, review will be slow.  You won’t be able to tag or categorize data.   When you print messages, they’ll bear your name instead of the custodian’s name. Doh!

None of this is an argument against native production. 
It’s an argument against incompetence. 

I am as dedicated a proponent of native production as you’ll find; but to reap the benefits and huge cost savings of native production, you must use purpose-built review tools.  Notwithstanding your best efforts to air gap computers and use working copies, something will fail.  Just don’t do it.

You’ll also want to use an e-discovery review tool because nothing else will serve to graft the contents of load files onto native evidence.  For the uninitiated, load files are ancillary, delimited text files supplied with a production and used to carry information about the items produced and the layout of the production.

I know some claim that native productions do away with the need for load files, and I concede there are ways to structure native productions to convey some of the data we now exchange via load files.  But why bother?  After years in the trenches, I’ve given up cursing the use of load files in native, hybrid and TIFF+ productions.  Load files are clunky, but they’re a proven way to transmit filenames and paths, supply Bates numbers, track duplicates, share hash values, flag family relationships, identify custodians and convey system metadata (that’s the kind not stored in files but residing in the host system’s file table).   Until there’s a better mousetrap, we’re stuck with load files.
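For readers who have never opened one, here is a rough sketch of what reading a load file involves. The pipe delimiter, the column names and the sample rows are all invented for illustration; real load files follow whatever layout the parties’ protocol specifies.

```python
import csv
import io

# Hypothetical pipe-delimited load file; header names and rows are invented.
SAMPLE = """BegBates|Custodian|FilePath|MD5|ParentBates
ABC0001|Smith|mail/inbox/msg1.msg|9e107d9d372bb6826bd81d3542a419d6|
ABC0002|Smith|mail/inbox/budget.xlsx|e4d909c290d0fb1ca068ffaddf22cbd0|ABC0001
"""

def read_load_file(text: str, delimiter: str = "|") -> list[dict]:
    """Parse delimited load-file text into one dict per produced item."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def families(rows: list[dict]) -> dict[str, list[str]]:
    """Group attachments under the Bates number of their parent message."""
    groups: dict[str, list[str]] = {}
    for row in rows:
        if row["ParentBates"]:
            groups.setdefault(row["ParentBates"], []).append(row["BegBates"])
    return groups

rows = read_load_file(SAMPLE)
print(rows[1]["Custodian"])  # Smith
print(families(rows))        # {'ABC0001': ['ABC0002']}
```

A review platform does this grafting automatically, which is the point: the metadata rides in the load file, not in the native file itself.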

The takeaway is get a tool. If you’re new to e-discovery, you need to decide what e-discovery tool you will use to review ESI and integrate load files.  Certainly, no producing party can expect to get by without proper tools to process, cull, index, deduplicate, search, review, tag and export electronic evidence—and to generate load files.  But requesting parties, too, are well-served to settle on an e-discovery platform before they serve their first Request for Production.  Knowing the review tool you’ll use informs the whole process, particularly when specifying the forms of production and the composition of load files.  Knowing the tool also impacts the keywords used in and structure of search queries.

There are a ton of tools out there, and one or two might not skin you alive on price. Kick some tires. Ask for a test drive. Shop around. Do the math. But, figure out what you’re going to do before you catch that car. Oh, and don’t even THINK about using Outlook and Word. I mean it. I’ve got my eye on you, McFly.

Understanding the UPC: Because You Can

Where does the average person encounter binary data?  Though we daily confront a deluge of digital information, it’s all slickly packaged to spare us the bare binary bones of modern information technology.  All, that is, save the humble Universal Product Code, the bar code symbology on every packaged product we purchase from a 70-inch TV to a box of Pop Tarts.  Bar codes and their smarter Japanese cousins, QR Codes, are perhaps the most unvarnished example of binary encoding in our lives. 

Barcodes have an ancient tie to e-discovery as they were once used to Bates label hard copy documents, linking them to “objective coding” databases. A lawyer using barcoded documents was pretty hot stuff back in the day.

Just a dozen numeric characters are encoded by the ninety-five stripes of a UPC-A barcode, but those digits are encoded so ingeniously as to make them error resistant and virtually tamperproof. The black and white stripes of a UPC are the ones and zeroes of binary encoding.  Each number is encoded as seven bars and spaces (12×7=84 bars and spaces) and an additional eleven bars and spaces denote start, middle and end of the UPC.  The start and end markers are each encoded as bar-space-bar and the middle is always space-bar-space-bar-space.  Numbers in a bar code are encoded by the width of the bar or space, from one to four units. 

The bottle of Great Value purified water beside me sports the bar code at right.

Humans can read the numbers along the bottom, but the checkout scanner cannot; the scanner reads the bars. Before we delve into what the numbers signify in the transaction, let’s probe how the barcode embodies the numbers.  Here, I describe a bar code format called UPC-A.  It’s a one-dimensional code because it’s read across.  Other bar codes (e.g., QR codes) are two-dimensional codes and store more information because they use a matrix that’s read side-to-side and top-to-bottom.

The first two black bars on each end of the barcode signal the start and end of the sequence (bar-space-bar).  They also serve to establish the baseline width of a single bar to serve as a touchstone for measurement.  Bar codes must be scalable for different packaging, so the ability to change the size of the codes hinges on the ability to establish the scale of a single bar before reading the code.

Each of the ten decimal digits (0 through 9) is encoded using seven “bar width” units per the schema in the table at right.

To convey the decimal string 078742, the encoded sequence is 3211 1312 1213 1312 1132 2122 where each number in the encoding is the width of the bars or spaces.  So, for the leading value “zero,” the number is encoded as seven consecutive units divided into bars of varying widths: a bar three units wide, then (as signaled by the change in color from white to black or vice-versa) a bar two units wide, then one, then one.  Do you see it? Once more, left-to-right, a white band three units wide, a dark band two units wide, then a single white band and a single dark band (3-2-1-1 encoding the decimal value zero).
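The width schema lends itself to a small lookup table. The sketch below uses the standard UPC-A widths, which agree with the six digits worked through above; the names `WIDTHS` and `encode_widths` are illustrative choices.

```python
# Standard UPC-A run widths for each decimal digit (each entry sums to 7 units).
WIDTHS = {
    "0": "3211", "1": "2221", "2": "2122", "3": "1411", "4": "1132",
    "5": "1231", "6": "1114", "7": "1312", "8": "1213", "9": "3112",
}

def encode_widths(digits: str) -> str:
    """Return the space-separated width groups for a string of digits."""
    return " ".join(WIDTHS[d] for d in digits)

# Sanity check: every digit occupies exactly seven units.
assert all(sum(int(w) for w in code) == 7 for code in WIDTHS.values())

print(encode_widths("078742"))  # 3211 1312 1213 1312 1132 2122
```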

You could recast the encoding in ones and zeroes, where a black bar is a one and a white bar a zero. If you did, the first digit would be 0001101, the number seven would be 0111011 and so on; but there’s no need for that, because the bands of light and dark are far easier to read with a beam of light than a string of printed characters.
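To see the ones and zeroes, expand each width into that many repeated bits, flipping color at every boundary. This sketch covers only digits on the left half of the code, which begin with a white band as described above; `widths_to_bits` is an illustrative name.

```python
def widths_to_bits(widths: str, first_bit: str = "0") -> str:
    """Expand run widths like '3211' into bits: white is 0, black is 1."""
    bits = []
    bit = first_bit
    for w in widths:
        bits.append(bit * int(w))
        bit = "1" if bit == "0" else "0"  # color flips at every boundary
    return "".join(bits)

print(widths_to_bits("3211"))  # 0001101 (zero)
print(widths_to_bits("1312"))  # 0111011 (seven)
```

Starting on black instead, `widths_to_bits("111", first_bit="1")` yields 101, the bar-space-bar pattern of the start and end markers.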

Taking a closer look at the first six digits of my water bottle’s UPC, I’ve superimposed the widths and corresponding decimal value for each group of seven units. The top is my idealized representation of the encoding and the bottom is taken from a photograph of the label:

Now that you know how the bars encode the numbers, let’s turn to what the twelve digits mean.  The first six digits generally denote the product manufacturer. 078742 is Walmart. 038000 is assigned to Kellogg’s.  Apple is 885909 and Starbucks is 099555.  The first digit can define the operation of the code.  For example, when the first digit is a 5, it signifies a coupon and ties the coupon to the purchase required for its use.  If the first digit is a 2, then the item is something sold by weight, like meats, fruit or vegetables, and the last six digits reflect the weight or price per pound.  If the first digit is a 3, the item is a pharmaceutical.

Following the leftmost six-digit manufacturer code is the middle marker (11111, as space-bar-space-bar-space) followed by five digits identifying the product.  Every size, color and combo demands a unique identifier to obtain accurate pricing and an up-to-date inventory.
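Splitting a twelve-digit code into the parts just described takes only string slicing. The six/five/one split follows the “generally” above and is a simplification; `split_upc` is a hypothetical helper, and the sample is the Juicy Fruit number quoted below with a leading zero assumed to bring it to twelve digits.

```python
def split_upc(upc: str) -> dict[str, str]:
    """Split a 12-digit UPC-A string into manufacturer, product and check digit."""
    if len(upc) != 12 or not upc.isdigit():
        raise ValueError("UPC-A needs exactly 12 digits")
    return {
        "manufacturer": upc[:6],  # leftmost six digits
        "product": upc[6:11],     # next five digits
        "check": upc[11],         # final digit
    }

print(split_upc("022000109989"))
# {'manufacturer': '022000', 'product': '10998', 'check': '9'}
```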

The last digit in the UPC serves as an error-detecting check digit to ensure the code has been read correctly.  The check digit derives from a calculation performed on the other digits, such that if any digit is altered the check digit won’t match the changed sequence. Forget about altering a UPC with a black marker: the change wouldn’t work out to the same check digit, so the scanner will reject it.
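The calculation itself is short. In the standard UPC-A scheme, the digits in odd positions (first, third and so on) are summed and tripled, the digits in even positions are added, and the check digit is whatever brings the total up to a multiple of ten. A minimal sketch follows; the function names are illustrative, and the sample assumes a leading zero on the Juicy Fruit code quoted below.

```python
def check_digit(first_eleven: str) -> int:
    """Compute the UPC-A check digit from the first eleven digits."""
    digits = [int(c) for c in first_eleven]
    odd = sum(digits[0::2])   # positions 1, 3, 5, 7, 9, 11
    even = sum(digits[1::2])  # positions 2, 4, 6, 8, 10
    return (10 - (3 * odd + even) % 10) % 10

def is_valid(upc: str) -> bool:
    """True if the twelfth digit matches the check digit of the first eleven."""
    return len(upc) == 12 and upc.isdigit() and check_digit(upc[:11]) == int(upc[11])

print(check_digit("02200010998"))  # 9
print(is_valid("022000109989"))    # True
print(is_valid("022000109999"))    # False: one digit altered
```

Because every digit is weighted by 1 or 3, changing any single digit always changes the total modulo ten, which is why the marker trick fails.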

In case you’re wondering, the first product to be scanned at a checkout counter using a bar code was a fifty stick pack of Juicy Fruit gum in Troy, Ohio on June 26, 1974.  It rang up for sixty-seven cents.  Today, 45 sticks will set you back $2.48 (UPC 22000109989).

Don’t Bet the Farm on Slack Space

A depiction of file slack from Ball, E-Discovery Workbook © 2020

A federal court appointed me Special Master, tasked to, in part, search the file slack space of a party’s computers and storage devices.  The assignment prompted me to reconsider the value of this once-important forensic artifact.

Slack space is the area between the end of a stored file and the end of its concluding cluster: the difference between a file’s logical and physical size. It’s wasted space from the standpoint of the computer’s file system, but it has forensic significance by virtue of its potential to hold remnants of data previously stored there.  Slack space is often confused with unallocated clusters or free space, terms describing areas of a drive not currently used for file storage (i.e., not allocated to a file) but which retain previously stored, deleted files.

A key distinction between unallocated clusters and slack space is that unallocated clusters can hold the complete contents of a deleted file whereas slack space cannot.  Data recovered (“carved”) from unallocated clusters can be quite large—spanning thousands of clusters—where data recovered from a stored file’s slack space can never be larger than one cluster minus one byte.  Crucially, unallocated clusters often retain a deleted file’s binary header signature serving to identify the file type and reveal the proper way to decode the data, whereas binary header signatures in slack space are typically overwritten.

A little more background in file storage may prove useful before I describe the dwindling value of slack space in forensics.

Electronic storage media are physically subdivided into millions, billions or trillions of sectors of fixed storage capacity.  Historically, disk sectors on electromagnetic hard drives were 512 bytes  in size.  Today, sectors may be much larger (e.g., 4,096 bytes).  A sector is the smallest physical storage unit on a disk drive, but not the smallest accessible storage unit.  That distinction belongs to a larger unit called the cluster, a logical grouping of sectors and the smallest storage unit a computer can read from or write to.  On Windows machines, clusters are 4,096 bytes (4kb) by default for drives up to 16 terabytes.  So, when a computer stores or retrieves data, it must do so in four kilobyte clusters.

File storage entails allocation of enough whole clusters to hold a file.  Thus, a 2kb file will only fill half a 4kb cluster–the balance being slack space.  A 13kb file will tie up four clusters, although just a fraction of the final, fourth cluster is occupied by the file.  The balance is slack space and it could hold fragments of whatever was stored there before.  Because it’s rare for files to be perfectly divisible by 4 kilobytes and many files stored are tiny, much drive space is lost to slack space.  Using smaller clusters would mean less slack space, but any efficiencies gained would come at the cost of unwieldy file tracking and retrieval.
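The cluster arithmetic above is easy to check for yourself. What follows is a rough sketch in Python that assumes the default 4,096-byte cluster and treats every file as occupying whole clusters; the function name is my own, and real file systems have wrinkles (very small files, for one) that this ignores.

```python
CLUSTER_SIZE = 4096  # bytes; the Windows default discussed above


def clusters_and_slack(file_size: int, cluster_size: int = CLUSTER_SIZE) -> tuple[int, int]:
    """Return (clusters allocated, bytes of slack) for a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file size cannot be negative")
    # Round up to whole clusters using integer math (ceiling division).
    clusters = -(-file_size // cluster_size)
    slack = clusters * cluster_size - file_size
    return clusters, slack


two_kb = clusters_and_slack(2 * 1024)        # the 2kb file from the text
thirteen_kb = clusters_and_slack(13 * 1024)  # the 13kb file from the text
```

Here `two_kb` works out to one cluster with 2,048 bytes of slack, and `thirteen_kb` to four clusters with 3,072 bytes of slack in the last one.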

So, slack space holds forensic artifacts and those artifacts tend to hang around a long time.  Unallocated clusters may be called into service at any time and their legacy content overwritten.  But data lodged in slack space endures until the file allocated to the cluster is deleted–on conventional “spinning” hard drives at any rate.

When I started studying computer forensics in the MS-DOS era, slack space loomed large as a source of forensic intelligence.  Yet, apart from training exercises where something was always hidden in slack, I can’t recall a matter I’ve investigated this century which turned on evidence found in slack space.  The potential is there, so when it makes sense to do it, examiners search slack using unique phrases unlikely to throw off countless false positives.

But how often does it make sense to search slack nowadays?

I’ve lately grappled with that question because it seems to me that the shopworn notions respecting slack space must be re-calibrated.  

Keep in mind that slack space holds just a shard of data with its leading bytes overwritten.  It may be overwritten minimally or overwritten extensively, but some part is obliterated, always.  Too, slack space may hold the remnants of multiple deleted files; that is, as overlapping artifacts: files written, deleted, overwritten by new data, deleted again, then overwritten again (just less extensively so).  Slack can be a real mess.

Fifteen years ago, when programs stored text in ASCII (i.e., encoded using the American Standard Code for Information Interchange or simply “plain text”), you could find intelligible snippets in slack space.  But since 2007, when Microsoft changed the format of Office productivity files like Word, PowerPoint and Excel files to Zip-compressed XML formats, there’s been a sea change in how Office applications and other programs store text.  Today, if a forensic examiner looks at a Microsoft Office file as it’s written on the media, the content is compressed.  You won’t see any plain text.  The file’s contents resemble encrypted data.  The “PK” binary header signature identifying it as compressed content is gone, so how will you recognize zipped content?  What’s more, the parts of the Zip file required to decompress the snippet have likely been obliterated, too. How do you decode fragments if you don’t know the file type or the encoding schema?
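If you'd like to see the problem rather than take my word for it, here is a small sketch in Python. It uses the standard zlib module as a stand-in for the Deflate compression found inside Zip containers; the sample text, the keyword and the 16-byte cut are all made up for illustration, and a real Office file has more moving parts.

```python
import zlib

KEYWORD = b"Bluebird"  # hypothetical search term

# Hypothetical document text, repeated so the compressor has something to work with.
PLAIN_TEXT = b"<w:t>Project Bluebird closes Friday; shred the old forecasts.</w:t>" * 4


def contains_keyword(data: bytes, keyword: bytes = KEYWORD) -> bool:
    """A naive byte-level search, much as a keyword search treats raw slack."""
    return keyword in data


def try_decompress(data: bytes):
    """Attempt to inflate the data; return None if zlib rejects it."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None


# The text as it would sit on the media once compressed.
COMPRESSED = zlib.compress(PLAIN_TEXT)

# Mimic slack: the leading bytes of the old data have been overwritten and lost.
FRAGMENT = COMPRESSED[16:]
```

The keyword turns up in `PLAIN_TEXT` but not in `COMPRESSED`, and while the intact stream inflates back to the original, the fragment missing its leading bytes does not.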

The best answer I have is you throw common encodings against the slack and hope something matches up with the search terms.  More-and-more, nothing matches, even when what you seek really is in the slack space. Searches fail because the data’s encoded and invisible to the search tool.  I don’t know how searching slack stacks up against the odds of winning the lottery, but a lottery ticket is cheap; a forensic examiner’s time isn’t.
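By "throwing encodings against the slack," I mean something like the following sketch. It is only an illustration in Python: the list of encodings is a guess at what is common, the sample data is invented, and commercial forensic tools do this with far more sophistication.

```python
# Encodings to try; an assumed short list, not an exhaustive one.
ENCODINGS = ("utf-8", "utf-16-le")


def search_slack(slack: bytes, term: str) -> list[tuple[str, int]]:
    """Return (encoding, byte offset) for each encoding of term found in slack."""
    hits = []
    for encoding in ENCODINGS:
        needle = term.encode(encoding)
        offset = slack.find(needle)
        if offset != -1:
            hits.append((encoding, offset))
    return hits


# Invented slack contents: ten bytes of padding, a two-byte-per-character
# remnant, then five bytes of unrelated data.
SAMPLE_SLACK = b"\x00" * 10 + "Project Bluebird".encode("utf-16-le") + b"\xff" * 5

hits = search_slack(SAMPLE_SLACK, "Bluebird")
```

In this sample, the single-byte search misses the term entirely and only the UTF-16 pass finds it (at byte offset 26). Feed the same function compressed or encrypted data and it returns an empty list, even when the term is in there.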

That’s just the software.  Storage hardware has evolved, too.  Drives are routinely encrypted, and some oddball encryption methods make it difficult or impossible to explore the contents of file slack.  The ultimate nail in the coffin for slack space will be solid state storage devices and features, like wear leveling and TRIM, that routinely reposition data and promise to relegate slack space and unallocated clusters to the digital dung heap of history.

Taking a fresh look at file slack persuades me that it still belongs in a forensic examiner’s bag of tricks when it can be accomplished programmatically and with little associated cost.  But, before an expert characterizes it as essential or a requesting party offers it as primary justification for an independent forensic examination, I’d urge the parties and the Court to weigh cost versus benefit; that is, to undertake a proportionality analysis in the argot of electronic discovery.  Where searching slack space was once a go-to for forensic examination, it’s an also-ran now. Do it, when it’s an incidental feature of a thoughtfully composed examination protocol; but don’t bet the farm on finding the smoking gun because the old gray mare, she ain’t what she used to be!
See? I never metaphor I didn’t like.

******************************

Postscript: A question came up elsewhere about solid state drive forensics. Here was my reply:

The paradigm-changing issue with SSD forensic analysis versus conventional magnetic hard drives is the relentless movement of data by wear leveling protocols and a fundamentally different data storage mechanism. Solid state cells have a finite life measured in the number of write-rewrite cycles.

To extend their useful life, solid state drives move data around to ensure that all cells are written with roughly equal frequency. This is called “wear leveling,” and it works. A consequence of wear leveling is that unallocated cells are constantly being overwritten, so SSDs do not retain deleted data as electromagnetic drives do. Wear leveling (and the requisite remapping of data) is handled by an SSD’s onboard electronics and isn’t something users or the operating system control or access.

Another technology, an ATA command called TRIM, is controllable by the operating system and serves to optimize drive performance by disposing of the contents of storage cell groups called “pages” that are no longer in use. Oversimplified, it’s faster to write to an empty memory page than to initiate an erasure first; so, TRIM speeds the write process by clearing contents before they are needed, in contrast to an electromagnetic hard drive which overwrites clusters without need to clear contents beforehand.

The upshot is that resurrecting deleted files by identifying their binary file signatures and “carving” their remnant contents from unallocated clusters isn’t feasible on SSD media. Don’t confuse this with forensically-sound preservation and collection. You can still image a solid state drive, but you’re not going to get unallocated clusters. Too, you won’t be interfacing with the physical media grabbing a bitstream image. Everything is mediated by the drive electronics.

******************************

Dear Reader, Sorry I’ve been remiss in posting here during the COVID crisis. I am healthy, happy and cherishing the peace and quiet of the pause, hunkered down in my circa-1880 double shotgun home in New Orleans, enjoying my own cooking far too much. Thanks to Zoom, I completed my Spring Digital Evidence class at the University of Texas School of Law, so now one day just bubbles into the next, and I’m left wondering, Where did the day go? Every event where I was scheduled to speak or teach cratered, with no face-to-face events sensibly in sight for 2020. One possible exception: I’ve just joined the faculty of the Tulane School of Law ten minutes upriver for the Fall semester, and plan to be back in Austin teaching in the Spring. But, who knows, right? Man plans and gods laugh.

We of a certain age may all be Zooming and distancing for many months. As one who’s bounced around the world peripatetically for decades, not being constantly on airplanes and in hotels is strange…and stress-relieving. While I miss family, friends and colleagues and mourn the suffering others are enduring, I’ve benefited from the reboot, ticking off household projects and kicking the tires on a less-driven day-to-day. It hasn’t hurt that it’s been the best two months of good weather I’ve ever seen, here or anywhere. The prospect of no world travel this summer–and no break from the soon-to-be balmy Big Easy heat–is disheartening, but small potatoes in the larger scheme of things.

Be well, be safe, be kind to yourself. This, too, shall pass and as my personal theme song says, There's a Great Big Beautiful Tomorrow. Just a Dream Away.

Don’t Let Plaintiffs’ Lawyers Read This!!

Be honest.  Wouldn’t you love to stick it to the plaintiffs?  Wouldn’t your corporate client or carrier be ecstatic if you could make litigation much more expensive for those greedy opportunists bringing frivolous suits and demanding discovery?  What if you could make discovery not just more costly, but make it, say, five times more costly, ten times more costly, than it is for you?  Really bring the pain.  Would you do it?

Now that I have your attention–and the attention of plaintiffs’ counsel wondering if they’ve stumbled into a closed meeting at a corporate counsel retreat—I want to show you this is real.  Not just because I say so, but because you prove it to yourself.  You do the math.

Math!  You didn’t say there would be math!

Stop.  You know you’re good at math when the numbers come with dollar signs.  Legendary Texas trial lawyer W. James Kronzer used to say to me, “I’m no good at math, Herman; but I can divide any number by three.”  That was back when a third was the customary contingent fee.

Even after you do the math, you’re not going to believe it; instead, you’ll conclude it can’t be true.  Surely nothing so unjust could have escaped my notice.  Why would Courts allow this?  How can I be such a sap?

The real question is this: What am I going to do about it?

Dig We Must: Get It in Writing

This isn’t a post about e-discovery per se, but it bears on process and integrity issues we face in cooperating to craft e-discovery expectations.  Still, it’s more parable than parallel.

My home in New Orleans sits at the intersection of two narrow streets built for horse and mule traffic.  It’s held its corner ground since 1881, serving as abattoir, ancestral home of a friend and now, my foot on the ground in the Big Easy.  New Orleanians are the friendliest folks.  You can strike up a spirited tête-à-tête with anyone since everyone has something to say about food, festivals, Saints football, Mardi Gras, the Sewerage and Water Board and the gross ineptitude of local government in its abject failure to deliver streets and sidewalks that don’t swallow you whole or otherwise conspire to kill or maim the populace.

That’s not to say the City does nothing in the way of maintaining infrastructure.  Right now, New Orleans is replacing its low-pressure gas lines with high pressure lines.  Gas is a big deal where everyone eats red beans on Mondays, but it’s also useful for heating and, even now, still, for lighting.  So, every street must have new subterranean lines installed and new risers brought to gas meters.  I knew nothing of this until I awoke to find a crew with an excavator on my property destroying the curbs and antique brick sidewalks I’d lately installed at considerable expense.

Cryptographic Hashing: “Exceptionally” Deep in the Weeds

We all need certainty in our lives; we need to trust that two and two is four today and will be tomorrow.  But the more we learn about any subject, the more we’re exposed to the qualifiers and exceptions that belie perfect certainty.  It’s a conundrum for me when someone writes about cryptographic hashing, the magical math that allows an infinite range of numbers to match to a finite complement of digital fingerprints. Trying to simplify matters, well-meaning authors say things about hashing that just aren’t so.  Their mistakes are inconsequential for the most part—what they say is true enough–but it’s also misleading enough to warrant caveats useful in cross-examination.

I’m speaking of the following two assertions:

  1. Hash values are unique; i.e., two different files never share a hash value.
  2. Hash values are irreversible; i.e., you can’t deduce the original message using its hash value.

Both statements are wrong.

Cryptographic Hashing: A Deeper Dive

It’s October (already?!?!) and–YIKES–I haven’t posted for two weeks.  I’m tapping away on a primer about e-discovery processing, a topic that’s received scant attention…ever.  One could be forgiven for thinking the legal profession doesn’t care what happens to all that lovely data when it goes off to be processed!  Yet, I know some readers share my passion for ESI and adore delving deeply into the depths of data processing.  So, here are a few paragraphs pulled from my draft addressing the well-worn topic of hashing in e-discovery where I attempt a foolhardy tilt at the competence windmill and seek to explain how hashing works and what those nutty numbers mean.  Be warned, me hearties, there be math ahead!  It’s still a draft, so feel free to push back; all criticism (constructive/destructive/dismissive) is warmly welcomed.

My students at the University of Texas School of Law and the Georgetown E-Discovery Training Academy spend considerable time learning that all ESI is just a bunch of numbers.  They muddle through readings and exercises about Base2 (binary), Base10 (decimal), Base16 (hexadecimal) and Base64; as well as about the difference between single-byte encoding schemes (ASCII) and double-byte encoding schemes (Unicode).  It may seem like a wonky walk in the weeds; but the time is well spent when the students snap to the crucial connection between numeric encoding and our ability to use math to cull, filter and cluster data.  It’s a necessary precursor to their gaining Proustian “new eyes” for ESI.

Because ESI is just a bunch of numbers, we can use algorithms (mathematical formulas) to distill and compare those numbers.  Every student of electronic discovery learns about cryptographic hash functions and their usefulness as tools to digitally fingerprint files in support of identification, authentication, exclusion and deduplication.  When I teach law students about hashing, I tell them that hash functions are published, standard mathematical algorithms into which we input digital data of arbitrary size; the hash algorithm then spits out a bit string (again, just a sequence of numbers) of fixed length called a “hash value.”  Hash values correspond almost uniquely to the digital data fed into the algorithm (termed “the message”) such that the chance of two different messages sharing the same hash value (called a “hash collision”) is exceptionally remote.  But because it’s possible, we can’t say each hash value is truly “unique.”
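To make the fingerprinting idea concrete, here is a short sketch using Python's standard hashlib module. MD5 is used only because it's the familiar example; the function names and the sample files are hypothetical, and production deduplication involves more than this.

```python
import hashlib


def md5_hex(message: bytes) -> str:
    """Return the MD5 hash value of the message as hexadecimal text."""
    return hashlib.md5(message).hexdigest()


def group_by_hash(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by hash value; names sharing a value are likely duplicates."""
    groups: dict[str, list[str]] = {}
    for name, content in files.items():
        groups.setdefault(md5_hex(content), []).append(name)
    return groups


# Messages of wildly different sizes yield hash values of the same length.
tiny = md5_hex(b"")
large = md5_hex(b"a" * 1_000_000)

# Hypothetical collection: two identical memos and one different file.
SAMPLE_FILES = {
    "memo.txt": b"Meet at noon.",
    "memo_copy.txt": b"Meet at noon.",
    "other.txt": b"Meet at one.",
}
duplicates = group_by_hash(SAMPLE_FILES)
```

Both `tiny` and `large` come out the same length, and `duplicates` puts the two identical memos under one hash value while the third file gets its own.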

Using hash algorithms, any volume of data, from the tiniest file to the contents of entire hard drives and beyond, can be almost uniquely expressed as an alphanumeric sequence; in the case of the MD5 hash function, distilled to a value written as 32 hexadecimal characters (0-9 and A-F).  It’s hard to understand until you’ve figured out Base16; but, those 32 characters represent 340 trillion, trillion, trillion different possible values (2^128 or 16^32).
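If the Base16 arithmetic seems slippery, a few lines of Python will confirm it. This is just a worked check of the numbers above; the variable names are mine.

```python
HEX_CHARACTERS = 32        # length of an MD5 value written in hexadecimal
BITS_PER_HEX_CHARACTER = 4  # each hex character (0-9, A-F) encodes four bits

total_bits = HEX_CHARACTERS * BITS_PER_HEX_CHARACTER  # 128 bits

# Sixteen choices for each of 32 characters is the same count as
# two choices for each of 128 bits.
possible_values = 16 ** HEX_CHARACTERS
same_count = 2 ** total_bits

# Scientific notation makes the size easier to read.
readable = f"{possible_values:.2e}"
```

Here `possible_values` equals `same_count`, and `readable` shows roughly 3.40 times ten to the 38th power, which is the "340 trillion, trillion, trillion" in more compact dress.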