What’s in a Name (or Hash Value)?

A question common to investigation of alleged data theft is, “Are any of our stolen files on our competitor’s systems?”  Forensic examiners track purloined IP using several strategies: among them, searching for matching filenames, hash values, metadata and content.  Any of these can be altered by data thieves seeking to cover their tracks, but most are too confident or too dim to bother.

A current matter underscored the pitfalls of filename and hash searches, prompting me to reflect on a long-ago case where hash searches caused headaches.  The old case stemmed from a settlement of a data theft event requiring a periodic audit of hashes of the defendant’s data to ensure that stolen data hasn’t re-emerged.  The plaintiff sought sanctions because its expert found hash values in the audit that matched hashes tied to stolen PowerPoint presentations.  The defendants were dumbfounded, certain they’d adhered to the settlement and not used any purloined PowerPoints. 

When I stepped in, I confirmed there were matching hash values, but none matched the PowerPoint PPT and PPTX files of interest.  Instead, the hashes matched only benign component image data within the presentations.  The components hashed were standard slide backgrounds (e.g., “woodgrain”) found in any copy of PowerPoint.  Both parties possessed PowerPoints using some of the same generic design elements, but none were the same presentations.  The hashing tool so thoroughly explored the files that embedded images were hashed separately from the files in which they were used and matched other generic elements in other presentations.  No threat at all!

Still other matching files turned out to be articles freely distributed at an industry trade show and zero-byte “null” files that would match any similarly empty files on any machine.  When every hash match was scrutinized, none proved to be stolen data.  Away went the sanctions motion.

The moral of the story is, although it’s extremely unlikely that two different files will share the same hash value, matching hash values don’t always signify the “same” file in practical terms.  Matching files may derive from independent sources, could be benign components of compilations or might match because they hold little or no content.  The math is powerful, but it mustn’t displace common sense.
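That common-sense check can be sketched in a few lines of Python.  This is only an illustration: the helper name `triage_match` and its labels are hypothetical, not taken from any forensic tool.  The one hard fact it relies on is that every zero-byte file yields the same MD5 value.

```python
import hashlib

# MD5 of zero bytes of input; every empty file on every machine shares it.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()

def md5_of(data: bytes) -> str:
    """Return the MD5 hex digest of a byte string."""
    return hashlib.md5(data).hexdigest()

def triage_match(data: bytes, suspect_hashes: set) -> str:
    """Classify content against hashes of interest (hypothetical helper)."""
    digest = md5_of(data)
    if digest not in suspect_hashes:
        return "no match"
    if digest == EMPTY_MD5:
        # An empty file matches every other empty file, stolen or not.
        return "match, but empty file: meaningless"
    # A real match still needs a human look at source and context.
    return "match: review source and context"
```

A match reported by a sketch like this is where the examiner's work starts, not where it ends: generic components and widely distributed documents still have to be weeded out by inspection.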

In the ongoing matter, a simple method used to identify contraband data was filename matching.  The requesting party sought to identify instances of a file called “Book3.xlsx,” and the search turned up hundreds of instances of identically named files in the producing party’s data–though not a single one hash-matched the file of interest.

Why so many false positives?  It turns out Microsoft Excel assigns an incremented name to any new spreadsheet (despite earlier-opened sheets having been closed) so long as even one prior sheet remains open.  So, if you’ve created eight Excel spreadsheets, renamed them and closed all but one, the next new sheet will be named Book9.xlsx by default.  The name “Book3.xlsx” signified only that two prior spreadsheets had been opened.  The takeaway is that, in any large collection, expect to turn up instances of various Book(n).xlsx files created when a user exited and saved a sheet without renaming it from its default name.
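One way to keep default names from swamping a filename search is to flag them up front.  Here is a minimal Python sketch; the pattern and the function name `is_default_excel_name` are my own assumptions, and the pattern covers only the English-language “Book” default described above.

```python
import re

# Default workbook names such as Book1.xlsx, Book3.xlsx or Book12.xls.
# Assumption: English-language Excel, which uses "Book" as the default stem.
DEFAULT_BOOK = re.compile(r"Book\d+\.xls[xmb]?", re.IGNORECASE)

def is_default_excel_name(filename: str) -> bool:
    """True if the whole filename is an Excel default like Book3.xlsx."""
    return DEFAULT_BOOK.fullmatch(filename) is not None
```

A hit on such a name says almost nothing about where the file came from, so it should be treated as a weak lead unless hash or content comparison backs it up.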

Electronic search, whether by hash, filename, metadata or keyword, is an invaluable tool in investigation and e-discovery, but one best used with a modicum of common sense by those who appreciate its limitations.

C’mon! Bates Numbering Native Production is Easy!

Sometimes, the other side balks at a proposed e-discovery protocol, arguing it’s unduly burdensome to rename native files to their Bates numbers. I find that odd because parties have always named files for Bates numbers whilst doing clunky TIFF productions.  Where did they think the names of all those TIFF images came from?  The truth is, litigants have been naming files to match Bates numbers for as long as we’ve done e-discovery!  It’s easy!

It’s one thing to say something is easy and another to prove its simplicity.  Certainly, if you use an e-discovery vendor, it’s as easy as saying, “Bates number the native files.”  They know what to do. But anyone doing electronic production in-house can add Bates numbers to filenames simply, quickly and cheaply. 

There are various ways to do it.  You can prepend Bates number (Bates##_filename.ext), append Bates number (filename_Bates##.ext) or replace the filename with the Bates number, storing the original name in a load file.  You can even add protective language like “PRODUCED SUBJECT TO PROTECTIVE ORDER.”

Multiple free and low-cost bulk renaming tools are available.  I’ve long praised a powerful, flexible tool called Bulk Renaming Utility. It’s free for personal use and $93 for commercial purposes; it’s capable, but overwhelming to some.  Seeking a simpler tool and one free to use commercially, I found two: File Renamer Basic and Ant Renamer.  Both impressed me with their flexibility and ease of use.

For Mac users, there’s a nice free tool called File Renamer for MacOS 64 bit, which I’ll also touch on below.

Let’s look at how to configure both Windows tools to Bates number a production.

Suppose the production protocol reads:

Bates Numbers. All Bates numbers will consist of a three-digit Alpha Prefix, followed immediately by an 8-digit numeric: AAA########. There must be no spaces in the Bates number. Any numbers with less than 8 digits will be front padded with zeros to reach the required 8 digits. ESI will be Bates numbered by substituting, prepending or appending the Bates number for/to the file name.
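To make the format concrete, here is a small Python sketch of the numbering rule.  The function name `bates_number` is hypothetical, and I’m assuming the prefix must be three uppercase letters, which the protocol language implies but doesn’t state.

```python
import re

# Three letters followed by exactly eight digits, no spaces: AAA########
BATES_PATTERN = re.compile(r"[A-Z]{3}\d{8}")

def bates_number(prefix: str, n: int) -> str:
    """Build a Bates number, front padding the numeric part with zeros."""
    candidate = f"{prefix}{n:08d}"
    if not BATES_PATTERN.fullmatch(candidate):
        raise ValueError(f"{candidate!r} does not fit the AAA######## format")
    return candidate
```

For example, `bates_number("DEF", 11)` returns `DEF00000011`, while a number needing more than eight digits raises an error instead of silently breaking the format.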

Assuming there have been ten other items produced earlier, we must begin Bates numbering at DEF00000011.  For this tutorial, I’ll use just six photos of American coins, but it could as easily be thousands of files of any sort.  Here are thumbnails of the exemplar photos:

The table below lists the filenames and MD5 hash values of the files, allowing us to confirm that a renaming tool won’t otherwise alter the evidence.

Original Name | Type | Size | MD5 Hash

To demonstrate, I placed working copies of all the files needing Bates numbers in a Desktop folder named Production photos 11-21-20.  Inside this folder, I made an empty subfolder called BATES NUMBERED PHOTOS.  You don’t have to follow suit, but however you approach it, don’t work on the source evidence; instead, create and produce renamed working copies.

File Renamer Basic

After installing and kicking off the program, I set the following parameters:

  1. Configure the “Folder” and “Copy to” paths.
  2. Set the three-digit Alpha Prefix required by the Protocol (I used “DEF” for Defendants).
  3. Set Unique Parameter to “Numbers,” “Increment” by 1, mask with eight zeroes and “Start at 11” (the next unassigned Bates number).
  4. Set Separator to a single underscore.  [While the protocol neither requires nor prohibits adding a separator between the Bates number and filename, I like to add it for clarity]
  5. In the Filename settings box, check “Place Unique Parameter before Filename.”
  6. Click “Preview,” and if you’re happy with the preview, click “Apply.”

Running hash values against the renamed files, we see that renaming the files has not altered their hash values.

Name | Type | Size | MD5 Hash

Ant Renamer

After installing and kicking off the program, I set the following parameters:

  1. Using “Add Folders,” navigate to and select the folder with the files to be renamed.
  2. Press F10 to launch the Options menu and, under the Processing tab, check the box “Copy instead of Rename,” then click “OK.”
  3. Under “Actions,” select “Enumeration” and configure the mask as: DEF%num%_%name%%ext%
  4. Set “Start at:” to 11 and “Number of Digits” to 8.
  5. Click “Preview of Selected Files” and, if all seems well, click GO on the menu.

Note that these settings will create a Bates numbered set of duplicate files in the same folder as the source files, NOT in the subfolder.

Frankly, it’s harder to describe the task than to complete it. After a few minutes playing with the settings, you’ll easily figure out how to prepend a Bates number, append it or swap it for the original name. Once you’ve gotten the settings where you’d like them, File Renamer Basic allows you to save your custom settings as a profile and apply it to future productions.

I spent only a short time investigating the Mac application FileRenamer, but it was intuitive enough to use without any unmanly reading of directions and took just seconds to configure numbering and set a mask to finish the task. I configured numbering in Settings>Numbering (Initial value: 11, Increment: 1 and Fixed Length with Leading Zeroes: 8) then the mask to include the three-digit alpha prefix, padded numbering and underscore separator to precede the filename (DEF%num%_%name%).
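If you’d rather script the job than install a tool, the same prepend-and-copy approach can be sketched in Python.  This is an illustration under my own assumptions, not how any of the tools above work internally: the function names are hypothetical, files are numbered in alphabetical order, and each file is read into memory to hash it, which is fine for photos but not for huge files.

```python
import hashlib
import shutil
from pathlib import Path

def md5_file(path: Path) -> str:
    """MD5 of a file's content (reads the whole file into memory)."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def bates_copy(source_dir: Path, dest_dir: Path,
               prefix: str = "DEF", start: int = 11) -> list:
    """Copy each file to dest_dir with a Bates number prepended to its name.

    Returns (original name, new name, MD5) rows. Source files are never
    renamed or modified; only copies are written.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Only files are numbered, so a destination subfolder is skipped.
    files = sorted(p for p in source_dir.iterdir() if p.is_file())
    log = []
    for number, src in enumerate(files, start=start):
        dst = dest_dir / f"{prefix}{number:08d}_{src.name}"
        shutil.copy2(src, dst)
        # Renaming changes the name, not the content; confirm by hash.
        if md5_file(src) != md5_file(dst):
            raise RuntimeError(f"hash mismatch after copying {src.name}")
        log.append((src.name, dst.name, md5_file(dst)))
    return log
```

The returned rows double as a simple cross-reference of original names, Bates-numbered names and hash values.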

Easy as pie! And while we’re on the subject of pie, HAPPY THANKSGIVING!

The Metadata Vanishes

I love solving puzzles.  I come by it honestly. My late mother was a nationally ranked New York Times crossword puzzler, and though I lack her prodigious gifts, I start each morning racing on the Times crossword.  I mention puzzling to note that the best part of my forensics work is finding the answer to electronic evidence puzzles.  This week’s challenge comes from a legal assistant caught between a rock and a hard place, actually between the plaintiff and defense counsel.  The defense objected that photos produced in discovery lacked metadata, while the plaintiff insisted the photos he had furnished contained the “missing” metadata.  How could they both be right?  The mystified legal assistant had simply saved the photos from the transmitting message and sent them on to the other side.  She hadn’t removed any metadata.  Or had she?

I had to figure out what happened and keep it from happening again.

First, some technical underpinnings: 

What do we mean by metadata?  Digital photos, particularly those taken with cell phone cameras, hold more information than shows up in the pretty pictures.  Stored within the photos is a type of application metadata called EXIF (for Exchangeable Image File Format). EXIF holds camera settings, including the make and model of the camera or phone, time and date information, geolocation coordinates and more.  Because it’s application metadata, it’s content stored within the file and moves with the file when copied or transmitted…unless someone or something makes it disappear.
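Because EXIF lives inside the file, you can test for its presence by looking at the file’s bytes.  The Python sketch below is a rough presence check only, written for illustration: it walks the marker segments at the front of a JPEG looking for an APP1 segment labeled “Exif.”  It doesn’t decode any tags, and the function name `jpeg_has_exif` is my own.

```python
import struct

def jpeg_has_exif(data: bytes) -> bool:
    """Return True if JPEG bytes contain an APP1 segment labeled Exif."""
    if data[:2] != b"\xff\xd8":            # JPEGs begin with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:               # not at a marker; give up
            return False
        marker = data[pos + 1]
        if marker == 0xFF:                  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):          # image data or end of image
            return False
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

Running a check like this on a photo before and after it passes through e-mail is a quick way to see whether the embedded metadata made the trip.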

There’s a second sort of metadata called system metadata.  It’s context: data about the file that’s stored outside the file, typically in the system’s file table that serves as a directory of electronically stored information.  System metadata includes such things as a file’s name, location, modified and created dates and more.  Because it’s stored outside a file, it doesn’t move with the file but must be rounded up when a file is copied or transmitted.  Precious little system metadata follows a file when it’s e-mailed, often just the file’s name, size and type (although Apple systems include the file’s last modified and created dates).

The defense was seeing dates and times for photos that did not line up with the actual dates and times the photos were taken.  Too, the camera and geolocation data that should have been in the EXIF segments of the pictures were gone when plaintiffs produced them.

Picture formats and EXIF metadata: The photos produced were taken with an iPhone and stored on a Mac computer.  When most of us think of digital photos, we probably think of JPEG images stored as files with the extension .JPG.  The JPEG photo format has been around for almost thirty years and been the most common format for much of that time.  JPEG is what’s termed “lossy compression” referring to its ability to make image files smaller in size by jettisoning parts of the image that contribute to resolution and detail.  The more tightly you compress a JPEG image (and the more often you do it), the “jaggier” and more distorted the image becomes.

As digital cameras have improved, digital photographs have grown larger in size, eating up storage space.  Two-thirds of the data on my iPhone are photographs.  Seeking a more efficient way to store images and video, Apple started phasing out JPEG images in 2017.  The replacement was a format called High Efficiency Image File Format; as implemented by Apple, photos are stored as High-Efficiency Image Containers with the file extension .HEIC.

The benefit is that, for comparable image quality, HEIC images are roughly half the size of JPEG images, and they hold EXIF data.  The downside is that most of the world still expects a picture to be a JPEG and the Windows and Cloud realms need time to catch up.  To remain compatible with other devices and operating systems, Apple converts HEIC images to JPEGs for sharing via e-mail.

Now, there’s something to consider!  Did Apple strip out the EXIF metadata from the HEIC photos when it converted them to JPEGs? Hold that thought while I lay a little more foundation.

Encoding in Base64: E-mail is one of the earliest Internet tools.  It hearkens back to an era when only the most basic alphabets could be transmitted using a venerable character encoding standard called ASCII (pronounced ASK-KEY and short for American Standard Code for Information Interchange).  How do you get binary data like photos to transit a system that only understands a 128-character alphabet?  Easy!  You re-express the binary data using just 64 ASCII characters, to wit, the 26 lowercase letters of the alphabet, the 26 uppercase letters, the digits zero through nine and two punctuation marks (forward slash / and plus sign +).  That’s 64 characters, each representing a unique numeric value that can replace six bits of binary data.  So, 24 bits of data can be written using just four Base64 characters. Base64 looks like this:

It may not look like much, but it’s a feat of reductive technology we all use every day.
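If you’d like to see the arithmetic for yourself, Python’s standard library will do the encoding.  This short sketch turns three bytes (24 bits) into four Base64 characters and back again; the function name `survives_round_trip` is just my label for the check.

```python
import base64
import hashlib

raw = bytes([0x4D, 0x61, 0x6E])        # three bytes = 24 bits ("Man" in ASCII)
encoded = base64.b64encode(raw)        # four Base64 characters: b"TWFu"
decoded = base64.b64decode(encoded)    # back to the original three bytes

def survives_round_trip(data: bytes) -> bool:
    """True if data is bit-for-bit identical after Base64 encode and decode."""
    restored = base64.b64decode(base64.b64encode(data))
    return hashlib.md5(restored).digest() == hashlib.md5(data).digest()
```

Base64 changes only how the data is written down for transit; decoding restores every bit, which is why a matching hash before and after is the expected result.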

Looking at our conversion events when metadata might be lost, we have:

  • HEIC to JPEG
  • JPEG to Base64
  • Base64 to JPEG

Coding in and out of Base64 shouldn’t change a thing, but we can’t rule out anything yet.

Is that all?  Nope!

Photos often change without acquiring a new format.  If you’ve attached a photo to an e-mail and were asked whether you want the attachment to be small, medium, large or original size, any choice but the last one effects big changes to content. Perhaps scaling a photo poses a risk that embedded EXIF metadata will be lost?

When the defense sought the missing metadata, the legal assistant went to the plaintiff, who supplied a screenshot showing that the HEIC photos he’d sent went out carrying the full complement of EXIF metadata.  I asked the legal assistant for a copy of what she’d produced to the defendant and confirmed the embedded EXIF data was, in fact, gone, gone, gone.

Coming back to “did Apple strip out the EXIF metadata from the HEIC photos when it converted them to JPEGs?”  I took an HEIC photo with my iPhone and e-mailed it to my Gmail account as an attachment.  The attachment was converted to a JPG but retained its EXIF data when saved to disk. I re-sent it as a downscaled image and all the EXIF remained intact.  Finally, I sent it as an inline image and saved the received image to disk.  Poof!  The metadata vanishes!  Now, we’re getting somewhere.

I asked the legal assistant to forward a copy of the e-mail she’d received from the client transmitting the photos.  As expected, the photos weren’t in HEIC format but had been converted to JPEGs.  Notably, they were inline photos displayed in the body of the e-mail instead of as attachments.  When I saved the inline images to disk, the EXIF data was gone.

Undeterred, I saved the forwarded message to disk as an .eml message and opened it in Microsoft Notepad.  Scrolling down to check the Base64 encoded content, I copied the Base64 of a single image and converted it to a JPEG photo.  Happily, the photo I recovered held its full complement of EXIF data.  I could only conclude that saving an inline photo to disk by right clicking and choosing “Save Image as” was the culprit.  Had the photos been made attachments instead of inline images, their EXIF data would have remained in the file saved to disk.

But the revelation was that the EXIF data sought was present in the JPEG images, even if it couldn’t be pulled out by clicking on them as inline images and saving the image to disk.  This was true in both Gmail and Outlook.
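What I did by hand in Notepad can also be sketched with Python’s standard `email` package, which decodes the Base64 for you.  Treat this as a minimal illustration under my own assumptions: the function name `extract_images` and the fallback file names are hypothetical, and I haven’t tested it against every mail client’s way of packaging inline images.

```python
import email
from email import policy
from pathlib import Path

def extract_images(eml_bytes: bytes, out_dir: Path) -> list:
    """Write every image part of a saved .eml message to out_dir.

    The decoded bytes are written exactly as transmitted, so any EXIF
    data embedded in the image stays in the saved file.
    """
    msg = email.message_from_bytes(eml_bytes, policy=policy.default)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, part in enumerate(msg.walk(), start=1):
        if part.get_content_maintype() != "image":
            continue
        # Inline images may lack a filename, so make one up if needed.
        name = part.get_filename() or f"inline_{index}.{part.get_content_subtype()}"
        target = out_dir / Path(name).name   # drop any path in the name
        # decode=True reverses the Base64 transfer encoding.
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved
```

To use it, save the message as an .eml file, read that file’s bytes and pass them in along with a destination folder.  Note that two images sharing a filename would overwrite one another in this simple version.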

Now, I have a forensics lab thrumming with workstations and ingenious software, but what’s a legal assistant supposed to do, MacGyver-like, with just the tools at hand?  Having solved the puzzle of what went wrong, the bonus puzzle was figuring out how to fix it.

Here’s a simple workaround I came up with that performed splendidly:

1. Create an empty folder on your Windows Desktop called “Inline Images.”

2. In Microsoft Outlook, open the message holding the inline photos you want to extract. 

3. From the Outlook message menu bar select File>Save As then choose Save as Type>HTML (*.htm, *.html) and save the message to your “Inline Images” folder.

4. Open the “Inline Images” folder and locate the subfolder named [subject of the transmitting message]_Files.  Open this folder and you’ll find copies of each inline photo.  If you find two copies of each, small and large, the small copy is a thumbnail lacking EXIF data but the full-size version will have all EXIF metadata intact.  Voila!  We go from The Metadata Vanishes to Return of the Metadata.

I’d prefer clients e-mail photos by transmitting them inside a compressed Zip file rather than forwarding them as inline images or attachments.  The Zip container better protects the integrity of the evidence and forestalls stripping or alteration of metadata.  Plus, a Zip container can be encrypted for superior cybersecurity. 

Have you run into this before, Dear Reader?  Do you know a simpler way to get inline images out of parent messages without corrupting metadata or hiring an expert?  If so, please leave a comment.

Gayle McCormick O’Connor

I’m sad to share that Gayle McCormick O’Connor died suddenly in her sleep on Sunday, October 18, 2020.  Gayle was a longtime legal technology marketing specialist. If you hadn’t the luck to know her, you doubtless noticed her at any of dozens of legal tech conferences.  Gayle was a star, truly a nova who shone brightly and cared passionately about her colleagues, friends and, above all, her son, Seamus and husband, Tom O’Connor–e-discovery and LegalTech thought leader and my dear friend.

Gayle’s luminosity was no accident.  She worked at it.  Gayle turned heads.  Gayle danced in front of the band.  She dressed to the nines in footwear that defied gravity and description.  Gayle was a canvas for her art and her art was celebration.  I’ve seen Gayle spin atop the bar in a packed Vegas biker bar.  I’ve sung Viva Las Vegas with Tom and Gayle at the top of our lungs beside the Bellagio fountains.  Just a month ago, Gayle sang Happy Birthday to me in her unforgettable Marilyn Monroe-Seducing-John-Kennedy style. Gayle was a seductress.  She unapologetically called herself “Cougar.”  Gayle’s tales of trysts with rock stars in the 70’s could have inspired Almost Famous.  Gayle lived large and loved large.  The public Gayle was sensational.  She defied age to change her, and age demurred.

Yet, there was a private side to Gayle; a sweet, maternal aspect that appeared when the makeup and stilettos came off and she was tired, sore and having a cigarette on the porch.  This Gayle deeply missed her mothers, who died in quick succession less than two years ago.  This Gayle was frustrated by COVID and pained by politics.  She found solace in her family and friends.  Gayle was unendingly proud of Seamus, always her deep well of joy.  I’m certain that no wife was ever more supported and adored by her husband than Gayle O’Connor was by Tom O’Connor.  Gayle was Tom’s prize.  He worshipped her, and she him.  After thirty-one years of marriage on their own terms, Tom’s eyes still sparkled when Gayle was the center of attention.  He was so proud to be her husband.  Who wouldn’t envy them?  They were blessed to have each other.  They should have had more time, for themselves, and for all of us who love them.

Tom and Gayle.  Saying that is like saying “red beans and rice.”  Perfect together.  I cannot believe she’s gone, but she lives on in the many happy stories we will tell of her and the light she brought to our lives.

The Irish have a blessing for the departed that seems right for a McCormick O’Connor: “May the road rise to meet you, and the wind always be at your back. May the sun shine warm on your face and the rains fall softly on your fields. And until we meet again, may God hold you gently in the palm of his hand.”

If there’s a rock-and-roll heaven, you’ll find Gayle right up front with the band. She’ll be dancing and all eyes will be on her.

Tom is planning a musical celebration of Gayle’s life for 3:30pm Saturday afternoon, October 24 at Bayou St. John. Here are the details in Tom’s words:

“For anyone in New Orleans, on Sat Oct 24 we’re going to do a New Orleans celebration of Gayle’s life at her place, 3234 Grand Route St.John. (that’s between Moss St and the Esplanade) We’ll walk down to the Bayou (half a block), I’ll say a (very) few words, have a tribute song from her friends Maggie & Kess, scatter some ashes, then go back to her house for some music on the porch from 4 to 6 by Glenn David Andrews, one of Gayle’s favorite New Orleans musicians.”

“We’re still in COVID-19 restrictions folks so wear masks and socially distance in order to allow the celebration to go off without a hitch. If you bring flowers, please bring yellow flowers …it was her favorite color.”

“Donations can be made in her name to the Grace House residential treatment center for women in New Orleans at https://www.bridgehouse.org/support-us/donate/”

A sweet remembrance from Bob Ambrogi.

The Case for Native, I Swear

Regular readers may tire of my extolling the virtues of native forms of production; but battleships turn slowly, and this one must yet be turned. Apart from judges (whose frequent unfamiliarity with electronic evidence makes them easy prey for prevarication), those best situated to end the ruin of TIFF+ productions are those who profit most from doing nothing.

Articles, speeches and blog posts can only go so far. What’s needed are published judicial decisions. Whether they go one way or the other, we need thoughtful opinions that lay out the issues in an accurate and balanced way, informing litigants what’s at stake. Many published orders fail to weigh the genuine pros and cons of each form of production. A few read as if TIFF images were the evidence and requesting parties were seeking to have God-given TIFF images converted into heretical native files. Talk about confused!

Seeking another published opinion on the merits of native production, I recently supplied a declaration to a federal court. I’m attaching an anonymized version of my testimony in the hope that readers will weigh the arguments. I concede “it ain’t Shakespeare,” but it’s honest. I changed a lot to make it difficult to identify the matter, although the Declaration is a matter of public record. Sorry, but I thought a little less candor would be the wiser path. The lawsuit is still very much in contention.

Here’s the anonymized version in PDF:

Here’s the guts of it:

The Perfect Preservation Letter: A New Guide

Yesterday, I asked my Electronic Evidence class at Tulane Law School, “What’s the difference between a preservation letter and a legal hold notice?”

Do you know?

I got the simple answer I sought: You put your clients on notice of legal hold; you send a preservation letter to the other side.  Another difference is that there is no legal duty to dispatch a preservation letter, but woe betide the lawyer who fails to initiate a prompt and proper litigation hold! 

In truth, the two missives have much in common.  Both seek the preservation of evidence, and both are best when clear, specific and instructive.  Both must go out when you know less than you’d like about sources of potentially responsive information.  Finally, both tend to receive minimal thought before dissemination, resulting in easily ignored, boilerplate forms crowding out artfully-targeted requests.

If I’m frank, most of what passed for preservation letters “back in the day” were, well, crap.  They sprang from forensic service providers and sounded more like ransom notes than statements of a practical and proportionate legal duty.  Literal compliance required pulling the plugs on the computers and backing away…very…very…slowly.  But, with the first 2006 amendments to the Federal Rules of Civil Procedure came a groundswell to routinize e-discovery, to label its stages (as in the iconic EDRM diagram) and to systemize its execution by development of “defensible, repeatable processes.”  So, way back when, I wrote an article introducing requesting parties to the “perfect” preservation letter and offering an example as a drafting aid.  Perhaps because it was the only lifeboat in a storm, it took off; and it wasn’t long before lawyers on the north side of the docket made it their favorite opening salvo. 

If that sounds like bragging, know that I’m not proud of what happened.  People started using the exemplar “perfect” letter in the lazy way I hoped they wouldn’t:  as a form pitched at cases of every stripe and type. 

Hey folks. “Perfect” was tongue-in-cheek!  I wrote,

You won’t find the perfect preservation letter in any formbook. You must custom craft it from a judicious mix of clear, technically astute terminology and fact-specific direction. It compels broad retention while asking for no more than the essentials. It rings with reasonableness. Its demands are proportionate to the needs of the case, and it keeps the focus of e-discovery where it belongs: on relevance.

But no one read that.  It was just too easy to hand the example over to an assistant and say, “send this out in all our cases.”

Fast forward to 2018 and counsel to the President of the United States sends out my letter without updating it to reflect any of the changes we’ve seen in sources and forms of electronically stored information since, say, Hurricane Katrina.  Imagine a preservation letter from President Trump that ignores tweets, for goodness sake!  Clearly, the article and the accompanying exemplar letter both needed more than a fresh coat of paint.  Weirdly, the gap hadn’t been filled by anything else in fifteen years.

A few weeks back, I updated and published the exemplar letter, with a fresh plea to use it as a drafting aid and not as a form.  Today, I finished updating the guide to its use, once again called (IRONICALLY) The Perfect Preservation Letter.  It’s still no masterpiece. To be useful, the letter must be a living document, changing to reflect new sources (Dating sites! I forgot to add dating sites!) and improved ways to preserve and acquire evidence. I hope a new generation of lawyers finds it instructive.  There’s plenty of room for improvement, so dig in, make it better, make it your own.

Advanced Zoom “Weather Map” Technique

I lately presented a program for the State Bar of Texas Annual Meeting alongside Texas District Court Judge Emily Miskel. Like everything else, the venerable Annual Meeting was recast as a virtual event. Our topic was “Upping your Game in Zoom,” and we spoke of many ways to improve the quality of online video meetings and hearings. Judge Miskel and I covered dead simple ways to avoid common errors and some advanced techniques. One advanced approach I shared was making your presentation visuals serve as your dynamic Zoom background, enabling a presenter to interact with background visuals in the same way that TV meteorologists explain weather patterns using a green screen map.

There are times when a disembodied narration of screen-filling visuals is best; yet, there are times when you don’t want to force viewers to choose between speakers and visuals, as occurs when Zoom attendees lack the screen real estate or mastery of the Zoom interface needed to pin speakers to larger windows. Let’s face it: most Zoom users are overwhelmed by mute/unmute; asking them to pin and resize screens is a bridge too far.

Certainly, anyone can share a PowerPoint presentation in Zoom, bringing slide imagery to the fore and relegating speakers to tiny squares at the perimeter, like the world’s saddest episode of The Brady Bunch. Instead, I wanted to be a more prominent part of the show, akin to the accustomed ways speakers present onstage.

Television news anchors routinely use “OTS” (for over-the-shoulder) graphics as an effective segue between the newsreader and story video. OTS graphics work nicely in Zoom, introducing the topic or bullet points in a background slide, then sharing out the focal graphics. It sounds complicated, but it’s easy to get the hang of going to and returning from shared screens. It takes practice, but isn’t practice always key to improving presentation skill?

PowerPoint does all the heavy lifting of converting your slide visuals to still images (and even to video) suitable for use as Zoom backgrounds. Any PowerPoint slide show can be saved as individual JPG or PNG graphics. The “trick” is to compose the slide to afford room for the presenter’s upper torso without obstructing the visuals.

If you look at the two images below, you can see that I’ve left vacant the lower right quadrant of each slide. This presentation required use of templates, but left to my own aesthetics, I never use templates.

I hate ugly templates!
Reserve part of the screen for your image. Don’t block your bullets!

In practice, I adjust my camera such that my head and shoulders occupy the lower right of the Zoom screen (see below), then I can point at bullets and gesture at graphics. The weathercaster technique really shines when you present standing up. Then, you’d devote one-half to one-third of the slide layout to your graphics and the balance to you. You could even stand between two columns of bullets. Of course, this requires sufficient room between camera and green screen and, ideally, a dedicated camera and studio lighting.

Would it hurt to smile?

By now, you’ve gathered that achieving a true chroma key effect requires a physical green screen backdrop, not the virtual “where’d my ears go?” background effect often seen. A suitable 9-10′ muslin green screen backdrop will cost about twenty dollars on Amazon. I elected to spend more and get the green screen, crossbar, pair of backdrop supports and a bevy of studio lights and stands for $150.00. If you’ve got a way to hang a green sheet behind you (e.g., curtain rod, tacked to a wall, hung from the ceiling), that twenty dollar backdrop works just fine.

Home Studio Kit

Having created your background visuals and saved each slide as a still JPG or PNG image, you’ll load them into Zoom as Virtual Backgrounds. To do so, start Zoom and go to Virtual Background in the Settings menu. Locate and click the small plus sign (+) (Arrow 1, below), then click on “Add Image” from the menu and navigate to where you’ve saved your background images. Add each image in this manner, keeping them in the order in which you want them displayed when presenting. Next, click the box to tell Zoom you have a green screen (Arrow 2), and finally, be sure the color shown matches your backdrop. Zoom should do this automatically, but you can also set it manually (Arrow 3).

Zoom’s Settings>Virtual Background screen

You’re ready to go, but before starting a presentation, launch Zoom and Virtual Background again. Practice selecting each background much as you might advance them as slides in a PowerPoint show, choosing them in succession while presenting. If you’ve loaded them in your preferred order, they will appear as options in that order. You will need to keep the Virtual Background settings panel open at all times during your presentation, so a second screen helps ensure the settings panel doesn’t disappear behind another window. You don’t want to be fumbling around in search of the Virtual Backgrounds panel while speaking.

The Weather Map Technique is harder to describe than it is to pull off. The key to keeping it smooth and simple calls to mind the out-of-towner visiting Manhattan who asked a local, “How do I get to Carnegie Hall?”

The answer’s the same: “Practice, practice, practice!”

Wish List: I look forward to a day when Zoom natively supports dynamic backgrounds allowing us to feed PowerPoints directly to a background instead of a shared screen. Also, I’d like to be able to folder backgrounds topically. Affording hosts greater control over the layout of Zoom windows would be nice. In Zoom hearings, think how it would help to be able to group lawyers according to their role in the litigation.

It’s About Time!

“Time heals all wounds.”  “Time is money.” “Time flies.” 

To these memorable mots, I add one more: “Time is truth.”

A defining feature of electronic evidence is its connection to temporal metadata or timestamps.  Electronically stored information is frequently described by time metadata denoting when ESI was created, modified, accessed, transmitted, or received.  Clues to time are clues to truth because temporal metadata helps establish and refute authenticity, accuracy, and relevancy.

But in the realms of electronic evidence and digital forensics, time is tricky.  It hides in peculiar places, takes freakish forms, and doesn’t always mean what we imagine.  Because time is truth, it’s valuable to know where to find temporal clues and how to interpret them correctly.

Everyone who works with electronic evidence understands that files stored in a Windows (NTFS) environment are paired with so-called “MAC times,” which have nothing to do with Apple Mac computers or even the MAC address identifying a machine on a network.  In the context of time, MAC is an initialism for Modified, Accessed and Created times.

That doesn’t sound tricky.  Modified means changed, accessed means opened and created means authored, right?  Wrong.  A file’s modified time can change due to actions neither discernible to a user nor reflective of user-contributed edits.  Accessed times change from events (like a virus scan) that most wouldn’t regard as accesses. Moreover, Windows stopped reliably updating file access times way back in 2007 when Microsoft introduced the Windows Vista operating system.  Created may coincide with the date a file is authored, but it’s as likely to flow from the copying of the file to new locations and storage media (“created” meaning created in that location). Copying a file in Windows produces an object that appears to have been created after it’s been modified!

It’s crucial to protect the integrity of metadata in e-discovery, so changing file creation times by copying is a big no-no.  Accordingly, e-discovery collection and processing tools perform the nifty trick of changing MAC times on copies to match times on the files copied.  Thus, targeted collection alters every file collected, but done correctly, original metadata values are restored and hash values don’t change.  Remember: system metadata values aren’t stored within the file they describe, so system metadata values aren’t included in the calculation of a file’s hash value.  The upshot is that changing a file’s system metadata values, including its filename and MAC times, doesn’t affect the file’s hash value.
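To make that concrete, here is a minimal Python sketch, not any collection tool’s actual method. The filenames, sample text and the `sha256_of` and `demo_metadata_vs_hash` helpers are all hypothetical. It hashes a file, renames and backdates it, hashes it again, then changes one byte of content and hashes a third time.

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Hash only the file's contents; name and timestamps never enter the digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def demo_metadata_vs_hash():
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "memo.txt")        # hypothetical filename
        with open(original, "wb") as f:
            f.write(b"Quarterly forecast, draft 3")
        before = sha256_of(original)

        # Change system metadata only: rename the file and backdate its
        # accessed and modified times to January 1, 2000 (UTC).
        renamed = os.path.join(folder, "renamed.txt")
        os.rename(original, renamed)
        os.utime(renamed, (946684800, 946684800))
        after_metadata_change = sha256_of(renamed)

        # Now change the content by a single byte.
        with open(renamed, "ab") as f:
            f.write(b".")
        after_content_change = sha256_of(renamed)

    return before, after_metadata_change, after_content_change

before, after_meta, after_content = demo_metadata_vs_hash()
print(before == after_meta)       # compares hashes across the rename and backdating
print(before == after_content)    # compares hashes across the one-byte edit
```

The first comparison should hold and the second should not: the hash follows the bytes inside the file, not the labels the file system hangs on it.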

Conversely and ironically, opening a Microsoft Word document without making a change to the file’s contents can change the file’s hash value when the application updates internal metadata like the editing clock.  Yes, there’s even a timekeeping feature in Office applications!

Other tricky aspects of MAC times arise from the fact that time means nothing without place.  When we raise our glasses with the justification, “It’s five o’clock somewhere,” we are acknowledging that time is a ground truth. “Time” means time in a time zone, adjusted for daylight savings and expressed as a UTC Offset stating the number of hours ahead of or behind GMT, the time at the Royal Observatory in Greenwich, England atop the Prime or “zero” Meridian.

Time values of computer files are typically stored in UTC, for Coordinated Universal Time, essentially Greenwich Mean Time (GMT) and sometimes called Zulu or “Z” time, military shorthand for zero meridian time.  When stored times are displayed, they are adjusted by the computer’s operating system to conform to the user’s local time zone and daylight savings time rules.  So in e-discovery and computer forensics, it’s essential to know if a time value is a local time value adjusted for the location and settings of the system or if it’s a UTC value.  The latter is preferred in e-discovery because it enables time normalization of data and communications, supporting the ability to order data from different locales and sources across a uniform timeline.
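Here is a small Python sketch of why normalizing to UTC matters when ordering communications. The two messages, their times and the fixed offsets are invented for illustration. A real workflow would draw offsets from the evidence itself instead of hard-coding them.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical senders in fixed-offset zones (no daylight saving lookups).
NEW_YORK_EDT = timezone(timedelta(hours=-4))
LOS_ANGELES_PDT = timezone(timedelta(hours=-7))

messages = [
    ("Original message", datetime(2020, 7, 1, 12, 30, tzinfo=NEW_YORK_EDT)),
    ("Reply", datetime(2020, 7, 1, 9, 45, tzinfo=LOS_ANGELES_PDT)),
]

def by_wall_clock(items):
    """Naive ordering: compare local clock readings and ignore the offsets."""
    ordered = sorted(items, key=lambda m: m[1].replace(tzinfo=None))
    return [label for label, _ in ordered]

def by_utc(items):
    """Normalized ordering: convert every value to UTC before comparing."""
    ordered = sorted(items, key=lambda m: m[1].astimezone(timezone.utc))
    return [label for label, _ in ordered]

print(by_wall_clock(messages))   # ordering by what each sender's clock displayed
print(by_utc(messages))          # ordering on a single, uniform timeline
```

Sorted by local clock readings, the 9:45 reply from Los Angeles lands ahead of the 12:30 message from New York that prompted it. Sorted by UTC (16:30 and 16:45, respectively), the conversation falls back into its true order.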

Four months of pandemic isolation have me thinking about time.  Lost time. Wasted time. Pondering where the time goes in lockdown.   Lately, I had to testify about time in a case involving discovery malfeasance and corruption of time values stemming from poor evidence handling.  When time values are absent or untrustworthy, forensic examiners draw on hidden time values—or, more accurately, encoded time values—to construct timelines or reveal forgeries.

Time values are especially important to the reliable ordering of email communications.  Most e-mails are conversational threads, often a mishmash of “live” messages (with their rich complement of header data, encoded attachments and metadata) and embedded text strings of older messages.  If the senders and receivers occupy different time zones, the timeline suffers: replies precede messages that prompted them, and embedded text strings make it child’s play to alter times and text.  It’s just one more reason I always seek production of e-mail evidence in native and near-native forms, not as static images.  Mail headers hold data that support authenticity and integrity—data you’ll never see produced in a load file.

Underscoring that last point, I’ll close with a wacky, wonderful example of hidden timestamps: time values embedded in Gmail boundaries.  This’ll blow your mind.

If you know where to look in digital evidence, you’ll find time values hidden like Easter eggs. 

E-mail must adhere to structural conventions to traverse the internet and be understood by different e-mail programs. One of these conventions is the use of a Content-Type declaration and setting of content boundaries, enabling systems to distinguish the message header region from the message body and attachment regions.

The next illustration is a snippet of simplified code from a forged Gmail message.  To see the underlying code of a Gmail message, users can select “Show original” from the message options drop-down menu (i.e., the ‘three dots’).

The line partly outlined in red advises that the message will be “multipart/alternative,” indicating that there will be multiple versions of the content supplied; commonly a plain text version followed by an HTML version. To prevent confusion of the boundary designator with message text, a complex sequence of characters is generated to serve as the content boundary. The boundary is declared to be “00000000000063770305a4a90212” and delineates a transition from the header to the plain text version (shown) to the HTML version that follows (not shown).

Thus, a boundary’s sole raison d’être is to separate parts of an e-mail; but because a boundary must be unique to serve its purpose, programmers insure against collision with message text by integrating time data into the boundary text.  Now, watch how we decode that time data.

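Here’s our boundary, and I’ve highlighted fourteen hexadecimal characters in red (63770305a4a902, the characters following the run of leading zeros and preceding the final two characters):

Next, I’ve parsed the highlighted text into six- and eight-character strings (637703 and 05a4a902), reversed their order and concatenated the strings to create a new hexadecimal number: 05a4a902637703.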

A decimal number is Base 10.  A hexadecimal number is Base 16.  They are merely different ways of notating numeric values.  So, 05a4a902637703 is just a really big number. If we convert it to its decimal value, it becomes: 1,588,420,680,054,531.  That’s 1 quadrillion, 588 trillion, 420 billion, 680 million, 54 thousand, 531.  Like I said, a BIG number.

But, a big number…of what?

Here’s where it gets amazing (or borderline insane, depending on your point of view).

It’s the number of microseconds that have elapsed since January 1, 1970 (midnight UTC), not counting leap seconds. A microsecond is a millionth of a second, and 1/1/1970 is the “Epoch Date” for the Unix operating system. An Epoch Date is the date from which a computer measures system time. Depending on the system, a Unix timestamp may be resolved to seconds (currently 10 digits), milliseconds (13 digits) or microseconds (16 digits).

When you make that curious calculation, the resulting date proves to be Saturday, May 2, 2020 6:58:00.054 AM UTC-05:00 DST.  That’s the genuine date and time the forged message was sent.  It’s not magic; it’s just math.
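The decoding steps above can be sketched in a few lines of Python. Treat this as an illustration of the arithmetic applied to this one boundary, not a documented Gmail format. The function name is mine, and slicing the fourteen characters by their position from the end of the string is an assumption that fits this example.

```python
from datetime import datetime, timedelta, timezone

BOUNDARY = "00000000000063770305a4a90212"    # the boundary quoted in this post

def decode_boundary_time(boundary):
    """Return (microseconds since the Unix epoch, UTC datetime) per the steps above."""
    highlighted = boundary[-16:-2]            # the fourteen hex characters of interest
    first, second = highlighted[:6], highlighted[6:]
    reordered = second + first                # reverse the two strings and concatenate
    microseconds = int(reordered, 16)         # hexadecimal (Base 16) to decimal (Base 10)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return microseconds, epoch + timedelta(microseconds=microseconds)

micros, sent_utc = decode_boundary_time(BOUNDARY)
central_daylight = timezone(timedelta(hours=-5))   # UTC-05:00, as cited above
print(micros)
print(sent_utc.isoformat())
print(sent_utc.astimezone(central_daylight).isoformat())
```

The function returns 1588420680054531 microseconds, which is 11:58:00.054531 UTC on May 2, 2020, or 6:58 AM at UTC-05:00.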

Had the timestamp been created by the Windows operating system, the number would signify the number of 100 nanosecond intervals between midnight (UTC) on January 1, 1601 and the precise time the message was sent.

Why January 1, 1601?  Because that’s the “Epoch Date” for Microsoft Windows.  Again, an Epoch Date is the date from which a computer measures system time.  Unix and POSIX measure time in seconds from January 1, 1970.  Apple used one-second intervals since January 1, 1904, and MS-DOS counted from January 1, 1980 (storing dates and times as packed fields with two-second resolution rather than as a running count of seconds). Windows went with 1/1/1601 because, when the Windows operating system was being designed, we were in the first 400-year cycle of the Gregorian calendar (implemented in 1582 to replace the Julian calendar). Rounding up to the start of the first full century of the 400-year cycle made the math cleaner.
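For comparison, here is a hedged Python sketch of how the same instant would look if counted the Windows way, in 100-nanosecond intervals from 1601. The function and constant names are my own. The only facts relied upon are the two epoch dates and the interval size described above.

```python
from datetime import datetime, timedelta, timezone

WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TICKS_PER_MICROSECOND = 10                     # one tick is 100 nanoseconds

def unix_micros_to_windows_ticks(unix_micros):
    """Convert microseconds since 1/1/1970 to 100 ns intervals since 1/1/1601."""
    offset = (UNIX_EPOCH - WINDOWS_EPOCH) // timedelta(microseconds=1)
    return (unix_micros + offset) * TICKS_PER_MICROSECOND

def windows_ticks_to_datetime(ticks):
    """Convert 100 ns intervals since 1/1/1601 to a UTC datetime (microsecond precision)."""
    return WINDOWS_EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)

ticks = unix_micros_to_windows_ticks(1588420680054531)   # the Gmail value decoded earlier
print(ticks)
print(windows_ticks_to_datetime(ticks).isoformat())
```

The gap between the two epochs is 11,644,473,600 seconds, so the Unix value of zero corresponds to a Windows value of 116,444,736,000,000,000. Same moment, different yardstick.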

Timestamps are everywhere in e-mail, hiding in plain sight.  You’ll find them in boundaries, message IDs, DKIM stamps and SMTP IDs.  Each server handoff adds its own timestamp.  It’s the rare e-mail forger who will find every embedded timestamp and correctly modify them all to conceal the forgery. 

When e-mail is produced in its native and near-native forms, there’s more there than meets the eye in terms of the ability to generate reliable timelines and flush out forgeries and excised threads.  Next time the e-mail you receive in discovery seems “off” and your opponent balks at giving you suspicious e-mail evidence in faithful electronic formats, ask yourself: What are they trying to hide?

The takeaway is this: Time is truth and timestamps are evidence in their own right.  Isn’t it about time we stop letting opponents strip it away?

Tip of the hat to Arman Gungor at Metaspike whose two excellent articles about e-mail timestamp forensics reminded me how much I love this stuff.  https://www.metaspike.com/timestamps-forensic-email-examination/

Don’t Bet the Farm on Slack Space

A depiction of file slack from Ball, E-Discovery Workbook © 2020

A federal court appointed me Special Master, tasked to, in part, search the file slack space of a party’s computers and storage devices.  The assignment prompted me to reconsider the value of this once-important forensic artifact.

Slack space is the area between the end of a stored file and the end of its concluding cluster: the difference between a file’s logical and physical size. It’s wasted space from the standpoint of the computer’s file system, but it has forensic significance by virtue of its potential to hold remnants of data previously stored there.  Slack space is often confused with unallocated clusters or free space, terms describing areas of a drive not currently used for file storage (i.e., not allocated to a file) but which retain previously stored, deleted files.

A key distinction between unallocated clusters and slack space is that unallocated clusters can hold the complete contents of a deleted file whereas slack space cannot.  Data recovered (“carved”) from unallocated clusters can be quite large—spanning thousands of clusters—where data recovered from a stored file’s slack space can never be larger than one cluster minus one byte.  Crucially, unallocated clusters often retain a deleted file’s binary header signature serving to identify the file type and reveal the proper way to decode the data, whereas binary header signatures in slack space are typically overwritten.
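To illustrate what a binary header signature buys you, here is a minimal Python sketch that identifies a few common file types from their leading bytes. The short signature table and the `identify` helper are mine and far from complete. Real carving tools know hundreds of signatures and check footers and internal structure as well.

```python
# A handful of well-known leading-byte signatures ("magic numbers").
SIGNATURES = {
    b"%PDF": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "Zip container (including DOCX, XLSX and PPTX)",
}

def identify(data):
    """Return a label if the data begins with a known signature, else None."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return None

intact = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>"
remnant = intact[6:]          # emulate slack: the leading bytes have been overwritten

print(identify(intact))       # signature present, so the type is recognized
print(identify(remnant))      # signature gone, so nothing to go on
```

Lose the first few bytes, as slack space remnants always do, and the easiest clue to file type goes with them.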

A little more background in file storage may prove useful before I describe the dwindling value of slack space in forensics.

Electronic storage media are physically subdivided into millions, billions or trillions of sectors of fixed storage capacity.  Historically, disk sectors on electromagnetic hard drives were 512 bytes in size.  Today, sectors may be much larger (e.g., 4,096 bytes).  A sector is the smallest physical storage unit on a disk drive, but not the smallest accessible storage unit.  That distinction belongs to a larger unit called the cluster, a logical grouping of sectors and the smallest storage unit a computer can read from or write to.  On Windows machines, clusters are 4,096 bytes (4kb) by default for drives up to 16 terabytes.  So, when a computer stores or retrieves data, it must do so in four kilobyte clusters.

File storage entails allocation of enough whole clusters to hold a file.  Thus, a 2kb file will only fill half a 4kb cluster–the balance being slack space.  A 13kb file will tie up four clusters, although just a fraction of the final, fourth cluster is occupied by the file.  The balance is slack space and it could hold fragments of whatever was stored there before.  Because it’s rare for files to be perfectly divisible by 4 kilobytes and many files stored are tiny, much drive space is lost to slack space.  Using smaller clusters would mean less slack space, but any efficiencies gained would come at the cost of unwieldy file tracking and retrieval.
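The arithmetic is simple enough to sketch in Python. This is a simplified model that assumes every file occupies whole 4,096-byte clusters. The `clusters_and_slack` function is hypothetical, and the model ignores file system refinements such as very small files stored outside of clusters.

```python
CLUSTER_SIZE = 4096   # bytes; the Windows default discussed above

def clusters_and_slack(file_size, cluster_size=CLUSTER_SIZE):
    """Return (clusters allocated, bytes of slack) for a file of file_size bytes."""
    clusters = -(-file_size // cluster_size)        # ceiling division
    slack = clusters * cluster_size - file_size     # physical size minus logical size
    return clusters, slack

for size in (2 * 1024, 13 * 1024, 8 * 1024):
    clusters, slack = clusters_and_slack(size)
    print(f"{size:>6} byte file: {clusters} cluster(s), {slack} bytes of slack")
```

The 2kb file leaves 2,048 bytes of slack in its single cluster, the 13kb file leaves 3,072 bytes in its fourth cluster, and an 8kb file that divides evenly into two clusters leaves none.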

So, slack space holds forensic artifacts and those artifacts tend to hang around a long time.  Unallocated clusters may be called into service at any time and their legacy content overwritten.  But data lodged in slack space endures until the file allocated to the cluster is deleted–on conventional “spinning” hard drives at any rate.

When I started studying computer forensics in the MS-DOS era, slack space loomed large as a source of forensic intelligence.  Yet, apart from training exercises where something was always hidden in slack, I can’t recall a matter I’ve investigated this century which turned on evidence found in slack space.  The potential is there, so when it makes sense to do it, examiners search slack using unique phrases unlikely to throw off countless false positives.

But how often does it make sense to search slack nowadays?

I’ve lately grappled with that question because it seems to me that the shopworn notions respecting slack space must be re-calibrated.  

Keep in mind that slack space holds just a shard of data with its leading bytes overwritten.  It may be overwritten minimally or overwritten extensively, but some part is obliterated, always.  Too, slack space may hold the remnants of multiple deleted files; that is, as overlapping artifacts: files written, deleted, overwritten by new data, deleted again, then overwritten again (just less extensively so).  Slack can be a real mess.

Fifteen years ago, when programs stored text in ASCII (i.e., encoded using the American Standard Code for Information Interchange or simply “plain text”), you could find intelligible snippets in slack space.  But since 2007, when Microsoft changed the format of Office productivity files like Word, PowerPoint and Excel files to Zip-compressed XML formats, there’s been a sea change in how Office applications and other programs store text.  Today, if a forensic examiner looks at a Microsoft Office file as it’s written on the media, the content is compressed.  You won’t see any plain text.  The file’s contents resemble encrypted data.  The “PK” binary header signature identifying it as compressed content is gone, so how will you recognize zipped content?  What’s more, the parts of the Zip file required to decompress the snippet have likely been obliterated, too. How do you decode fragments if you don’t know the file type or the encoding schema?
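A short Python sketch shows the problem. It builds a Zip container loosely modeled on a Word file and then searches the stored bytes for a phrase that is plainly present in the content. The phrase, the sample XML and the helper name are hypothetical. Only the `word/document.xml` part name is borrowed from the real DOCX layout.

```python
import io
import zipfile

TERM = b"Project Bluebird"            # hypothetical search term

def zip_like_office_file(xml_bytes):
    """Pack content into a Zip container, roughly as DOCX/XLSX/PPTX files do."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", xml_bytes)
    return buffer.getvalue()

document_xml = b"".join(
    b"<w:p><w:t>Item %d: pricing notes for Project Bluebird.</w:t></w:p>" % n
    for n in range(200)
)
stored = zip_like_office_file(document_xml)

print(stored[:4])                 # the "PK" signature at the head of the container
print(TERM in document_xml)       # is the term in the uncompressed content?
print(TERM in stored)             # is it visible in the bytes as written to media?

# Emulate a slack remnant: the head is overwritten and only a middle shard survives.
shard = stored[64:320]
print(TERM in shard)
```

The term appears two hundred times in the content, yet a byte-level search of the stored form comes up empty, and the shard carries neither the signature nor anything a keyword search can grab.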

The best answer I have is you throw common encodings against the slack and hope something matches up with the search terms.  More-and-more, nothing matches, even when what you seek really is in the slack space. Searches fail because the data’s encoded and invisible to the search tool.  I don’t know how searching slack stacks up against the odds of winning the lottery, but a lottery ticket is cheap; a forensic examiner’s time isn’t.
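“Throwing common encodings against the slack” can be sketched as follows. This is a bare-bones Python illustration, not how any particular forensic suite does it. The encoding list, function name and sample buffer are assumptions chosen for the example.

```python
ENCODINGS = ("ascii", "utf-16-le", "utf-16-be")   # a small, illustrative set

def search_raw(data, term, encodings=ENCODINGS):
    """Return {encoding: [byte offsets]} for each encoding in which the term is found."""
    hits = {}
    for name in encodings:
        needle = term.encode(name)       # raises UnicodeEncodeError for non-ASCII terms under "ascii"
        offsets = []
        start = data.find(needle)
        while start != -1:
            offsets.append(start)
            start = data.find(needle, start + 1)
        if offsets:
            hits[name] = offsets
    return hits

# A made-up slack buffer holding the term once as UTF-16 and once as plain ASCII.
slack = b"\xff" * 10 + "merger".encode("utf-16-le") + b"\xff" * 5 + b"merger"
print(search_raw(slack, "merger"))
print(search_raw(slack, "payoff"))
```

Each added encoding is another pass over the data, and none of them helps when the remnant is compressed or encrypted. That’s the rub.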

That’s just the software.  Storage hardware has evolved, too.  Drives are routinely encrypted, and some oddball encryption methods make it difficult or impossible to explore the contents of file slack.  The ultimate nail in the coffin for slack space will be solid state storage devices and features like wear leveling and TRIM that routinely reposition data and promise to relegate slack space and unallocated clusters to the digital dung heap of history.

Taking a fresh look at file slack persuades me that it still belongs in a forensic examiner’s bag of tricks when it can be accomplished programmatically and with little associated cost.  But, before an expert characterizes it as essential or a requesting party offers it as primary justification for an independent forensic examination, I’d urge the parties and the Court to weigh cost versus benefit; that is, to undertake a proportionality analysis in the argot of electronic discovery.  Where searching slack space was once a go-to for forensic examination, it’s an also-ran now. Do it, when it’s an incidental feature of a thoughtfully composed examination protocol; but don’t bet the farm on finding the smoking gun because the old gray mare, she ain’t what she used to be!
See? I never metaphor I didn’t like.


Postscript: A question came up elsewhere about solid state drive forensics. Here was my reply:

The paradigm-changing issue with SSD forensic analysis versus conventional magnetic hard drives is the relentless movement of data by wear leveling protocols and a fundamentally different data storage mechanism. Solid state cells have a finite life measured in the number of write-rewrite cycles.

To extend their useful life, solid state drives move data around to ensure that all cells are written with roughly equal frequency. This is called “wear leveling,” and it works. A consequence of wear leveling is that unallocated cells are constantly being overwritten, so SSDs do not retain deleted data as electromagnetic drives do. Wear leveling (and the requisite remapping of data) is handled by an SSD’s onboard electronics and isn’t something users or the operating system control or access.

Another technology, an ATA command called TRIM, is controllable by the operating system and serves to optimize drive performance by disposing of the contents of storage cell groups called “pages” that are no longer in use. Oversimplified, it’s faster to write to an empty memory page than to initiate an erasure first; so, TRIM speeds the write process by clearing contents before they are needed, in contrast to an electromagnetic hard drive which overwrites clusters without need to clear contents beforehand.

The upshot is that resurrecting deleted files by identifying their binary file signatures and “carving” their remnant contents from unallocated clusters isn’t feasible on SSD media. Don’t confuse this with forensically-sound preservation and collection. You can still image a solid state drive, but you’re not going to get unallocated clusters. Too, you won’t be interfacing with the physical media grabbing a bitstream image. Everything is mediated by the drive electronics.


Dear Reader, Sorry I’ve been remiss in posting here during the COVID crisis. I am healthy, happy and cherishing the peace and quiet of the pause, hunkered down in my circa-1880 double shotgun home in New Orleans, enjoying my own cooking far too much. Thanks to Zoom, I completed my Spring Digital Evidence class at the University of Texas School of Law, so now one day just bubbles into the next, and I’m left wondering, Where did the day go? Every event where I was scheduled to speak or teach cratered, with no face-to-face events sensibly in sight for 2020. One possible exception: I’ve just joined the faculty of the Tulane School of Law ten minutes upriver for the Fall semester, and plan to be back in Austin teaching in the Spring. But, who knows, right? Man plans and gods laugh.

We of a certain age may all be Zooming and distancing for many months. As one who’s bounced around the world peripatetically for decades, not being constantly on airplanes and in hotels is strange…and stress-relieving. While I miss family, friends and colleagues and mourn the suffering others are enduring, I’ve benefited from the reboot, ticking off household projects and kicking the tires on a less-driven day-to-day. It hasn’t hurt that it’s been the best two months of good weather I’ve ever seen, here or anywhere. The prospect of no world travel this summer–and no break from the soon-to-be balmy Big Easy heat–is disheartening, but small potatoes in the larger scheme of things.

Be well, be safe, be kind to yourself. This, too, shall pass and as my personal theme song says, There's a Great Big Beautiful Tomorrow. Just a Dream Away.

Protect your Meetings From Zoom Bombers

Distanced by Coronavirus, lawyers and teachers are flocking to the teleconferencing platform Zoom to meet and share screens.  Zoom is also turning up as a way to emulate face-to-face social interactions ranging from AA meetings and book clubs to happy hours and rock concerts.  Last week, the Chipotle fast food chain sought to bring a little joy to COVID-stressed customers by hosting an online concert with singer/songwriter Lauv. Things didn’t go as planned, and there’s a lesson there for lawyers and others needing meeting security.

Per Tressie Lieberman, Chipotle’s VP of digital and off-premise, “As we saw large scale events begin to get cancelled, we wanted to act fast and give our fans something to get excited about despite being surrounded by negative news.”  Chipotle acted fast–too fast it seems–and assuredly gave viewers something to get excited about, though not as intended.  Chipotle was forced to pull the plug after one attendee used Zoom’s Screen Share feature to broadcast pornography to hundreds of other attendees.  (‘Zoombombing’: When Video Conferences Go Wrong, New York Times, March 22, 2020.)

Whoever configured the Zoom meeting apparently failed to select the option that limits the ability of any meeting participant other than the host to share screens.  As a result, any attendee—including any troll logging in anonymously—could share any content they like with all other attendees.  It’s called Zoom bombing (like Photobombing) and it’s a growing disruption.  If a Zoom bomber logs in multiple times, stopping the interloper is like playing Whack-a-Mole.  The host shuts down one Zoom bombing instance only to push the Zoom bomber to the next and the next.

It’s an embarrassment that could have been avoided had the individual setting up the Zoom meeting changed a Screen Sharing option buried in the program’s settings menu, eschewing the default “All Participants” in favor of the considerably safer “Host Only” as seen below.

This unfortunate intrusion was caused by user error, not a vulnerability in the tool.  But I’d been expecting something of a similar nature to occur since I noticed that Zoom issues every subscriber a personal Zoom meeting ID as an alternative to generating a one-time use meeting ID for every meeting. That’s a vulnerability. What it means is, if anyone learns the host’s personal Zoom meeting ID (hint: it’s the meeting number contained in the meeting invitation), anyone can attend the host’s personal meetings whether invited or not.  Of course, if the host is managing participants and keeping a close eye on headcounts, an uninvited lurker may be spotted.  If it were a meeting of many counsel in multidistrict litigation or other matters characterized by large teams, it would be easy for an opponent to log in and listen undetected. 

Here are other simple tips to secure your Zoom meetings against Zoom bombers and eavesdroppers:

1. Protect your personal Zoom meeting ID as you would your personal passwords. Never use your personal Zoom meeting ID to host a meeting.   Instead, have Zoom automatically generate a unique meeting ID for your invitations.

2. Require a meeting password.  Zoom will generate one for your invitees when you check the box.

3. Allow only authenticated users to join.  To gain entry, invited users will need to have a Zoom user account (they’re free) and log into Zoom.

4. Require participants attend with video cameras turned on, at least until the host can identify all the participants in the meeting and confirm they were invited.

5. Lock the meeting after all invited attendees have joined and prevent latecomers. To lock an ongoing meeting, click “Manage Participants,” then click “More” at the bottom of the Participants screen.  Finally, choose “Lock Meeting.”