Understanding the UPC: Because You Can

Where does the average person encounter binary data?  Though we daily confront a deluge of digital information, it’s all slickly packaged to spare us the bare binary bones of modern information technology.  All, that is, save the humble Universal Product Code, the bar code symbology on every packaged product we purchase from a 70-inch TV to a box of Pop Tarts.  Bar codes and their smarter Japanese cousins, QR Codes, are perhaps the most unvarnished example of binary encoding in our lives. 

Barcodes have an ancient tie to e-discovery as they were once used to Bates label hard copy documents, linking them to “objective coding” databases. A lawyer using barcoded documents was pretty hot stuff back in the day.

Just a dozen numeric characters are encoded by the ninety-five stripes of a UPC-A barcode, but those digits are encoded so ingeniously as to make them error resistant and virtually tamperproof. The black and white stripes of a UPC are the ones and zeroes of binary encoding.  Each number is encoded as seven bars and spaces (12×7=84 bars and spaces) and an additional eleven bars and spaces denote start, middle and end of the UPC.  The start and end markers are each encoded as bar-space-bar and the middle is always space-bar-space-bar-space.  Numbers in a bar code are encoded by the width of the bar or space, from one to four units. 

This image has an empty alt attribute; its file name is barcode-water.png

The bottle of Great Value purified water beside me sports the bar code at right.

Humans can read the numbers along the bottom, but the checkout scanner cannot; the scanner reads the bars. Before we delve into what the numbers signify in the transaction, let’s probe how the barcode embodies the numbers.  Here, I describe a bar code format called UPC-A.  It’s a one-dimensional code because it’s read across.  Other bar codes (e.g., QR codes) are two-dimensional codes and store more information because they use a matrix that’s read side-to-side and top-to-bottom.

The first two black bars on each end of the barcode signal the start and end of the sequence (bar-space-bar).  They also serve to establish the baseline width of a single bar to serve as a touchstone for measurement.  Bar codes must be scalable for different packaging, so the ability to change the size of the codes hinges on the ability to establish the scale of a single bar before reading the code.

Each of the ten decimal digits of the UPC are encoded using seven “bar width” units per the schema in the table at right.

To convey the decimal string 078742, the encoded sequence is 3211 1312 1213 1312 1132 2122 where each number in the encoding is the width of the bars or spaces.  So, for the leading value “zero,” the number is encoded as seven consecutive units divided into bars of varying widths: a bar three units wide, then (denoted by the change in color from white to black or vice-versa), a bar two units wide, then one then one.  Do you see it? Once more, left-to-right, a white band, three units wide, a dark band two units wide , then a single white band and a single dark band (3-2-1-1 encoding the decimal value zero).

You could recast the encoding in ones and zeroes, where a black bar is a one and a white bar a zero. If you did, the first digit would be 0001101, the number seven would be 0111011 and so on; but there’s no need for that, because the bands of light and dark are far easier to read with a beam of light than a string of printed characters.

Taking a closer look at the first six digits of my water bottle’s UPC, I’ve superimposed the widths and corresponding decimal value for each group of seven units. The top is my idealized representation of the encoding and the bottom is taken from a photograph of the label:

Now that you know how the bars encode the numbers, let’s turn to what the twelve digits mean.  The first six digits generally denote the product manufacturer. 078742 is Walmart. 038000 is assigned to Kellogg’s.  Apple is 885909 and Starbucks is 099555.  The first digit can define the operation of the code.  For example, when the first digit is a 5, it signifies a coupon and ties the coupon to the purchase required for its use.  If the first digit is a 2, then the item is something sold by weight, like meats, fruit or vegetables, and the last six digits reflect the weight or price per pound.  If the first digit is a 3, the item is a pharmaceutical.

Following the leftmost six-digit manufacturer code is the middle marker (1111, as space-bar-space-bar-space) followed by five digits identifying the product.  Every size, color and combo demands a unique identifier to obtain accurate pricing and an up-to-date inventory.

The last digit in the UPC serves as an error-correcting check digit to ensure the code has been read correctly.  The check digit derives from a calculation performed on the other digits, such that if any digit is altered the check digit won’t match the changed sequence. Forget about altering a UPC with a black marker: the change wouldn’t work out to the same check digit, so the scanner will reject it.

In case you’re wondering, the first product to be scanned at a checkout counter using a bar code was a fifty stick pack of Juicy Fruit gum in Troy, Ohio on June 26, 1974.  It rang up for sixty-seven cents.  Today, 45 sticks will set you back $2.48 (UPC 22000109989).

Steganography: Because Who Doesn’t Love Bacon?

I’m updating my E-Discovery Workbook to begin a new semester at the University of Texas School of Law next week, and I can’t help working in historical tidbits celebrating the antecedents of modern information technology.  The following is new material for my discussion of digital encoding: a topic I regard as essential to a good grasp of digital forensics and electronic evidence.  When you understand encoding, you understand why varying sources of electronically stored information are more alike than different and why forms of production matter.

We record information every day using 26 symbols called “the alphabet,” abetted by helpful signals called “punctuation.”  So, you could say that we write in hexavigesimal (Base26) encoding.

“Binary” or Base2 encoding is notating information using nothing but two symbols: conventionally, the numbers one and zero.  It’s often said that “computer data is stored as ones and zeroes;” but that’s a fiction.  In fact, binary data is stored physically, electronically, magnetically or optically using mechanisms that permit the detection of two clearly distinguishable “states,” whether manifested as faint voltage potentials (e.g., thumb drives), polar magnetic reversals (e.g., spinning hard drives) or pits on a reflective disc deflecting a laser beam (e.g., DVDs).  Ones and zeroes are simply a useful way to notate those states. You could use any two symbols as binary characters, or even two discrete characteristics of the “same” symbol. For now, just ponder how you might record or communicate two “different” characteristics, as by two different shapes, colors, sizes, orientations, markings, etc.

I free you from the trope of ones and zeroes to plumb the evolution of binary communication and explore an obscure coding cul-de-sac called Steganography, from the Greek, meaning “concealed writing.”  But first, we need an aside of Bacon.

I mean, of course, lawyer and statesman Sir Francis Bacon (1561-1626).  Among his many accomplishments, Bacon conceived a bilateral cipher (a “code” in modern parlance) enabling the hiding of messages omnia per omnia, or “anything by anything.”

Bacon’s cipher used the letters “A” and “B” to denote binary values; but if we use ones and zeros instead, we see the straight line from Bacon’s clever cipher to modern ASCII and Unicode encoding.

As with modern computer encoding, we need multiple binary digits (“bits”) to encode or “stand in for” the letters of the alphabet.  Bacon chose the five-bit sets at right:

If we substitute ones and zeroes (right), Bacon’s cipher starts to look uncannily like contemporary binary encodings.

Why five bits and not three or four? The answer lies in binary math (“Oh no! Not MATH!!”). Wait, wait; it won’t hurt. I promise!

If you have one binary digit (21), you have only two unique states (one or zero), so you can only encode two letters, say A and B.  If you have two binary digits (22 or 2×2), you can encode four letters, say A, B, C and D.  With three binary digits (23 or 2x2x2), you can encode eight letters.  Finally, with four binary digits (24 or 2x2x2x2), you can encode just sixteen letters.  So, do you see the problem in trying to encode the letters of a 26-letter alphabet?  You must use at least five binary digits (25 or 32) unless you are content to forgo ten letters.

Sir Francis Bacon wasn’t especially interested in encoding text as bits. His goal was to hide messages in any medium, permitting a clued-in reader to distinguish between differences lurking in plain sight.  Those differences—whatever they might be—serve to denote the “A” or “B” in Bacon’s steganographic technique. For example:

That last one is quite subtle, right?  Here’s how it’s done:

To conceal my name in each of the respective examples, every unbolded/unitalicized/serif character signifies an “A” in Bacon’s cipher and every boldface/italicized/sans serif character signifies a “B” (ignore the spaces and punctuation).  The bold and italic approaches look wonky and could arouse suspicion, but if the fonts are chosen carefully, the absence of serifs should go unnoticed.  Take a closer look to see how it works:

In my examples, I’ve used Bacon’s cipher to hide text within text, but it can as easily hide messages in almost anything.  My favorite example is the class photo of World War I cryptographers trained in Aurora, Illinois by famed cryptographers, William and Elizabeth Friedman.[1]  Before they headed for France, the newly minted codebreakers lined up for the cameraman; but there’s more going on here than meets the eye.

Taking to heart omnia per omnia, the Friedmans ingenuously encoded Sir Francis Bacon’s maxim “knowledge is power” within the photograph using Bacon’s cipher.  The 71 soldiers and their instructors convey the cipher text by facing or looking away from the camera.  Those facing denote an “A.”  Those looking away denote a “B.”  There weren’t quite enough present to encode the entire maxim, so the decoded message actually reads, “KNOWLEDGE IS POWE.”  Here’s the decoding:

A closer look:

Isn’t that mind blowing?!?!

Steganography is something most computer forensic examiners study but rarely use in practice.  Still, it’s a fascinating discipline with a history reaching back to ancient Greece, where masters tattooed secret messages on servants’ shaved scalps and hit “Send” once the hair grew back.  Digital technology brought new and difficult-to-decipher steganographic techniques enabling images, sound and messages to hitch a hidden ride on a wide range of electronic media.


[1] For this material, I’m indebted to “How to Make Anything Signify Anything” by William H. Sherman in Cabinet no. 40 (Winter 2010-2011).

What’s in a Name (or Hash Value)?

A question common to investigation of alleged data theft is, “Are any of our stolen files on our competitor’s systems?”  Forensic examiners track purloined IP using several strategies: among them, searching for matching filenames, hash values, metadata and content.  Any of these can be altered by data thieves seeking to cover their tracks, but most are too confident or too dim to bother.

A current matter underscored the pitfalls of filename and hash searches, prompting me to reflect on a long-ago case where hash searches caused headaches.  The old case stemmed from a settlement of a data theft event requiring a periodic audit of hashes of the defendant’s data to ensure that stolen data hasn’t re-emerged.  The plaintiff sought sanctions because its expert found hash values in the audit that matched hashes tied to stolen PowerPoint presentations.  The defendants were dumbfounded, certain they’d adhered to the settlement and not used any purloined PowerPoints. 

When I stepped in, I confirmed there were matching hash values, but none matched the PowerPoint PPT and PPTX files of interest.  Instead, the hashes matched only benign component image data within the presentations.  The components hashed were standard slide backgrounds (e.g., “woodgrain”) found in any copy of PowerPoint.  Both parties possessed PowerPoints using some of the same generic design elements, but none were the same presentations.  The hashing tool so thoroughly explored the files that embedded images were hashed separately from the files in which they were used and matched other generic elements in other presentations.  No threat at all!

Still other matching files turned out to be articles freely distributed at an industry trade show and zero-byte “null” files that would match any similarly empty files on any machine.  When every hash match was scrutinized, none proved to be stolen data.  Away went the sanctions motion.

The moral of the story is, although it’s extremely unlikely that two different files will share the same hash value, matching hash values don’t always signify the “same” file in practical terms.  Matching files may derive from independent sources, could be benign components of compilations or might match because they hold little or no content.  The math is powerful, but it mustn’t displace common sense.

In the ongoing matter, a simple method used to identify contraband data was filename matching.  The requesting party sought to identify instances of a file called “Book3.xlsx;” and the search turned up hundreds of instances of identically named files in the producing party’s data–though not a single one hash-matched the file of interest.

Why so many false positives?  It turns out Microsoft Excel assigns an incremented name to any new spreadsheet (despite earlier-opened sheets having been closed) so long as even one prior sheet remains open.  So, if you’ve created eight Excel spreadsheets, renamed them and closed all but one, the next new sheet will be named Book9.xlsx by default.  The name “Book3.xlsx” signified only that two prior spreadsheets had been opened.  The takeaway is that, in any large collection, expect to turn up instances of various Book(n).xlsx files created when a user exited and saved a sheet without renaming it from its default name.

Electronic search—by hash, filename, metadata or keyword–is an invaluable tool in investigation and e-discovery; but one best used with a modicum of common sense by those who appreciate its limitations.

C’mon! Bates Numbering Native Production is Easy!

Sometimes, the other side balks at a proposed e-discovery protocol, arguing it’s unduly burdensome to rename native files to their Bates numbers. I find that odd because parties have always named files for Bates numbers whilst doing clunky TIFF productions.  Where did they think the names of all those TIFF images came from?  The truth is, litigants have been naming files to match Bates numbers for as long as we’ve done e-discovery!  It’s easy!

It’s one thing to say something is easy and another to prove its simplicity.  Certainly, if you use an e-discovery vendor, it’s as easy as saying, “Bates number the native files.”  They know what to do. But anyone doing electronic production in-house can add Bates numbers to filenames simply, quickly and cheaply. 

There are various ways to do it.  You can prepend Bates number (Bates##_filename.ext), append Bates number (filename_Bates##.ext) or replace the filename with the Bates number, storing the original name in a load file.  You can even add protective language like “PRODUCED SUBJECT TO PROTECTIVE ORDER.”

Multiple free and low-cost bulk renaming tools are available.  I’ve long praised a powerful, flexible too called Bulk Renaming Utility. It’s free for personal use and $93 for commercial purposes; a powerful tool, but overwhelming to some.  Seeking a simpler tool and one free to use commercially, I found two: File Renamer Basic and Ant Renamer.  Both impressed me with their flexibility and ease of use. 

For Mac users, there’s a nice free tool called File Renamer for MacOS 64 bit, which I’ll also touch on below.

Let’s look at how to configure both Windows tools to Bates number a production.

Suppose the production protocol reads:

Bates Numbers. All Bates numbers will consist of a three-digit Alpha Prefix, followed immediately by an 8-digit numeric: AAA########. There must be no spaces in the Bates number. Any numbers with less than 8 digits will be front padded with zeros to reach the required 8 digits. ESI will be Bates numbered by substituting, prepending or appending the Bates number for/to the file name.

Assuming there have been ten other items produced earlier,, we must begin Bates numbering at DEF00000011.  For this tutorial, I’ll use just six photos of American coins, but it could as easily be thousands of files of any sort.  Here are thumbnails of the exemplar photos:

The table below lists the filenames and MD5 hash values of the files, allowing us to confirm that a renaming tool won’t otherwise alter the evidence.

Original NameTypeSizeMD5 Hash
dime.jpgjpg747049EB20D0367DEF7F43D0B2EBDFCF8D881
dollar.jpgjpg791920A914C48360374CC434047B3CB5DDEEC
half_dollar.pngpng104763F12B793582DF4863C55A3299769F0885
nickel.jpgjpg4758922F1BDE59CE36229B6ED782F1E1CFB92
penny.jpgjpg10178681199ED5A45E56C8D23BFB24394C27B4
quarter.jpgjpg178375D6B866EFEB8F75D7C3395914F5AAB007

To demonstrate, I placed working copies of all the files needing Bates numbers in a Desktop folder named Production photos 11-21-20.  Inside this folder, I made an empty subfolder called BATES NUMBERED PHOTOS.  You don’t have follow suit, but however you approach it, don’t work on the source evidence; instead, create and produce renamed working copies.

File Renamer Basic

After installing and kicking off the program, I set the following parameters:

  1. Configure the “Folder” and “Copy to” paths.
  2. Set the three-digit Alpha Prefix required by the Protocol (I used “DEF” for Defendants).
  3. Set Unique Parameter to “Numbers,” “Increment” by 1, mask with eight zeroes and “Start at 11” (the next unassigned Bates number).
  4. Set Separator to a single underscore.  [While the protocol neither requires nor prohibits adding a separator between the Bates number and filename, I like to add it for clarity]
  5. In the Filename settings box, check “Place Unique Parameter before Filename.”
  6. Click “Preview,” and if you’re happy with the preview, click “Apply.”

Running hash values against the renamed files, we see that renaming the files has not altered their hash values.

NameTypeSizeMD5 Hash
DEF00000011_dime.jpgjpg747049EB20D0367DEF7F43D0B2EBDFCF8D881
DEF00000012_dollar.jpgjpg791920A914C48360374CC434047B3CB5DDEEC
DEF00000013_half_dollar.pngpng104763F12B793582DF4863C55A3299769F0885
DEF00000014_nickel.jpgjpg4758922F1BDE59CE36229B6ED782F1E1CFB92
DEF00000015_penny.jpgjpg10178681199ED5A45E56C8D23BFB24394C27B4
DEF00000016_quarter.jpgjpg178375D6B866EFEB8F75D7C3395914F5AAB007

Ant Renamer

After installing and kicking off the program, I set the following parameters:

  1. Using “Add Folders,” navigate to and select the folder with the files to be renamed.
  2. Click F10 to launch the Options menu and, under the >Processing tab, check the box “Copy instead of Rename,” then click “OK.”
  3. Under “Actions,” select “Enumeration” and configure the mask as: DEF%num%_%name%%ext%
  4. Set “Start at:” to 11 and “Number of Digits” to 8.
  5. Click “Preview of Selected Files” and, if all seems well, click GO on the menu.

Note that these settings will create a Bates numbered set of duplicate files in the same folder as the source files, NOT in the subfolder.

Frankly, it’s harder to describe the task than to complete it. After a few minutes playing with the settings, you’ll easily figure out how to prepend a Bates number, append it or swap it for the original name. Once you’ve gotten the settings where you’d like them, File Renamer Basic allows you to save your custom settings as a profile and apply it to future productions.

I spent only a short time investigating The Mac application FileRenamer, but it was intuitive enough to use without any unmanly reading of directions and took just seconds to configure numbering and set a mask to finish the task. I configured numbering in Settings>Numbering (Initial value: 11, Increment: 1 and Fixed Length with Leading Zeroes: 8) then the mask to include the three-digit alpha prefix, padded numbering and underscore separator to precede the filename (DEF%num%_%name%).

Easy as pie! And while we’re on the subject of pie, HAPPY THANKSGIVING!

The Metadata Vanishes

I love solving puzzles.  I come by it honestly. My late mother was a nationally ranked New York Times crossword puzzler, and though I lack her prodigious gifts, I start each morning racing on the Times crossword.  I mention puzzling to note that the best part of my forensics work is finding the answer to electronic evidence puzzles.  This week’s challenge comes from a legal assistant caught between a rock and a hard place, actually between the plaintiff and defense counsel.  The defense objected that photos produced in discovery lacked metadata, while the plaintiff insisted the photos he had furnished contained the “missing” metadata.  How could they both be right?  The mystified legal assistant had simply saved the photos from the transmitting message and sent them on to the other side.  She hadn’t removed any metadata.  Or had she?

I had to figure out what happened and keep it from happening again.

First, some technical underpinnings: 

What do we mean by metadata?  Digital photos, particularly those taken with cell phone cameras, hold more information than shows up in the pretty pictures.  Stored within the photos is a type of application metadata called EXIF (for Exchangeable Image File Format). EXIF holds camera settings, including the make and model of the camera or phone, time and date information, geolocation coordinates and more.  Because it’s application metadata, it’s content stored within the file and moves with the file when copied or transmitted…unless someone or something makes it disappear.

There’s a second sort of metadata called system metadata, It’s context; data about the file that’s stored without the file, typically in the system’s file table that serves as a directory of electronically stored information.  System metadata includes such things as a file’s name, location, modified and created dates and more.  Because it’s stored outside a file, it doesn’t move with the file but must be rounded up when a file is copied or transmitted.  Precious little system metadata follows a file when it’s e-mailed, often just the file’s name, size and type (although Apple systems include the file’s last modified and created dates).

The defense was seeing dates and times for photos that did not line up with the actual dates and times the photos were taken.  Too, the camera and geolocation data that should have been in the EXIF segments of the pictures were gone when plaintiffs produced them.

Picture formats and EXIF metadata: The photos produced were taken with an iPhone and stored on a Mac computer.  When most of us think of digital photos, we probably think of JPEG images stored as files with the extension .JPG.  The JPEG photo format has been around for almost thirty years and been the most common format for much of that time.  JPEG is what’s termed “lossy compression” referring to its ability to make image files smaller in size by jettisoning parts of the image that contribute to resolution and detail.  The more tightly you compress a JPEG image (and the more often you do it), the “jaggier” and more distorted the image becomes.

As digital cameras have improved, digital photographs have grown larger in size, eating up storage space.  Two-thirds of the data on my iPhone are photographs.  Seeking a more efficient way to store images and video, Apple started phasing out JPEG images in 2017.  The replacement was a format called High Efficiency Image File Format which, as implemented by Apple, photos are stored as High-Efficiency Image Containers with the file extension .HEIC. 

The benefit is that, for comparable image quality, HEIC images are roughly half the size of JPEG images, and they hold EXIF data.  The downside is that most of the world still expects a picture to be a JPEG and the Windows and Cloud realms need time to catch up.  To remain compatible with other devices and operating systems, Apple converts HEIC images to JPEGs for sharing via e-mail.

Now, there’s something to consider!  Did Apple strip out the EXIF metadata from the HEIC photos when it converted them to JPEGs? Hold that thought while I lay a little more foundation.

Encoding in Base64: E-mail is one of the earliest Internet tools.  It hearkens back to an era when only the most basic alphabets could be transmitted using a venerable character encoding standard called ASCII (pronounced ASK-KEY and short for American Standard Code for Information Interchange).  How do you get binary data like photos to transit a system that only understands a 128-character alphabet?  Easy!  You convert the binary numbers to numbers expressed more efficiently as 64 ASCII characters, to wit, the 26 lowercase letters of the alphabet, the 26 uppercase letters, numbers zero through nine and two punctuation marks (forward slash/ and plus sign+).  That’s 64 characters, each representing a unique numeric value that can replace six bits of binary data.  So, 24 bits of data can be written using just four base64 characters. Base64 looks like this:

It may not look like much, but it’s a feat of reductive technology we all use every day.

Looking at our conversion events when metadata might be lost, we have:

  • HEIC to JPEG
  • JPEG to Base64
  • Base64 to JPEG

Coding in and out of Base64 shouldn’t change a thing, but we can’t rule out anything yet.

Is that all?  Nope!

Photos often change without acquiring a new format.  If you’ve attached a photo to an e-mail and were asked whether you want the attachment to be small, medium, large or original size, any choice but the last one effects big changes to content. Perhaps scaling a photo poses a risk that embedded EXIF metadata will be lost?

When the defense sought the missing metadata, the legal assistant went to the plaintiff, who supplied a screenshot showing that the HEIC photos he’d sent went out carrying the full complement of EXIF metadata.  I asked the legal assistant for a copy of what she’d produced to the defendant and confirmed the embedded EXIF data was, in fact, gone, gone, gone.

Coming back to “did Apple strip out the EXIF metadata from the HEIC photos when it converted them to JPEGs?”  I took an HEIC photo with my iPhone and e-mailed it to my Gmail account as an attachment.  The attachment was converted to a JPG but retained its EXIF data when saved to disk. I re-sent it as a downscaled image and all the EXIF remained intact.  Finally, I sent it as an inline image and saved the received image to disk.  Poof!  The metadata vanishes!  Now, we’re getting somewhere.

I asked the legal assistant to forward a copy of the e-mail she’d received from the client transmitting the photos.  As expected, the photos weren’t in HEIC format but had been converted to JPEGs.  Notably, they were inline photos displayed in the body of the e-mail instead of as attachments.  When I saved the inline images to disk, the EXIF data was gone.

Undeterred, I saved the forwarded message to disk as an .eml message and opened it in Microsoft Notepad.  Scrolling down to check the Base64 encoded content, I copied the Base64 of a single image and converted it to a JPEG photo.  Happily, the photo I recovered held its full complement of EXIF data.  I could only conclude that saving an inline photo to disk by right clicking and choosing “Save Image as” was the culprit.  Had the photos been made attachments instead of inline images, their EXIF data would have remained in the file saved to disk.

But the revelation was that the EXIF data sought was present in the JPEG images, even if it couldn’t be pulled out by clicking on them as inline images and saving the image to disk.  This was true in both Gmail and Outlook.

Now, I have a forensics lab thrumming with workstations and ingenious software, but what’s a legal assistant supposed to do, MacGyver-like, with just the tools at hand?  Having solved the puzzle of what went wrong, the bonus puzzle was figuring out how to fix it.

Here’s a simple workaround I came up with that performed splendidly:

1. Create an empty folder on your Windows Desktop called “Inline Images.”

2. In Microsoft Outlook, open the message holding the inline photos you want to extract. 

3. From the Outlook message menu bar select File>Save As then chose Save as Type>HTML (*.htm, *.html) and save the message to your “Inline Images” folder.

4. Open the “Inline Images” folder and locate the subfolder named [subject of the transmitting message]_Files.  Open this folder and you’ll find copies of each inline photo.  If you find two copies of each, small and large, the small copy is a thumbnail lacking EXIF data but the full-size version will have all EXIF metadata intact.  Voila!  We go from The Metadata Vanishes to Return of the Metadata.

I’d prefer clients e-mail photos by transmitting them inside a compressed Zip file rather than forwarding them as inline images or attachments.  The Zip container better protects the integrity of the evidence and forestalls stripping or alteration of metadata.  Plus, a Zip container can be encrypted for superior cybersecurity. 

Have you run into this before, Dear Reader?  Do you know a simpler way to get inline images out of parent messages without corrupting metadata or hiring an expert?  If so, please leave a comment.

Gayle McCormick O’Connor

I’m sad to share that Gayle McCormick O’Connor died suddenly in her sleep on Sunday, October 18, 2020.  Gayle was a longtime legal technology marketing specialist. If you hadn’t the luck to know her, you doublessly noticed her at any of dozens of legal tech conferences.  Gayle was a star, truly a nova who shone brightly and cared passionately about her colleagues, friends and, above all, her son, Seamus and husband, Tom O’Connor–e-discovery and LegalTech thought leader and my dear friend.

Gayle’s luminosity was no accident.  She worked at it.  Gayle turned heads.  Gayle danced in front of the band.  She dressed to the nines in footwear that defied gravity and description.  Gayle was a canvas for her art and her art was celebration.  I’ve seen Gayle spin atop the bar in a packed Vegas biker bar.  I’ve sung Viva Las Vegas with Tom and Gayle at the top of our lungs beside the Bellagio fountains.  Just a month ago, Gayle sang Happy Birthday to me in her unforgettable Marilyn Monroe-Seducing-John-Kennedy style. Gayle was a seductress.  She unapologetically called herself “Cougar.”  Gayle’s tales of trysts with rock stars in the 70’s could have inspired Almost Famous.  Gayle lived large and loved large.  The public Gayle was sensational.  She defied age to change her, and age demurred.

Yet, there was a private side to Gayle; a sweet, maternal aspect that appeared when the makeup and stilettos came off and she was tired, sore and having a cigarette on the porch.  This Gayle deeply missed her mothers, who died in quick succession less than two years ago.  This Gayle was frustrated by COVID and pained by politics.  She found solace in her family and friends.  Gayle was unendingly proud of Seamus, always her deep well of joy.  I’m certain that no wife was ever more supported and adored by her husband than Gayle O’Connor was by Tom O’Connor.  Gayle was Tom’s prize.  He worshipped her, and she him.  After thirty-one years of marriage on their own terms, Tom’s eyes still sparkled when Gayle was the center of attention.  He was so proud to be her husband.  Who wouldn’t envy them?  They were blessed to have each other.  They should have had more time, for themselves, and for all of us who love them.

Tom and Gayle.  Saying that is like saying “red beans and rice.”  Perfect together.  I cannot believe she’s gone, but she lives on in the many happy stories we will tell of her and the light she brought to our lives.

The Irish have a blessing for the departed that seems right for a McCormick O’Connor: “May the road rise to meet you, and the wind always be at your back. May the sun shine warm on your face and the rains fall softly on your fields. And until we meet again, may God hold you gently in the palm of his hand.”

If there’s a rock-and-roll heaven, you’ll find Gayle right up front with the band. She’ll be dancing and all eyes will be on her.

Tom is planning a musical celebration of Gayle’s life for 3:30pm Saturday afternoon, October 24 at Bayou St. John. Here are the details in Tom’s words:

“For anyone in New Orleans, on Sat Oct 24 we’re going to do a New Orleans celebration of Gayle’s life at her place, 3234 Grand Route St.John. (that’s between Moss St and the Esplanade) We’ll walk down to the Bayou (half a block), I’ll say a (very) few words, have a tribute song from her friends Maggie & Kess, scatter some ashes, then go back to her house for some music on the porch from 4 to 6 by Glenn David Andrews, one of Gayle’s favorite New Orleans musicians.”

“We’re still in COVID-19 restrictions folks so wear masks and socially distance in order to allow the celebration to go off without a hitch. If you bring flowers, please bring yellow flowers …it was her favorite color.”

“Donations can be made in her name to the Grace House residential treatment center for women in New Orleans at https://www.bridgehouse.org/support-us/donate/“

A sweet remembrance from Bob Ambrogi.

The Case for Native, I Swear

Regular readers may tire of my extolling the virtues of native forms of production; but battleships turn slowly, and this one must yet be turned. Apart from judges (whose frequent unfamiliarity with electronic evidence makes them easy prey for prevarication), those best situated to end the ruin of TIFF+ productions are those who profit most from doing nothing.

Articles, speeches and blog posts can only go so far. What’s needed are published judicial decisions. Whether they go one way or the other, we need thoughtful opinions that lay out the issues in an accurate and balanced way, informing litigants what’s at stake. Many published orders fail to weigh the genuine pros and cons of each form of production. A few read as if TIFF images were the evidence and requesting parties were seeking to have God-given TIFF images converted into heretical native files. Talk about confused!

Seeking another published opinion on the merits of native production, I recently supplied a declaration to a federal court. I’m attaching an anonymized version of my testimony in the hope that readers will weigh the arguments. I concede “it ain’t Shakespeare,” but it’s honest. I changed a lot to make it difficult to identify the matter, although the Declaration is a matter of public record. Sorry, but I thought a little less candor would be the wiser path. The lawsuit is still very much in contention.

Here’s the anonymized version in PDF:

Here’s the guts of it:

Continue reading

The Perfect Preservation Letter: A New Guide

Yesterday, I asked my Electronic Evidence class at Tulane Law School, “What’s the difference between a preservation letter and a legal hold notice?”

Do you know?

I got the simple answer I sought: You put your clients on notice of legal hold; you send a preservation letter to the other side.  Another difference is that there is no legal duty to dispatch a preservation letter, but woe betide the lawyer who fails to initiate a prompt and proper litigation hold! 

In truth, the two missives have much in common.  Both seek the preservation of evidence, and both are best when clear, specific and instructive.  Both must go out when you know less than you’d like about sources of potentially responsive information.  Finally, both tend to receive minimal thought before dissemination, resulting in easily ignored, boilerplate forms crowding out artfully-targeted requests.

If I’m frank, most of what passed for preservation letters “back in the day” were, well, crap.  They sprang from forensic service providers and sounded more like ransom notes than statements of a practical and proportionate legal duty.  Literal compliance required pulling the plugs on the computers and backing away…very…very…slowly.  But, with the first 2006 amendments to the Federal Rules of Civil Procedure came a groundswell to routinize e-discovery, to label its stages (as in the iconic EDRM diagram) and to systemize its execution by development of “defensible, repeatable processes.”  So, way back when, I wrote an article introducing requesting parties to the “perfect” preservation letter and offering an example as a drafting aid.  Perhaps because it was the only lifeboat in a storm, it took off; and it wasn’t long before lawyers on the north side of the docket made it their favorite opening salvo. 

If that sounds like bragging, know that I’m not proud of what happened.  People started using the exemplar “perfect” letter in the lazy way I hoped they wouldn’t:  as a form pitched at cases of every stripe and type. 

Hey folks. “Perfect” was tongue-in-cheek!  I wrote,

You won’t find the perfect preservation letter in any formbook. You must custom craft it from a judicious mix of clear, technically astute terminology and fact-specific direction. It compels broad retention while asking for no more than the essentials. It rings with reasonableness. Its demands are proportionate to the needs of the case, and it keeps the focus of e-discovery where it belongs: on relevance.

But no one read that.  It was just too easy to hand the example over to an assistant and say, “send this out in all our cases.”

Fast forward to 2018 and counsel to the President of the United States sends out my letter without updating it to reflect any of the changes we’ve seen in sources and forms of electronically stored information since, say, Hurricane Katrina.  Imagine a preservation letter from President Trump that ignores tweets, for goodness sake!  Clearly, the article and the accompanying exemplar letter both needed more than a fresh coat of paint.  Weirdly, the gap hadn’t been filled by anything else in fifteen years.

A few weeks back, I updated and published the exemplar letter, with a fresh plea to use it as a drafting aid and not as a form.  Today, I finished updating the guide to its use, once again called (IRONICALLY) The Perfect Preservation Letter.  It’s still no masterpiece. To be useful, the letter must be a living document, changing to reflect new sources (Dating sites! I forgot to add dating sites!) and improved ways to preserve and acquire evidence. I hope a new generation of lawyers finds it instructive.  There’s plenty of room for improvement, so dig in, make it better, make it your own.

Advanced Zoom “Weather Map” Technique

I lately presented a program for the State Bar of Texas Annual Meeting alongside Texas District Court Judge Emily Miskel. Like everything else, the venerable Annual Meeting was recast as a virtual event. Our topic was “Upping your Game in Zoom,” and we spoke of many ways to improve the quality of online video meetings and hearings. Judge Miskel and I covered dead simple ways to avoid common errors and some advanced techniques. One advanced approach I shared was making your presentation visuals serve as your dynamic Zoom background, enabling a presenter to interact with background visuals in the same way that TV meteorologists explain weather patterns using a green screen map.

There are times when a disembodied narration of screen-filling visuals is best; yet, there are times when you don’t want to force viewers to choose between speakers and visuals, as occurs when Zoom attendees lack the screen real estate or mastery of the Zoom interface needed to pin speakers to larger windows. Let’s face it: most Zoom users are overwhelmed by mute/unmute; asking them to pin and resize screens is a bridge too far.

Certainly, anyone can share a PowerPoint presentation in Zoom, bringing slide imagery to the fore and relegating speakers to tiny squares at the perimeter, like the world’s saddest episode of The Brady Bunch. Instead, I wanted to be a more prominent part of the show, akin to the accustomed ways speakers present onstage.

Television news anchors routinely uses “OTS” (for over-the-shoulder) graphics as an effective segue between the newsreader and story video. OTS graphics work nicely in Zoom, introducing the topic or bullet points in a background slide, then sharing out the focal graphics. It sounds complicated, but it’s easy to get the hang of going to and returning from shared screens. It takes practice, but isn’t practice always key to improving presentation skill?

PowerPoint does all the heavy lifting of converting your slide visuals to still images (and even to video) suitable for use as Zoom backgrounds. Any PowerPoint slide show can be saved as individual JPG or PNG graphics. The “trick” is to compose the slide to afford room for the presenter’s upper torso without obstructing the visuals.

If you look at the two images below, you can see that I’ve left vacant the lower right quadrant of each slide. This presentation required use of templates, but left to my own aesthetics, I never use templates.

I hate ugly templates!
Reserve part of the screen for your image. Don’t block your bullets!

In practice, I adjust my camera such that my head and shoulders occupy the lower right of the Zoom screen (see below), then I can point at bullets and gesture at graphics. The weathercaster technique really shines when you present standing up. Then, you’d devote one-half to one-third of the slide layout to your graphics and the balance to you. You could even stand between two columns of bullets, Of course, this requires sufficient room between camera and green screen and, ideally, a dedicated camera and studio lighting.

Would it hurt to smile?

By now, you’ve gathered that achieving a true chroma key effect requires a physical green screen backdrop, not the virtual “where’d my ears go?” background effect often seen. A suitable 9-10′ muslin green screen backdrop will cost about twenty dollars on Amazon. I elected to spend more and get the green screen, crossbar, pair of backdrop supports and a bevy of studio lights and stands for $150.00. If you’ve got a way to hang a green sheet behind you (e.g., curtain rod, tacked to a wall, hung from the ceiling), that twenty dollar backdrop works just fine.

Home Studio Kit

Having created your background visuals and saved each slide as a still JPG or PNG image, you’ll load them into Zoom as Virtual Backgrounds. To do so, start Zoom and go to Virtual Background in the Settings menu. Locate and click the the small plus sign (+) (Arrow 1, below), then click on “Add Image” from the menu and navigate to where you’ve saved your background images. Add each image in this manner, keeping them in the order in which you want them displayed when presenting. Next, click the box to tell Zoom you have a green screen (Arrow 2), and finally, be sure the color shown matches your backdrop. Zoom should do this automatically, but you can also set it manually (Arrow 3).

Zoom’s Settings>Virtual Background screen

You’re ready to go, but before starting a presentation, launch Zoom and Virtual Background again. Practice selecting each background much as you might advance them as slides in a PowerPoint show, choosing them in succession while presenting. If you’ve loaded them in your preferred order, they will appear as options in that order. You will need to keep the Virtual Background settings panel open at all times during your presentation, so a second screen helps insure the settings panel doesn’t disappear behind another window. You don’t want to be fumbling around in search of the Virtual Backgrounds panel while speaking.

The Weather Map Technique is harder to describe than it is to pull off. The key to keeping it smooth and simple calls to mind the out-of-towner visiting Manhattan who asked a local, “How do I get to Carnegie Hall?”

The answer’s the same: “Practice, practice, practice!”

Wish List: I look forward to a day when Zoom natively supports dynamic backgrounds allowing us to feed PowerPoints directly to a background instead of a shared screen. Also, I’d like to be able to folder backgrounds topically. Affording hosts greater control over the layout of Zoom windows would be nice. In Zoom hearings, think how it would help to be able to group lawyers according to their role in the litigation.

It’s About Time!

“Time heals all wounds.”  “Time is money.” “Time flies.” 

To these memorable mots, I add one more: “Time is truth.”

A defining feature of electronic evidence is its connection to temporal metadata or timestamps.  Electronically stored information is frequently described by time metadata denoting when ESI was created, modified, accessed, transmitted, or received.  Clues to time are clues to truth because temporal metadata helps establish and refute authenticity, accuracy, and relevancy.

But in the realms of electronic evidence and digital forensics, time is tricky.  It hides in peculiar places, takes freakish forms, and doesn’t always mean what we imagine.  Because time is truth, it’s valuable to know where to find temporal clues and how to interpret them correctly.

Everyone who works with electronic evidence understands that files stored in a Windows (NTFS) environment are paired with so-called “MAC times,” which have nothing to do with Apple Mac computers or even the MAC address identifying a machine on a network.  In the context of time, MAC is an initialization for Modified, Accessed and Created times.

That doesn’t sound tricky.  Modified means changed, accessed means opened and created means authored, right?  Wrong.  A file’s modified time can change due to actions neither discernible to a user nor reflective of user-contributed edits.  Accessed times change from events (like a virus scan) that most wouldn’t regard as accesses. Moreover, Windows stopped reliably updating file access times way back in 2007 when it introduced the Windows Vista operating system.  Created may coincide with the date a file is authored, but it’s as likely to flow from the copying of the file to new locations and storage media (“created” meaning created in that location). Copying a file in Windows produces an object that appears to have been created after it’s been modified!

it’s crucial to protect the integrity of metadata in e-discovery, so changing file creation times by copying is a big no-no.  Accordingly, e-discovery collection and processing tools perform the nifty trick of changing MAC times on copies to match times on the files copied.  Thus, targeted collection alters every file collected, but done correctly, original metadata values are restored and hash values don’t change.  Remember: system metadata values aren’t stored within the file they describe so system metadata values aren’t included in the calculation of a file’s hash value.  The upshot is that changing a file’s system metadata values—including its filename and MAC times—doesn’t affect the file’s hash value. 

Conversely and ironically, opening a Microsoft Word document without making a change to the file’s contents can change the file’s hash value when the application updates internal metadata like the editing clock.  Yes, there’s even a timekeeping feature in Office applications!

Other tricky aspects of MAC times arise from the fact that time means nothing without place.  When we raise our glasses with the justification, “It’s five o’clock somewhere,” we are acknowledging that time is a ground truth. “Time” means time in a time zone, adjusted for daylight savings and expressed as a UTC Offset stating the number of time zones ahead of or behind GMT, time at the Royal Observatory in Greenwich, England atop the Prime or “zero” Meridian.

Time values of computer files are typically stored in UTC, for Coordinated Universal Time, essentially Greenwich Mean Time (GMT) and sometimes called Zulu or “Z” time, military shorthand for zero meridian time.  When stored times are displayed, they are adjusted by the computer’s operating system to conform to the user’s local time zone and daylight savings time rules.  So in e-discovery and computer forensics, it’s essential to know if a time value is a local time value adjusted for the location and settings of the system or if it’s a UTC value.  The latter is preferred in e-discovery because it enables time normalization of data and communications, supporting the ability to order data from different locales and sources across a uniform timeline.

Four months of pandemic isolation have me thinking about time.  Lost time. Wasted time. Pondering where the time goes in lockdown.   Lately, I had to testify about time in a case involving discovery malfeasance and corruption of time values stemming from poor evidence handling.  When time values are absent or untrustworthy, forensic examiners draw on hidden time values—or, more accurately, encoded time values—to construct timelines or reveal forgeries.

Time values are especially important to the reliable ordering of email communications.  Most e-mails are conversational threads, often a mishmash of “live” messages (with their rich complement of header data, encoded attachments and metadata) and embedded text strings of older messages.  If the senders and receivers occupy different time zones, the timeline suffers: replies precede messages that prompted them, and embedded text strings make it child’s play to alter times and text.  It’s just one more reason I always seek production of e-mail evidence in native and near-native forms, not as static images.  Mail headers hold data that support authenticity and integrity—data you’ll never see produced in a load file.

Underscoring that last point, I’ll close with a wacky, wonderful example of hidden timestamps: time values embedded in Gmail boundaries.  This’ll blow your mind.

If you know where to look in digital evidence, you’ll find time values hidden like Easter eggs. 

E-mail must adhere to structural conventions to traverse the internet and be understood by different e-mail programs. One of these conventions is the use of a Content-Type declaration and setting of content boundaries, enabling systems to distinguish the message header region from the message body and attachment regions.

The next illustration is a snippet of simplified code from a forged Gmail message.  To see the underlying code of a Gmail message, users can select “Show original” from the message options drop-down menu (i.e., the ‘three dots’).

The line partly outlined in red advises that the message will be “multipart/alternative,” indicating that there will be multiple versions of the content supplied; commonly a plain text version followed by an HTML version. To prevent confusion of the boundary designator with message text, a complex sequence of characters is generated to serve as the content boundary. The boundary is declared to be “00000000000063770305a4a90212” and delineates a transition from the header to the plain text version (shown) to the HTML version that follows (not shown).

Thus, a boundary’s sole raison d’être is to separate parts of an e-mail; but because a boundary must be unique to serve its purpose, programmers insure against collision with message text by integrating time data into the boundary text.  Now, watch how we decode that time data.

Here’s our boundary, and I’ve highlighted fourteen hexadecimal characters in red:

Next, I’ve parsed the highlighted text into six- and eight-character strings, reversed their order and concatenated the strings to create a new hexadecimal number:

A decimal number is Base 10.  A hexadecimal number is Base 16.  They are merely different ways of notating numeric values.  So, 05a4a902637703 is just a really big number. If we convert it to its decimal value, it becomes: 1,588,420,680,054,531.  That’s 1 quadrillion, 588 trillion, 420 billion, 680 million, 54 thousand, 531.  Like I said, a BIG number.

But, a big number…of what?

Here’s where it gets amazing (or borderline insane, depending on your point of view).

It’s the number of microseconds that have elapsed since January 1, 1970 (midnight UTC), not counting leap seconds. A microsecond is a millionth of a second, and 1/1/1970 is the “Epoch Date” for the Unix operating system. An Epoch Date is the date from which a computer measures system time. Some systems resolve the Unix timestamp to seconds (10-digits), milliseconds (13-digits) or microseconds (16-digits).

When you make that curious calculation, the resulting date proves to be Saturday, May 2, 2020 6:58:00.054 AM UTC-05:00 DST.  That’s the genuine date and time the forged message was sent.  It’s not magic; it’s just math.

Had the timestamp been created by the Windows operating system, the number would signify the number of 100 nanosecond intervals between midnight (UTC) on January 1, 1601 and the precise time the message was sent.

Why January 1, 1601?  Because that’s the “Epoch Date” for Microsoft Windows.  Again, an Epoch Date is the date from which a computer measures system time.  Unix and POSIX measure time in seconds from January 1, 1970.  Apple used one second intervals since January 1, 1904, and MS-DOS used seconds since January 1, 1980. Windows went with 1/1/1601 because, when the Windows operating system was being designed, we were in the first 400-year cycle of the Gregorian calendar (implemented in 1582 to replace the Julian calendar). Rounding up to the start of the first full century of the 400-year cycle made the math cleaner.

Timestamps are everywhere in e-mail, hiding in plain sight.  You’ll find them in boundaries, message IDs, DKIM stamps and SMTP IDs.  Each server handoff adds its own timestamp.  It’s the rare e-mail forger who will find every embedded timestamp and correctly modify them all to conceal the forgery. 

When e-mail is produced in its native and near-native forms, there’s more there than meets the eye in terms of the ability to generate reliable timelines and flush out forgeries and excised threads.  Next time the e-mail you receive in discovery seems “off” and your opponent balks at giving you suspicious e-mail evidence in faithful electronic formats, ask yourself: What are they trying to hide?

The takeaway is this: Time is truth and timestamps are evidence in their own right.  Isn’t it about time we stop letting opponents strip it away?

Tip of the hat to Arman Gungor at Metaspike whose two excellent articles about e-mail timestamp forensics reminded me how much I love this stuff.  https://www.metaspike.com/timestamps-forensic-email-examination/