I love solving puzzles. I come by it honestly. My late mother was a nationally ranked New York Times crossword puzzler, and though I lack her prodigious gifts, I start each morning racing through the Times crossword. I mention puzzling to note that the best part of my forensics work is finding the answer to electronic evidence puzzles. This week’s challenge comes from a legal assistant caught between a rock and a hard place or, more precisely, between the plaintiff and defense counsel. The defense objected that photos produced in discovery lacked metadata, while the plaintiff insisted the photos he had furnished contained the “missing” metadata. How could they both be right? The mystified legal assistant had simply saved the photos from the transmitting message and sent them on to the other side. She hadn’t removed any metadata. Or had she?
I had to figure out what happened and keep it from happening again.
First, some technical underpinnings:
What do we mean by metadata? Digital photos, particularly those taken with cell phone cameras, hold more information than shows up in the pretty pictures. Stored within the photos is a type of application metadata called EXIF (for Exchangeable Image File Format). EXIF holds camera settings along with the make and model of the camera or phone, time and date information, geolocation coordinates and more. Because it’s application metadata, it’s content stored within the file and moves with the file when copied or transmitted…unless someone or something makes it disappear.
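To make “content stored within the file” concrete: in a JPEG, EXIF lives in an APP1 segment near the top of the file, and that segment begins with the ASCII signature Exif followed by two zero bytes. The minimal Python sketch below, offered for illustration rather than as a forensic tool, walks the JPEG’s segment markers and reports whether such a segment is present. The function name jpeg_has_exif is hypothetical; the code assumes a well-formed JPEG and does not decode the EXIF fields themselves.

```python
import struct


def jpeg_has_exif(data: bytes) -> bool:
    """Return True if a JPEG byte string carries an APP1 'Exif' segment.

    Sketch only: assumes a well-formed JPEG and walks the header segments
    until the start-of-scan (SOS) marker, where compressed image data begins.
    """
    if data[:2] != b"\xff\xd8":  # every JPEG opens with the SOI marker FF D8
        raise ValueError("not a JPEG")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # not a marker where one was expected; stop scanning
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: no more header segments follow
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and segment.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length  # jump to the next marker
    return False
```

Usage might look like `jpeg_has_exif(open("IMG_0001.jpg", "rb").read())`, where the file name is hypothetical. Because the check reads only the file’s own bytes, it gives the same answer wherever the file has been copied, which is exactly what “application metadata” means.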
There’s a second sort of metadata called system metadata. It’s context: data about the file that’s stored outside the file, typically in the system’s file table that serves as a directory of electronically stored information. System metadata includes such things as a file’s name, location, modified and created dates and more. Because it’s stored outside a file, it doesn’t move with the file but must be rounded up when a file is copied or transmitted. Precious little system metadata follows a file when it’s e-mailed, often just the file’s name, size and type (although Apple systems include the file’s last modified and created dates).
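A short Python sketch shows the difference. It assumes an ordinary local disk and stands in for any process that moves a file’s contents without its file-table entry (as e-mailing does): it backdates a file, copies only its bytes, and compares. The file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

old_time = 1577836800  # 1 January 2020, 00:00:00 UTC, as a Unix timestamp

with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "photo.jpg"
    duplicate = Path(folder) / "copy_of_photo.jpg"
    original.write_bytes(b"\xff\xd8 pretend image bytes \xff\xd9")

    # Backdate the original's accessed and modified times (system metadata).
    os.utime(original, (old_time, old_time))

    # Move only the content: the duplicate gets a brand-new file-table entry.
    duplicate.write_bytes(original.read_bytes())

    same_content = original.read_bytes() == duplicate.read_bytes()
    original_mtime = original.stat().st_mtime
    duplicate_mtime = duplicate.stat().st_mtime

print(same_content, original_mtime == old_time, duplicate_mtime == old_time)
# True True False
```

The bytes (including any EXIF inside them) match, but the copy’s modified date is simply the moment it was written. Python’s `shutil.copy2` exists precisely to try to carry such timestamps along, which is the “rounding up” system metadata requires.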
The defense was seeing dates and times for photos that did not line up with the actual dates and times the photos were taken. What’s more, the camera and geolocation data that should have been in the EXIF segments of the pictures were gone when the plaintiff produced them.
Picture formats and EXIF metadata: The photos produced were taken with an iPhone and stored on a Mac computer. When most of us think of digital photos, we probably think of JPEG images stored as files with the extension .JPG. The JPEG photo format has been around for almost thirty years and has been the most common format for much of that time. JPEG employs what’s termed “lossy compression,” referring to its ability to make image files smaller by jettisoning parts of the image that contribute to resolution and detail. The more tightly you compress a JPEG image (and the more often you do it), the “jaggier” and more distorted the image becomes.
As digital cameras have improved, digital photographs have grown larger, eating up storage space. Two-thirds of the data on my iPhone are photographs. Seeking a more efficient way to store images and video, Apple started phasing out JPEG as the iPhone’s default photo format in 2017. The replacement was a format called High Efficiency Image File Format (HEIF); as implemented by Apple, photos are stored as High Efficiency Image Containers with the file extension .HEIC.
The benefit is that, for comparable image quality, HEIC images are roughly half the size of JPEG images, and they hold EXIF data just as JPEGs do. The downside is that most of the world still expects a picture to be a JPEG, and the Windows and cloud realms need time to catch up. To remain compatible with other devices and operating systems, Apple converts HEIC images to JPEGs for sharing via e-mail.
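Because anyone can rename a file, examiners usually tell formats apart by a file’s opening bytes (its signature) rather than its extension. A JPEG begins with the bytes FF D8 FF; a HEIC file is an ISO Base Media container whose first box has the type ftyp, followed by a four-character “brand” such as heic. The sketch below assumes that layout; the function name and the short brand list are my own illustrative choices and cover common cases, not every HEIF variant.

```python
def sniff_photo_format(header: bytes) -> str:
    """Guess a photo's container format from its first 12 or so bytes."""
    if header[:3] == b"\xff\xd8\xff":
        return "JPEG"
    # ISO Base Media files open with a box: a 4-byte size, the type 'ftyp',
    # then a 4-byte 'major brand' naming the flavor of file.
    if header[4:8] == b"ftyp":
        brand = header[8:12]
        if brand in (b"heic", b"heix", b"mif1"):  # assumed common HEIF brands
            return "HEIF/HEIC"
        return "other ftyp-based format (" + brand.decode("ascii", "replace") + ")"
    return "unknown"
```

Reading the first 12 bytes of a file (`open(path, "rb").read(12)`) is enough to tell an iPhone original from a converted JPEG, whatever its extension says.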
Now, there’s something to consider! Did Apple strip out the EXIF metadata from the HEIC photos when it converted them to JPEGs? Hold that thought while I lay a little more foundation.
Encoding in Base64: E-mail is one of the earliest Internet tools. It hearkens back to an era when only the most basic alphabets could be transmitted using a venerable character encoding standard called ASCII (pronounced ASK-KEY and short for American Standard Code for Information Interchange). How do you get binary data like photos to transit a system that only understands a 128-character alphabet? Easy! You re-express the binary data using just 64 ASCII characters, to wit, the 26 uppercase letters of the alphabet, the 26 lowercase letters, the numerals zero through nine and two punctuation marks (the plus sign + and the forward slash /). Each of those 64 characters stands for a unique six-bit value (0 through 63), so three bytes (24 bits) of binary data can be written as four Base64 characters. The price of that compatibility is size: Base64 text runs about a third larger than the binary data it encodes. Encoded, a photo looks like a dense block of seemingly random letters, numerals, slashes and plus signs, sometimes ending in one or two equals signs (=) that serve as padding.
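To see the six-bits-per-character arithmetic at work, here is a small Python sketch that encodes one three-byte group by hand and checks the result against the standard library. It uses the standard Base64 alphabet order from RFC 4648 (uppercase letters are values 0 to 25, lowercase 26 to 51, digits 52 to 61, then + and /); the helper name is hypothetical.

```python
import base64

# Standard Base64 alphabet: position in the string = the 6-bit value encoded.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_three_bytes(chunk: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("need exactly 3 bytes")
    bits = int.from_bytes(chunk, "big")  # treat the 3 bytes as one 24-bit number
    # Peel off four 6-bit groups, most significant first.
    return "".join(ALPHABET[(bits >> shift) & 0b111111] for shift in (18, 12, 6, 0))


jpeg_start = b"\xff\xd8\xff"  # the first three bytes of every JPEG
print(encode_three_bytes(jpeg_start))         # /9j/
print(base64.b64encode(jpeg_start).decode())  # /9j/
```

That is why a Base64-encoded JPEG buried in a saved e-mail is easy to spot: its encoded text always starts with /9j/.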
So far, then, we have three conversions where metadata might be lost:
- HEIC to JPEG
- JPEG to Base64
- Base64 to JPEG
Encoding to and decoding from Base64 shouldn’t change a thing, but we can’t rule out anything yet.
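Why shouldn’t it? Base64 is a reversible transfer encoding, not a new image format. This Python sketch uses random bytes as a stand-in for a photo (an assumption for illustration), encodes them with the 76-character line breaks MIME e-mail uses, decodes them, and compares SHA-256 hashes the way an examiner would.

```python
import base64
import hashlib
import os

photo_bytes = os.urandom(10_000)  # stand-in for a real photo's bytes

# Encode as MIME e-mail does (a newline after every 76 characters), then decode.
encoded = base64.encodebytes(photo_bytes)
decoded = base64.decodebytes(encoded)

# Matching hashes mean a bit-for-bit identical result, embedded EXIF and all.
print(hashlib.sha256(photo_bytes).hexdigest() == hashlib.sha256(decoded).hexdigest())
# True
```

So if metadata disappears, the culprit is more likely whatever software does before encoding or after decoding.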
Is that all? Nope!
Photos often change without acquiring a new format. If you’ve attached a photo to an e-mail and been asked whether you want the attachment to be small, medium, large or original size, any choice but the last one effects big changes to content. Perhaps scaling a photo poses a risk that embedded EXIF metadata will be lost?
When the defense sought the missing metadata, the legal assistant went to the plaintiff, who supplied a screenshot showing that the HEIC photos he’d sent went out carrying the full complement of EXIF metadata. I asked the legal assistant for a copy of what she’d produced to the defendant and confirmed the embedded EXIF data was, in fact, gone, gone, gone.
Coming back to the question of whether Apple stripped the EXIF metadata from the HEIC photos when it converted them to JPEGs, I took an HEIC photo with my iPhone and e-mailed it to my Gmail account as an attachment. The attachment was converted to a JPG but retained its EXIF data when saved to disk. I re-sent it as a downscaled image, and all the EXIF remained intact. Finally, I sent it as an inline image and saved the received image to disk. Poof! The metadata vanished! Now we’re getting somewhere.
I asked the legal assistant to forward a copy of the e-mail she’d received from the client transmitting the photos. As expected, the photos weren’t in HEIC format but had been converted to JPEGs. Notably, they were inline photos displayed in the body of the e-mail instead of as attachments. When I saved the inline images to disk, the EXIF data was gone.
Undeterred, I saved the forwarded message to disk as an .eml message and opened it in Microsoft Notepad. Scrolling down to the Base64-encoded content, I copied the Base64 of a single image and decoded it back into a JPEG photo. Happily, the photo I recovered held its full complement of EXIF data. I could only conclude that saving an inline photo to disk by right-clicking it and choosing “Save Image as” was the culprit. Had the photos been sent as attachments instead of inline images, their EXIF data would have remained in the files saved to disk.
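For readers comfortable with a little scripting, the same rescue can be approximated with Python’s standard email package, which parses a saved .eml file and reverses the Base64 transfer encoding for you. This is an illustrative sketch, not a record of the steps described above. The function name and output file naming are hypothetical, and it assumes a standard MIME (.eml) message, not Outlook’s proprietary .msg format.

```python
import email
from email import policy
from pathlib import Path


def extract_images_from_eml(eml_bytes: bytes, out_dir: Path) -> list[Path]:
    """Decode every image/* part of a MIME message and write its exact bytes to disk.

    Works for inline images and attachments alike; the bytes written are the
    decoded Base64 payload, so any EXIF embedded in the JPEG survives.
    """
    message = email.message_from_bytes(eml_bytes, policy=policy.default)
    saved = []
    for index, part in enumerate(message.walk()):
        if part.get_content_maintype() != "image":
            continue
        data = part.get_payload(decode=True)  # undoes the Base64 transfer encoding
        if data is None:
            continue
        # Inline images often lack a filename; fall back to a generic one.
        name = part.get_filename() or f"inline_image.{part.get_content_subtype()}"
        target = out_dir / f"{index:02d}_{Path(name).name}"  # prefix avoids collisions
        target.write_bytes(data)
        saved.append(target)
    return saved
```

A call such as `extract_images_from_eml(Path("message.eml").read_bytes(), Path("out"))` (paths hypothetical) writes each photo out. The saved files get fresh system dates, but their embedded EXIF comes through untouched.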
The revelation was that the EXIF data sought was still present in the JPEG images, even though it couldn’t be retrieved by right-clicking the inline images and saving them to disk. This was true in both Gmail and Outlook.
Now, I have a forensics lab thrumming with workstations and ingenious software, but what’s a legal assistant supposed to do, MacGyver-like, with just the tools at hand? Having solved the puzzle of what went wrong, the bonus puzzle was figuring out how to fix it.
Here’s a simple workaround I came up with that performed splendidly:
1. Create an empty folder on your Windows Desktop called “Inline Images.”
2. In Microsoft Outlook, open the message holding the inline photos you want to extract.
3. From the Outlook message menu bar, select File>Save As, then choose Save as Type>HTML (*.htm, *.html) and save the message to your “Inline Images” folder.
4. Open the “Inline Images” folder and locate the subfolder named [subject of the transmitting message]_Files. Open this folder and you’ll find copies of each inline photo. If you find two copies of each, small and large, the small copy is a thumbnail lacking EXIF data but the full-size version will have all EXIF metadata intact. Voila! We go from The Metadata Vanishes to Return of the Metadata.
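If you want to confirm which of the extracted copies kept their metadata without opening each one, a quick scan of the _Files subfolder can help. The Python sketch below is a heuristic of my own devising, not a forensic tool: it flags a JPEG as carrying EXIF if the Exif signature appears within the first 64 KB, where the EXIF segment normally sits. Function names are hypothetical.

```python
from pathlib import Path


def has_exif_signature(path: Path) -> bool:
    """Heuristic: does the APP1 'Exif' signature appear near the start of a JPEG?"""
    with open(path, "rb") as f:
        head = f.read(65536)  # EXIF normally sits in the file's first 64 KB
    return head.startswith(b"\xff\xd8") and b"Exif\x00\x00" in head


def report_exif(folder: Path) -> dict[str, bool]:
    """Map each JPEG in the folder to whether it appears to carry EXIF."""
    return {
        p.name: has_exif_signature(p)
        for p in sorted(folder.iterdir())
        if p.suffix.lower() in (".jpg", ".jpeg")
    }
```

Pointing `report_exif` at the _Files subfolder should show the thumbnails as False and the full-size copies as True.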
I’d prefer clients e-mail photos by transmitting them inside a compressed Zip file rather than forwarding them as inline images or attachments. The Zip container better protects the integrity of the evidence and forestalls stripping or alteration of metadata. Plus, a Zip container can be encrypted for superior cybersecurity.
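Packaging photos this way is easy to script, too. This Python sketch (function name hypothetical) stores each photo in a Zip without recompressing it, so the archived bytes are identical to the originals. A Zip also records each file’s last-modified date and time, in local time at two-second resolution, so a bit of system metadata rides along. One limitation: Python’s zipfile module can read but not create encrypted archives, so encryption would require a dedicated archiving tool.

```python
import zipfile
from pathlib import Path


def zip_photos(photo_paths, zip_path) -> None:
    """Pack photos into a Zip, copying each file's bytes verbatim.

    ZIP_STORED skips recompression: JPEG and HEIC data are already compressed,
    and storing them as-is guarantees the archived bytes match the originals.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in photo_paths:
            # write() also records the file's last-modified time in the archive.
            zf.write(path, arcname=Path(path).name)
```

On the receiving end, `zipfile.ZipFile(zip_path).getinfo(name).date_time` shows the recorded modified date, and the extracted photo still holds its EXIF.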
Have you run into this before, Dear Reader? Do you know a simpler way to get inline images out of parent messages without corrupting metadata or hiring an expert? If so, please leave a comment.