“Time heals all wounds.” “Time is money.” “Time flies.”
To these memorable mots, I add one more: “Time is truth.”
A defining feature of electronic evidence is its connection to temporal metadata or timestamps. Electronically stored information is frequently described by time metadata denoting when ESI was created, modified, accessed, transmitted, or received. Clues to time are clues to truth because temporal metadata helps establish and refute authenticity, accuracy, and relevancy.
But in the realms of electronic evidence and digital forensics, time is tricky. It hides in peculiar places, takes freakish forms, and doesn’t always mean what we imagine. Because time is truth, it’s valuable to know where to find temporal clues and how to interpret them correctly.
Everyone who works with electronic evidence understands that files stored in a Windows (NTFS) environment are paired with so-called “MAC times,” which have nothing to do with Apple Mac computers or even the MAC address identifying a machine on a network. In the context of time, MAC is an initialism for Modified, Accessed and Created times.
That doesn’t sound tricky. Modified means changed, accessed means opened and created means authored, right? Wrong. A file’s modified time can change due to actions neither discernible to a user nor reflective of user-contributed edits. Accessed times change from events (like a virus scan) that most wouldn’t regard as accesses. Moreover, Windows stopped reliably updating file access times way back in 2007, when Microsoft introduced the Windows Vista operating system. Created may coincide with the date a file is authored, but it’s as likely to flow from the copying of the file to new locations and storage media (“created” meaning created in that location). Copying a file in Windows produces an object that appears to have been created after it was last modified!
It’s crucial to protect the integrity of metadata in e-discovery, so changing file creation times by copying is a big no-no. Accordingly, e-discovery collection and processing tools perform the nifty trick of changing MAC times on copies to match times on the files copied. Thus, targeted collection alters every file collected, but done correctly, original metadata values are restored and hash values don’t change. Remember: system metadata values aren’t stored within the file they describe, so system metadata values aren’t included in the calculation of a file’s hash value. The upshot is that changing a file’s system metadata values, including its filename and MAC times, doesn’t affect the file’s hash value.
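To see that last point in action, here is a minimal Python sketch. The filenames and file content are made up for illustration, and `os.utime` only sets the accessed and modified times (not the Windows created time), so treat it as a demonstration of the principle rather than a model of what collection tools do.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Hash only the bytes inside the file; no system metadata is read."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def demo_hash_ignores_system_metadata():
    """Return the file's hash before and after changing its name and times."""
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "memo.txt")  # hypothetical filename
        with open(original, "wb") as fh:
            fh.write(b"Time is truth.")
        before = sha256_of(original)

        # Reset accessed and modified times to 1 Jan 2000 00:00:00 UTC,
        # expressed as seconds since the Unix epoch.
        os.utime(original, (946684800, 946684800))

        # Change the filename too; it is also system metadata.
        renamed = os.path.join(folder, "renamed.txt")
        os.rename(original, renamed)

        after = sha256_of(renamed)
    return before, after


before, after = demo_hash_ignores_system_metadata()
print(before == after)  # True: the content bytes never changed
```

The hash function only ever sees the bytes returned by reading the file, which is why the name and the times can change without disturbing it.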
Conversely and ironically, opening a Microsoft Word document without making a change to the file’s contents can change the file’s hash value when the application updates internal metadata like the editing clock. Yes, there’s even a timekeeping feature in Office applications!
Other tricky aspects of MAC times arise from the fact that time means nothing without place. When we raise our glasses with the justification, “It’s five o’clock somewhere,” we are acknowledging that time is a ground truth. “Time” means time in a time zone, adjusted for daylight saving time and expressed as a UTC offset stating the number of hours ahead of or behind GMT, the time at the Royal Observatory in Greenwich, England, atop the Prime or “zero” Meridian.
Time values of computer files are typically stored in UTC, for Coordinated Universal Time, essentially Greenwich Mean Time (GMT) and sometimes called Zulu or “Z” time, military shorthand for zero meridian time. When stored times are displayed, they are adjusted by the computer’s operating system to conform to the user’s local time zone and daylight saving time rules. So in e-discovery and computer forensics, it’s essential to know if a time value is a local time value adjusted for the location and settings of the system or if it’s a UTC value. The latter is preferred in e-discovery because it enables time normalization of data and communications, supporting the ability to order data from different locales and sources across a uniform timeline.
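A short Python sketch shows how one stored UTC value turns into different displayed values. The two fixed offsets are my assumptions, standing in for the settings on two machines; a real operating system applies full time zone rules rather than a fixed offset.

```python
from datetime import datetime, timedelta, timezone

# One instant, stored once, in UTC.
stored_utc = datetime(2020, 5, 2, 11, 58, 0, tzinfo=timezone.utc)

# Fixed offsets standing in for two machines' settings (assumed values).
central_daylight = timezone(timedelta(hours=-5), "CDT")
british_summer = timezone(timedelta(hours=1), "BST")


def display(instant, zone):
    """Render a stored instant the way a machine set to `zone` would show it."""
    return instant.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S %Z")


print(display(stored_utc, timezone.utc))      # 2020-05-02 11:58:00 UTC
print(display(stored_utc, central_daylight))  # 2020-05-02 06:58:00 CDT
print(display(stored_utc, british_summer))    # 2020-05-02 12:58:00 BST
```

All three lines describe the same moment. Only the UTC form can be compared directly with values from other sources.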
Four months of pandemic isolation have me thinking about time. Lost time. Wasted time. Pondering where the time goes in lockdown. Lately, I had to testify about time in a case involving discovery malfeasance and corruption of time values stemming from poor evidence handling. When time values are absent or untrustworthy, forensic examiners draw on hidden time values—or, more accurately, encoded time values—to construct timelines or reveal forgeries.
Time values are especially important to the reliable ordering of email communications. Most e-mails are conversational threads, often a mishmash of “live” messages (with their rich complement of header data, encoded attachments and metadata) and embedded text strings of older messages. If the senders and receivers occupy different time zones, the timeline suffers: replies precede messages that prompted them, and embedded text strings make it child’s play to alter times and text. It’s just one more reason I always seek production of e-mail evidence in native and near-native forms, not as static images. Mail headers hold data that support authenticity and integrity—data you’ll never see produced in a load file.
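Here is a rough Python illustration of the ordering problem. The two Date header values are invented for the example; the parsing relies on the standard library’s `email.utils.parsedate_to_datetime`.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Hypothetical Date headers: a message sent from UTC+1 and a reply from UTC-5.
date_headers = {
    "original": "Sat, 02 May 2020 12:58:00 +0100",
    "reply": "Sat, 02 May 2020 07:05:00 -0500",
}


def order_by_wall_clock(headers):
    """Sort on the displayed clock time, ignoring each sender's UTC offset."""
    return sorted(
        headers,
        key=lambda name: parsedate_to_datetime(headers[name]).replace(tzinfo=None),
    )


def order_by_utc(headers):
    """Sort after normalizing every value to UTC."""
    return sorted(
        headers,
        key=lambda name: parsedate_to_datetime(headers[name]).astimezone(timezone.utc),
    )


print(order_by_wall_clock(date_headers))  # ['reply', 'original']
print(order_by_utc(date_headers))         # ['original', 'reply']
```

Read as bare clock times, the reply appears to precede the message that prompted it. Normalized to UTC, the original (11:58) correctly comes before the reply (12:05).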
Underscoring that last point, I’ll close with a wacky, wonderful example of hidden timestamps: time values embedded in Gmail boundaries. This’ll blow your mind.
If you know where to look in digital evidence, you’ll find time values hidden like Easter eggs.
E-mail must adhere to structural conventions to traverse the internet and be understood by different e-mail programs. One of these conventions is the use of a Content-Type declaration and setting of content boundaries, enabling systems to distinguish the message header region from the message body and attachment regions.
The next illustration is a snippet of simplified code from a forged Gmail message. To see the underlying code of a Gmail message, users can select “Show original” from the message options drop-down menu (i.e., the ‘three dots’).
The line partly outlined in red advises that the message will be “multipart/alternative,” indicating that there will be multiple versions of the content supplied; commonly a plain text version followed by an HTML version. To prevent confusion of the boundary designator with message text, a complex sequence of characters is generated to serve as the content boundary. The boundary is declared to be “00000000000063770305a4a90212” and delineates a transition from the header to the plain text version (shown) to the HTML version that follows (not shown).
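If you’d rather not read raw message source by eye, Python’s standard `email` package will pull out the boundary. In this sketch only the boundary value comes from the message discussed here; the addresses, subject and body text are placeholders I made up.

```python
from email import message_from_string

# Simplified message source. Everything except the boundary is a placeholder.
raw_message = """\
MIME-Version: 1.0
From: sender@example.com
To: recipient@example.com
Subject: Example
Content-Type: multipart/alternative; boundary="00000000000063770305a4a90212"

--00000000000063770305a4a90212
Content-Type: text/plain; charset="UTF-8"

Plain text version.

--00000000000063770305a4a90212
Content-Type: text/html; charset="UTF-8"

<div>HTML version.</div>

--00000000000063770305a4a90212--
"""

message = message_from_string(raw_message)

content_type = message.get_content_type()  # declared type of the whole message
boundary = message.get_boundary()          # the separator string itself
part_types = [part.get_content_type() for part in message.get_payload()]

print(content_type)  # multipart/alternative
print(boundary)      # 00000000000063770305a4a90212
print(part_types)    # ['text/plain', 'text/html']
```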
Thus, a boundary’s sole raison d’être is to separate parts of an e-mail; but because a boundary must be unique to serve its purpose, programmers insure against collision with message text by integrating time data into the boundary text. Now, watch how we decode that time data.
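Here’s our boundary, 00000000000063770305a4a90212, and I’ve highlighted fourteen hexadecimal characters in red: 63770305a4a902, the characters that follow the twelve leading zeros.
Next, I’ve parsed the highlighted text into six- and eight-character strings (637703 and 05a4a902), reversed their order and concatenated the strings to create a new hexadecimal number: 05a4a902637703.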
A decimal number is Base 10. A hexadecimal number is Base 16. They are merely different ways of notating numeric values. So, 05a4a902637703 is just a really big number. If we convert it to its decimal value, it becomes: 1,588,420,680,054,531. That’s 1 quadrillion, 588 trillion, 420 billion, 680 million, 54 thousand, 531. Like I said, a BIG number.
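For the skeptical, the conversion can be checked in a few lines of Python. The hand-rolled function is only there to show positional notation at work; the built-in `int(text, 16)` does the same job.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_to_int(text):
    """Convert base 16 to an integer: each digit is worth 16 times the one to its right."""
    value = 0
    for character in text.lower():
        value = value * 16 + HEX_DIGITS.index(character)
    return value


hex_text = "05a4a902637703"
number = hex_to_int(hex_text)

print(number == int(hex_text, 16))  # True: matches Python's built-in conversion
print(f"{number:,}")                # 1,588,420,680,054,531
```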
But, a big number…of what?
Here’s where it gets amazing (or borderline insane, depending on your point of view).
It’s the number of microseconds that have elapsed since January 1, 1970 (midnight UTC), not counting leap seconds. A microsecond is a millionth of a second, and 1/1/1970 is the “Epoch Date” for the Unix operating system. An Epoch Date is the date from which a computer measures system time. Some systems resolve the Unix timestamp to seconds (10-digits), milliseconds (13-digits) or microseconds (16-digits).
When you make that curious calculation, the resulting date proves to be Saturday, May 2, 2020 6:58:00.054 AM UTC-05:00 DST. That’s the genuine date and time the forged message was sent. It’s not magic; it’s just math.
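The whole decode fits in a few lines of Python. Note the assumption baked into this sketch: the character positions are inferred from the single boundary examined here, so I wouldn’t presume every Gmail boundary follows the same layout.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_gmail_boundary(boundary):
    """Return the UTC time encoded in a boundary laid out like the example.

    Assumed layout: 12 zeros, 6 hex characters (low-order part),
    8 hex characters (high-order part), then 2 trailing characters.
    """
    low_part = boundary[12:18]    # '637703'
    high_part = boundary[18:26]   # '05a4a902'
    microseconds = int(high_part + low_part, 16)
    # Add to the epoch with timedelta to keep microsecond precision exact.
    return UNIX_EPOCH + timedelta(microseconds=microseconds)


sent_utc = decode_gmail_boundary("00000000000063770305a4a90212")
central_daylight = timezone(timedelta(hours=-5))  # UTC-05:00

print(sent_utc.isoformat())                               # 2020-05-02T11:58:00.054531+00:00
print(sent_utc.astimezone(central_daylight).isoformat())  # 2020-05-02T06:58:00.054531-05:00
```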
Had the timestamp been created by the Windows operating system, the number would signify the number of 100 nanosecond intervals between midnight (UTC) on January 1, 1601 and the precise time the message was sent.
Why January 1, 1601? Because that’s the “Epoch Date” for Microsoft Windows. Again, an Epoch Date is the date from which a computer measures system time. Unix and POSIX measure time in seconds from January 1, 1970. Apple used one-second intervals since January 1, 1904, and MS-DOS counted from January 1, 1980. Windows went with 1/1/1601 because, when the Windows operating system was being designed, we were in the first 400-year cycle of the Gregorian calendar (implemented in 1582 to replace the Julian calendar). Rounding up to the start of the first full century of the 400-year cycle made the math cleaner.
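To make the two epochs concrete, here’s a small Python sketch that converts between Windows-style values (100-nanosecond intervals since 1601) and ordinary dates. The function names are mine, and the sketch discards any remainder finer than a microsecond because Python’s `datetime` can’t hold it.

```python
from datetime import datetime, timedelta, timezone

WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a count of 100 ns intervals since 1601 to a UTC datetime."""
    # Ten 100 ns intervals make one microsecond; finer detail is dropped.
    return WINDOWS_EPOCH + timedelta(microseconds=filetime // 10)


def datetime_to_filetime(moment):
    """Convert a timezone-aware datetime to 100 ns intervals since 1601."""
    elapsed = moment - WINDOWS_EPOCH
    whole_seconds = elapsed.days * 86400 + elapsed.seconds
    return whole_seconds * 10_000_000 + elapsed.microseconds * 10


# The Unix epoch expressed the Windows way.
print(datetime_to_filetime(UNIX_EPOCH))  # 116444736000000000

# The send time of the message discussed above, expressed the Windows way.
sent = datetime(2020, 5, 2, 11, 58, 0, 54531, tzinfo=timezone.utc)
print(datetime_to_filetime(sent))        # 132328942800545310
```

The first figure is the gap between the two epochs: 134,774 days, or 11,644,473,600 seconds.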
Timestamps are everywhere in e-mail, hiding in plain sight. You’ll find them in boundaries, message IDs, DKIM stamps and SMTP IDs. Each server handoff adds its own timestamp. It’s the rare e-mail forger who will find every embedded timestamp and correctly modify them all to conceal the forgery.
When e-mail is produced in its native and near-native forms, there’s more there than meets the eye in terms of the ability to generate reliable timelines and flush out forgeries and excised threads. Next time the e-mail you receive in discovery seems “off” and your opponent balks at giving you suspicious e-mail evidence in faithful electronic formats, ask yourself: What are they trying to hide?
The takeaway is this: Time is truth and timestamps are evidence in their own right. Isn’t it about time we stop letting opponents strip it away?
Tip of the hat to Arman Gungor at Metaspike whose two excellent articles about e-mail timestamp forensics reminded me how much I love this stuff. https://www.metaspike.com/timestamps-forensic-email-examination/