Checking the mailbag, I received a great question from a recent Georgetown E-Discovery Training Academy attendee. I’m posting it here in hopes my response may be useful to you.
My student wrote: I have a question in regard to zipping eDiscovery data. We’ve always used 7zip to zip our collections. The filenames are too long for Microsoft to be happy with them in their original state. One of our consultants is now telling me that I’m changing metadata. Can you clear this up for me? Am I changing metadata just by zipping a file? If I am, are there other simple tools that I can use?
Copying files in a Windows environment always changes some metadata: any time you copy data to new media, Windows assigns new values to some of it. Some e-discovery collection tools restore the originating values as part of the collection process, so the metadata changes and is then changed back. If you want to use such tools, they are out there.
I think the more important concern is whether the tools and methods you employ reconstruct the metadata that matters and preserve the integrity of the evidence files. There is a simple way for you to assess that: check the MAC (modified/accessed/created) dates and hash the files in and out! You did some exercises of this nature in my Georgetown Academy workbook.
Prompted by your question, I took a Word document with a last modified date of Wednesday, December 16, 2015, 12:59:56 PM and a created date of Wednesday, December 16, 2015, 1:00:17 PM and hashed it using an online hash tool to get a baseline MD5 hash: 5265ec41f8b30790181a6fd77f094ab3. I ignored the last access date because it’s an inherently unreliable metavalue after Windows 7 stopped routinely updating it. It would change here in any event, even though the file wasn’t opened.
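If you would rather script the baseline than rely on an online hash tool, here is a minimal Python sketch of the same idea. The function names (md5_of_file, baseline) are my own for illustration, and the sketch assumes the file is readable from a local path. Note that Python reports the creation time in st_ctime on Windows only; on Unix-like systems that field holds the inode change time instead.

```python
import hashlib
import os
from datetime import datetime, timezone


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def baseline(path):
    """Record the hash and file system times of a file before it is moved."""
    info = os.stat(path)  # read the times before opening the file to hash it

    def to_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "md5": md5_of_file(path),
        "modified": to_utc(info.st_mtime),
        "accessed": to_utc(info.st_atime),
        # Creation time on Windows; inode change time on Unix-like systems.
        "created_or_changed": to_utc(info.st_ctime),
    }
```

Run baseline() on each source file and save the results before you zip anything, so you have something to compare against afterward.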
Next, I added the Word file to a 7zip archive and closed 7zip.
Finally, I navigated to the 7zip archive I’d created, opened it in 7zip and extracted the Word file to a new location on my drive. I went to where I landed the data and checked the MAC dates. The last modified date was unchanged, but the creation date properly reflected the fact that I’d “created” a copy in a new location. The operating system populated all the MAC values with the last modified date because, by default, 7zip doesn’t carry forward any temporal data except for the last modified date. The temporal metadata wasn’t ‘changed’ so much as populated with the only temporal information handed off in the transfer, a distinction without a difference to most users.
Nevertheless, when I hashed the Word file I extracted from the 7zip, its MD5 hash value was 5265ec41f8b30790181a6fd77f094ab3, a “perfect” match to the source evidence, accompanied by an unchanged last modified date. 7zip didn’t change the evidence so much as jettison certain file times from its originating environment. The result could be characterized as a misrepresentation of metadata, if the process and its consequences aren’t disclosed to opponents. Most lawyers routinely misinterpret Windows file creation dates, equating them with authoring dates when they may mean authoring or, as commonly, mean the date a file was copied to new media.
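The "hash in and out" comparison can be sketched in Python as well. This is an illustration, not a forensic tool: compare_copy and its tolerance parameter are hypothetical names of mine. The two-second default is an assumption that allows for the coarse time resolution of the classic .zip format, so tighten it if your container stores finer timestamps.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copy(source, copy, tolerance=2.0):
    """Report whether a copy kept the source's content and last modified time."""
    src_info = os.stat(source)
    copy_info = os.stat(copy)
    return {
        # Identical hashes mean the file content survived the round trip.
        "content_identical": file_md5(source) == file_md5(copy),
        # Timestamps within `tolerance` seconds are treated as preserved.
        "modified_preserved": abs(src_info.st_mtime - copy_info.st_mtime) <= tolerance,
    }
```

Point it at the original file and the extracted file. A result with matching content but a lost timestamp tells you the tool preserved the evidence but not all of its file system metadata.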
Is 7zip good enough for your purposes? I can’t say. If you need to record and replicate all three MAC dates from the originating files, 7zip doesn’t do that by default, but it will do so if you add a parameter (see Afterword below). Robocopy and Richcopy support copying with recreated MAC dates, but you’re likely to face the same long file path problem using those tools. So, you might instead try WinZip or WinRAR (enable the option to preserve creation date on the latter). I think they will do what you need; but now you know how to test them to be certain.
AFTERWORD: My learned friend and digital forensics colleague, Luciano Humberto, reminds me that there are undocumented command line parameters that can be used in 7zip to force replication of the Created and Last Accessed times. You won’t find these documented in the 7zip help file; but, if you add the parameters tc and ta to the parameters block of the Add to Archive menu, 7zip collects Created and Last Accessed times, too. I tested this and, SHAZAM, the Created and Last Accessed times are collected by 7zip and are restored to the archived data when extracted! Thanks, Luciano.
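For anyone scripting collections rather than using the Add to Archive menu, here is a hedged Python sketch that builds a 7zip command line. It assumes the command-line counterparts of the tc and ta parameters are the method switches -mtc=on and -mta=on, and that the executable is named 7z and is on your PATH. Confirm both against your installed version, and test the result as described above. The function names are my own.

```python
import shutil
import subprocess


def build_7z_add_command(archive_path, source_path, exe="7z"):
    """Build a 7z 'add' command that asks for Created and Last Accessed times.

    Assumption: -mtc=on and -mta=on are the command-line forms of the
    tc and ta parameters; verify this against your 7-Zip version.
    """
    return [exe, "a", "-t7z", "-mtc=on", "-mta=on", archive_path, source_path]


def run_if_available(command):
    """Run the command if its executable is on PATH; otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode
```

Passing the arguments as a list, rather than one long string, avoids quoting problems when folder names contain spaces.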
Jeff Johnson said:
Good article Craig. A note to trainees: if you want to go truly “professional grade” (and ridiculously dirt cheap at the same time), use FTK Imager and perform a folder-level collection…this creates AD1 file(s) that you can password protect, compress, and move around the world with no concerns over MAC date spoliation.
Thanks. I like FTK Imager very much, and the students at the Georgetown E-Discovery Training Academy are all instructed on using FTK Imager to image media and extract data from images and mount them as virtual drives.
Mike Minnick said:
I did not write this to Craig, but I have also used 7-Zip in this way. I, too, like FTK Imager, but FTK Imager’s folder-level collection sounds an awful lot like a basic zip file (can be encrypted, compressed, and moved around without changing dates, but no collection of system metadata or deleted files). Am I missing a benefit to using FTK for a targeted folder-level collection?
To me, the downside for FTK Imager and AD1 format for these purposes is their complexity. 7-Zip is totally free, fast, and offers, I think, the same functionality (can encrypt, compress, preserve dates, and handle long file names/paths). As opposed to FTK Imager, it can be run by just about any custodian with some basic instructions. (Yes, this is self-collection, but is it functionally different when the custodian pointed you to the specific folder you are collecting?) The results are in a more user- and processing software-friendly format, saving me the work of re-opening in FTK just to convert. The folder directory path can be preserved in the zip file name.
To get around the issue with date created, I have been using .zip format (which preserved Date Created). Now that I know about the parameters (Thanks Craig and Luciano), I can use .7z format, which I find to be faster and have a better compression ratio.
Brett Shavers said:
I routinely boot the custodian machines to WinFE and use FTK Imager or X-Ways Forensics to copy the files/folders into containers. This method write-protects the drives and eliminates the risk of modifying any data. It also gives an absolute forensic collection rather than simple copying from a live machine, which always has a potential of modifying data and metadata.
James Reynolds said:
I’ve tested 7Zip against Microsoft’s built-in Zip compiler. In short, stick with 7Zip. Microsoft’s Zip program WILL change your metadata. 7Zip can keep it intact when used properly.