Checking the mailbag, I received a great question from a recent Georgetown E-Discovery Training Academy attendee. I’m posting it here in hopes my response may be useful to you.
My student wrote: I have a question in regard to zipping eDiscovery data. We’ve always used 7zip to zip our collections. The filenames are too long for Microsoft to be happy with them in their original state. One of our consultants is now telling me that I’m changing metadata. Can you clear this up for me? Am I changing metadata just by zipping a file? If I am, are there other simple tools that I can use?
Copying files within a Windows environment always changes some metadata. Anytime you copy data to new media, Windows assigns new values to some of the file’s system metadata. Some e-discovery collection tools reset those values to the originating values as part of the collection process. Thus, the metadata changes, then is changed back. If you want to use such tools, they are out there.
I think the more important concern is whether the tools and methods you employ reconstruct the metadata that matters and preserve the integrity of the evidence files. There is a simple way for you to assess that: check the MAC (modified/accessed/created) dates and hash the files in and out! You did some exercises of this nature in my Georgetown Academy workbook.
Prompted by your question, I took a Word document with a last modified date of Wednesday, December 16, 2015, 12:59:56 PM and a created date of Wednesday, December 16, 2015, 1:00:17 PM and hashed it using an online hash tool to get a baseline MD5 hash: 5265ec41f8b30790181a6fd77f094ab3. I ignored the last access date because it’s an inherently unreliable metavalue after Windows 7 stopped routinely updating it. It would change here in any event, even though the file wasn’t opened.
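If you would rather not rely on an online hash tool, the baseline can be recorded locally. What follows is only a minimal Python sketch of the idea, not the tool I used; the function names `md5_of_file` and `snapshot` are my own inventions for illustration. Note the assumption flagged in the comments: Python's `st_ctime` reports the creation time on Windows but the inode change time on Unix-like systems.

```python
import hashlib
import os
from datetime import datetime, timezone


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(path):
    """Record the hash and file system times of one file (hypothetical helper)."""
    info = os.stat(path)

    def as_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()

    return {
        "md5": md5_of_file(path),
        "modified": as_utc(info.st_mtime),
        "accessed": as_utc(info.st_atime),
        # Assumption: creation time on Windows only; on Unix-like
        # systems st_ctime is the time of the last metadata change.
        "created_or_changed": as_utc(info.st_ctime),
    }
```

Calling `snapshot` on the source file before it goes into the archive gives you the "in" half of hashing the files in and out.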
Next, I added the Word file to a 7zip archive and closed 7zip.
Finally, I navigated to the 7zip archive I’d created, opened it in 7zip and extracted the Word file to a new location on my drive. I went to where I landed the data and checked the MAC dates. The last modified date was unchanged, but the creation date properly reflected the fact that I’d “created” a copy in a new location. The operating system populated all the MAC values with the last modified data because, by default, 7zip doesn’t carry forward any temporal data except for the last modified date. The temporal metadata wasn’t ‘changed’ so much as populated with the only temporal information handed off in the transfer, a distinction without a difference to most users.
Nevertheless, when I hashed the Word file I extracted from the 7zip, its MD5 hash value was 5265ec41f8b30790181a6fd77f094ab3, a “perfect” match to the source evidence, accompanied by an unchanged last modified date. 7zip didn’t change the evidence so much as jettison certain file times from its originating environment. The result could be characterized as a misrepresentation of metadata, if the process and its consequences aren’t disclosed to opponents. Most lawyers routinely misinterpret Windows file creation dates, equating them with authoring dates when they may mean authoring or, as commonly, mean the date a file was copied to new media.
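The same in-and-out comparison can be scripted for any archiving tool you want to vet. This is a rough Python sketch under stated assumptions, not a forensic utility: `file_md5` and `compare_copies` are hypothetical names, the two-second tolerance is my own allowance for archive formats that store times at coarse resolution, and `st_ctime` means creation time only on Windows.

```python
import hashlib
import os


def file_md5(path):
    """Return the MD5 hex digest of a file's contents."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(source, extracted, tolerance_seconds=2.0):
    """Report whether content and file times survived a round trip."""
    src = os.stat(source)
    dst = os.stat(extracted)
    return {
        # Identical hashes mean the file contents are bit-for-bit the same.
        "content_matches": file_md5(source) == file_md5(extracted),
        "modified_matches": abs(src.st_mtime - dst.st_mtime) <= tolerance_seconds,
        # Assumption: meaningful as a creation-date check on Windows only.
        "created_matches": abs(src.st_ctime - dst.st_ctime) <= tolerance_seconds,
    }
```

Run against my test, I would expect `content_matches` and `modified_matches` to be true and, on Windows, `created_matches` to be false, which is the outcome described above.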
Is 7zip good enough for your purposes? I can’t say. If you need to record and replicate all three MAC dates from the originating files, 7zip doesn’t do that by default, but it will do so if you add a parameter (see Afterword below). Robocopy and Richcopy support copying with recreated MAC dates, but you’re likely to face the same long file path problem using those tools. So, you might instead try WinZip or WinRAR (enable the option to preserve creation date on the latter). I think they will do what you need; but now you know how to test them to be certain.
AFTERWORD: My learned friend and digital forensics colleague, Luciano Humberto, reminds me that there are undocumented command line parameters that can be used in 7zip to force replication of the Created and Last Accessed times. You won’t find these documented in the 7zip help file; but, if you add the parameters tc and ta to the parameters block of the Add to Archive menu, 7zip collects Created and Last Accessed times, too. I tested this and, SHAZAM, the Created and Last Accessed times are collected by 7zip and are restored to the archived data when extracted! Thanks, Luciano.