I posted here a year ago laying out a detailed methodology for collection and preservation of the contents of a Gmail account in the static form of a standard Outlook PST. Try as I might to make it foolproof, downloading Gmail using IMAP and Outlook is tricky. Happily, since my post, the geniuses at Google have introduced a truly simple, no-cost way to collect Gmail and other Google content for preservation and portability. It sets a top-flight example for online service providers, and presages how we may use the speed, power and flexibility of Google search as a culling mechanism before exporting for e-discovery.
I’m excited about this because, like millions, I’ve depended on Google apps such as Gmail and Google Calendar for as long as they’ve been around. As an expert witness, I collect and produce messages and attachments in response to subpoenae duces tecum. Gmail made it easy to find responsive content, but hard to get that content out in forms that preserved utility and integrity. In the past, I’d printed the items to searchable PDFs; but, printing to PDF is tedious and runs counter to my penchant for functional and complete forms.
Lately, I’ve taken to using the IMAP protocol to download Gmail to Outlook, creating .PST container files and processing these with e-discovery tools. Getting a complete, compact PST is no picnic. It can take days to grab all message headers, message bodies and attachments in a big collection, and the level of replication is appalling because, when they are downloaded, foldered (i.e., labeled) messages generate duplicate messages and attachments for each label applied. The upshot is that anything in, say, the Inbox or Sent Mail folders also shows up in the All Mail group. This is a convenience online; but, it radically increases the time and volume required to pull the data out with IMAP. Message threading is also a casualty when converting Gmail to Outlook content.
Even if you’re a lawyer who couldn’t care less about IMAP, this is a development worth cheering because until now, you had two choices when it came to putting Gmail on legal hold: Either you’d instruct your client not to delete anything (and cross your fingers they’d comply) or you had to hire someone to download the data. Now, Google does the Gmail collection gratis and puts it in a standard MBOX container format that can be downloaded and sequestered. Google even incorporates custom metadata values that reflect labeling and threading. You won’t see these unique metadata tags if you pull the messages into an e-mail client; but, e-discovery software will pick them up. I tested this using Nuix and the $100 marvel, Prooffinder. Both parsed the Gmail metadata handily, enabling the messages to be threaded and paired with their Gmail labels.
MBOX might not have been everyone’s choice for a Gmail container file; but, it’s inspired. MBOX stores the messages in their original Internet message format called RFC 2822 (now RFC 5322). Regular readers may recognize that I’ve been a vocal proponent of this format as a superior form for e-discovery preservation and production. I had no hand in Google’s decision; but, it’s nice to have Google on my side!
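For readers who want to peek at the labeling and threading metadata without an e-discovery platform, here is a minimal sketch in Python using the standard library's mailbox module. It assumes the archive carries labels in a header named X-Gmail-Labels (comma-separated) and the thread identifier in a header named X-GM-THRID; check those names against your own export before relying on them. The function name summarize_mbox is my own invention for illustration, and the simple comma split will stumble on labels that themselves contain commas or arrive in encoded form.

```python
import mailbox
from collections import Counter, defaultdict


def summarize_mbox(path):
    """Tally Gmail labels and group subjects by thread for one MBOX file.

    Assumes (not guaranteed) that labels live in an 'X-Gmail-Labels'
    header and thread IDs in an 'X-GM-THRID' header.
    """
    label_counts = Counter()
    threads = defaultdict(list)

    # create=False: fail if the file is missing instead of creating an empty one
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            # Naive split on commas; labels containing commas are not handled
            raw_labels = str(msg.get("X-Gmail-Labels", ""))
            for label in raw_labels.split(","):
                label = label.strip()
                if label:
                    label_counts[label] += 1

            thread_id = msg.get("X-GM-THRID")
            if thread_id:
                subject = str(msg.get("Subject", ""))
                threads[str(thread_id).strip()].append(subject)
    finally:
        box.close()

    return label_counts, threads
```

Calling summarize_mbox on the downloaded file (whatever you named it) returns a count of messages per label and a mapping of each thread ID to the subjects of its messages. It only reads the file; it is a quick sanity check on what was collected, not a substitute for processing the archive in a proper tool.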
So, let me introduce you to Google Data Tools.
The only hard part of archiving Gmail is navigating to the right page. You get there from the Google Account Setting page by selecting “Data Tools” and looking for the “Download your Data” option on the lower right. When you click on “Create New Archive,” you’ll see a menu like that below where you choose whether to download all mail or just items bearing the labels you select.
The ability to label content within Gmail and archive only messages bearing those labels means that Gmail’s powerful search capabilities can be used to identify and label potentially responsive messages, obviating the need to archive everything. It’s not a workflow suited to every case; yet, it’s a promising capability for keeping costs down in the majority of cases involving just a handful of custodians with Gmail.
A lot of discoverable data is moving to Google–to Gmail, Drive, Calendar, YouTube–you name it. Kudos to Google for turning a task that’s been hard into something so simple anyone can do it well. And it costs nothing at all; what more can I say? Thank you, Google!
Forensicron - Fraudeonderzoek said:
Thanks for bringing this to our attention!
MKT Law said:
Reblogged this on A Litigator's Blog | MKT Law, PLC and commented:
Awesome news! On the path to no reason not to request ESI in smaller cases or even every case!
Lars Schou said:
What’s the largest mailbox you tested? We recently tried Google Takeout on mailboxes over 10 GB and, after running for 24+ hours, the exports failed repeatedly.
The largest single mailbox was 19.5GB. I’ve had little difficulty with the creation of the archive but–weirdly–I found that Chrome often failed to complete the download where IE had no issues at all writing the temp file to a zip file. Try another browser for the download–not Chrome.
Lars Schou said:
I’ve also had problems with Chrome on large downloads (and ‘clean’ files being falsely quarantined as malicious), but in this case the archive creation failed server-side. We’ll re-test in case the service has been improved since we first tried back in April. Interestingly, the Google Apps admin for our client had never heard of the service and was certain that he hadn’t enabled it, so Google may have enabled it by default. It seems to be a service only power users know about, but it clearly poses quite a serious data exfiltration risk.
Joshua Rubin said:
Haven’t tried the GMail download, but I carry a droid and I got quite an eyeful from this interactive map: https://maps.google.com/locationhistory/ Pick a date and press the Go arrow. Wondering if it’s just droids or iPhones too?