I’m surprised how frequently I’m engaged to collect the contents of Gmail accounts in e-discovery, especially when the account is being collected solely for preservation, and there’s no compelling reason to entrust the task to a neutral. I appreciate that hiring an expert offers greater assurance that the task will be approached with skill and experience, as well as that integrity of process can be supported by the testimony of someone unconnected with the client or law firm. But, though collecting and validating the complete contents of a Gmail account can be tricky and tedious, it’s not all that difficult to do. Happily, unless you do something really dumb, it’s unlikely that even a botched Gmail collection effort will harm the contents of the account.
For those seeking a low-cost, defensible mechanism to preserve Gmail content, this (long, dry) post lays out a detailed methodology for collection and preservation of the contents of a Gmail webmail account in the static form of a standard Outlook PST container file. I will address various technical considerations, but few legal ones. Whether or not the methods described in this post are legally sufficient in your case or compliant with Gmail’s terms of service is not my call, and I offer no opinions about same.
[NOTE TO READERS 10/14/14: When I wrote this post, there was not yet a backup capability built into Gmail. Google now makes data tools available that support the creation of a rich archive of a user’s Google content, including Gmail, Contacts, Calendar and Google Drive. You can find it in the Archive section of https://www.google.com/settings/datatools when logged into Google and can read more about it here.]
Credentials and Consent
To download the contents of a Gmail account, you’re going to need both account credentials (user name and password) and express, informed, contemporaneous consent, preferably in writing. Sure, you can get into an account with credentials alone, but absent express, informed, contemporaneous consent granting access for the purpose of collecting the contents of the account, you’re flirting with disaster, or even incarceration. There’s no fooling around on this. Don’t rationalize that consent is implied because you’ve long known the password or that some other noble end justifies the shabby means. Without express, informed, contemporaneous consent, DO NOT access someone else’s Gmail account using their credentials.
A satisfactory written consent doesn’t have to be complex or legalistic; it might say something like:
“I authorize [PERSON COLLECTING] to access my e-mail account(s) using the credentials I supply. I understand that my e-mail and other electronically stored information will be collected from my e-mail account(s) for the purpose of preserving and producing the information in connection with certain claims or litigation, and that the e-mail and information will be furnished to my attorneys. The permission granted herein to access and collect from my account(s) shall expire and be withdrawn on [DATE].”
Tips on Credentials
- Consider having the account holder change their customary Gmail password to a temporary password for the duration of the account collection effort. That way, when collection is complete, the account holder can return to using his or her preferred password without fear that it’s been compromised by disclosing it for purposes of collection.
- Test the credentials promptly to ensure they work. Often, I’m furnished credentials that don’t work until we figure out that the uppercase “I” (“eye”) is actually a lowercase “l” (“ell”) or that V (Victor) must have sounded a whole lot like a B (Bravo) when counsel wrote it down.
- Some users configure Gmail for 2-step verification (also called two-factor authentication). This more secure log in method ties Gmail access to specific machines unless an additional code obtained by phone or text messaging is also supplied. In that event, you can have the user generate what Google calls an “application-specific password” and supply it to the person doing the collection. To set up an application-specific password, the user should visit his or her Google Account settings page at https://www.google.com/settings/account, then, on the left, click Security. Under the “2-step verification” topic, the user should click “Manage your application specific passwords,” enter a descriptive name for the password (e.g., IMAP collection), then click “Generate application-specific password.” The 16 character application-specific password generated in this way can then be revoked by the user once collection is complete.
IMAP Collection Tools
Gmail supports downloading of account contents via POP3 and IMAP protocols. Because POP3 is limited with respect to collection from folders beyond the Inbox, I prefer IMAP as my Gmail collection protocol.
There are several low-cost tools well-suited to collection of IMAP messaging, including Prooffinder and Aid4Mail. But I will outline how to do a Gmail collection using Outlook 2010 via IMAP because most Windows users already have a copy of Microsoft Outlook 2010 and the Outlook .PST container format is supported by all competent e-discovery service providers and advanced review platforms. All of the capabilities and configuration settings I address respecting Outlook 2010 are also present in Outlook 2007; you just won’t find them all in precisely the same way.
Will Outlook Change the Evidence?
Yes, it will, somewhat. Outlook cannot replicate, feature-for-feature, all of the content and appearance of Gmail. Outlook will not thread messages as conversations in the same manner as Google; but then, who can do that as well as Google? From the standpoint of the integrity of message bodies, header data and attachment integrity, Outlook will do a bang up job preserving the content and features that tend to matter in e-discovery. In short, the things lawyers and judges care about being preserved will be preserved in a properly collected Outlook .PST container file.
Before attempting an IMAP collection, be sure to log into the Gmail account via the web and, in Settings>Forwarding and POP/IMAP, ensure that IMAP is enabled.
Set Up Outlook Accounts
Since you will want Outlook to create a separate .PST container for the contents of each Gmail account, you will need to set up separate Outlook accounts for each account to be collected. To do so in Outlook, go to File>Account Settings and, in the E-mail tab, select “New.”
In the next screen, enter the name of the account custodian from whom you are collecting and that custodian’s Gmail address. Don’t worry about adding passwords here. Check the radio box labeled, “Manually configure server settings or additional server types” and click “Next.” In the next dialogue box, select “Internet E-mail” and “Next.”
In the Internet E-mail Settings dialogue box, the correct name and e-mail address for the Gmail account holder should already be populated. If not, add it. Be sure the Account Type is set to IMAP (not POP3) and set the Incoming mail server to imap.gmail.com and the Outgoing mail server to smtp.gmail.com. In the Logon Information area, add the Gmail account holder’s user name and password. Be sure to clear the check in the box for “Test Account Settings by clicking the Next button.”
Don’t click Next; instead, click the “More Settings” button.
In the More Internet E-mail Settings dialogue box, select the “Advanced” tab and set the Incoming and Outgoing Server Port Numbers as follows:
Incoming server (IMAP): 993
Outgoing server (SMTP): 465
Set both encrypted connection settings to SSL
Click “OK” to return to the Internet E-mail Settings dialogue box, and now click “Next.”
With luck, you will be greeted by the “Congratulations!” dialogue box and can now either add the next Gmail account to be collected (Add Another Account) or click “Finish.”
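If you would rather confirm the credentials and the IMAP settings before touching Outlook at all, a few lines of Python can do it. What follows is only a sketch, not part of my collection method: it assumes the account accepts a password-based IMAP login (for example, with an application-specific password), and the function name credentials_work is my own invention for illustration. It uses the same server and port given above.

```python
import imaplib

GMAIL_IMAP_HOST = "imap.gmail.com"
GMAIL_IMAP_PORT = 993  # IMAP over SSL, matching the Outlook setting above


def credentials_work(user, password, connect=imaplib.IMAP4_SSL):
    """Return True if the IMAP server accepts the login, False if it refuses.

    `connect` is a hypothetical hook that exists only so the function can be
    exercised without a network connection; by default it opens a real SSL
    connection to Gmail.
    """
    conn = connect(GMAIL_IMAP_HOST, GMAIL_IMAP_PORT)
    try:
        conn.login(user, password)
        return True
    except imaplib.IMAP4.error:
        # Raised when the server rejects the user name or password.
        return False
    finally:
        # LOGOUT is valid whether or not the login succeeded.
        conn.logout()
```

Logging in and straight back out this way does not select any folder, so it should not alter the read status of any message. Remember that the consent requirement applies here just as it does to any other access.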
Three Hurdles: Message Bodies, Pictures and Folders
Ideally, having Outlook download the complete contents of a Gmail collection would be a foolproof, set-it-and-forget-it endeavor. My experience with large Gmail collections has never been so. With care, it works; but it takes longer than I like and, if you’re not careful, you may imagine you’ve captured all the content but left a lot behind.
For expediency in reviewing e-mail, Outlook treats message bodies and message headers separately when accessing IMAP accounts. Message headers only hold the dog tag data for the message (i.e., To, From, Date, CC and Subject). Because message headers download rapidly and suffice to distinguish urgent messages from less pressing missives, Outlook and other mail clients initially download just message headers and do not acquire message bodies or attachments until you open or preview the message. On a fast internet connection, the user is largely oblivious to the short delay this entails and benefits from speedier access overall; but, when preserving data for discovery, don’t assume you’ve acquired the entire contents of every message (header, message body and attachments) when all you may have are the headers for some or all of the items in the account.
The difference is significant in terms of risk, but also in terms of time. It takes longer—sometimes days longer—to download the entire contents of a large Gmail collection using Outlook and IMAP versus message headers alone. There are steps you can take to minimize the risk, but there’s not much you can do to shorten the time.
Downloading Message Bodies: You can change Outlook’s default behavior in downloading only message headers to downloading the full contents of messages. To do so, select Send/Receive from the Outlook menu bar and click on Send/Receive Groups, choosing “Define Send/Receive Groups.” My practice is to define separate Send/Receive groups for each separate account (or related group of accounts, e.g., husband and wife) for each matter. If you’re using a never-configured copy of Outlook, doing so isn’t essential; however, as I may use Outlook to collect the contents of different accounts in different cases, I find it efficient to create separate Send/Receive groups so I can initiate a collection (or an update) of one account without triggering action in others.
Highlight the Send/Receive group you want to configure, then click “Edit.” In the Send/Receive Settings dialogue box that appears, select the mail account you want to configure and, under the option, “Receive Mail Items,” select “Use the custom behavior defined below.” Under Folder Options, select the folders you want to collect by checking their boxes (be sure to select “All Mail” for a Gmail account) and choose the option, “Download complete item including attachments.” Click OK to save your changes. Close Send/Receive Groups.
In a perfect world, the settings just applied would suffice to retrieve message headers, bodies and attachments; however, my experience is that you must carefully ascertain whether you have all the components of the messages in the account before assuming acquisition is complete. Fortunately, Outlook makes it easy to sort by message header status, segregating headers with contents that have been downloaded from those merely available for download. For the latter items, I’ve ofttimes had to get the job done by selecting these stubborn items and clicking, “Mark to Download” on the ribbon. Until all items show up as downloaded items, you’re not done.
Downloading Embedded Pictures: As a security measure, Outlook is configured by default to not download pictures embedded in HTML message bodies. Accordingly, when collecting mail using Outlook, change the program settings in File>Options>Trust Center>Trust Center Settings to force the download of pictures by unchecking the setting, “Don’t download pictures automatically in HTML e-mail messages or RSS items.”
Subscribing to IMAP Folders: POP3 will typically download only from the Inbox, and even then, may collect incompletely. IMAP can collect from folders beyond the Inbox and can reproduce the folder structure. To ensure that Outlook is both capturing the account folder structure and downloading the contents of the various folders, it may be necessary to collect and subscribe to the folders in Outlook. To do this, locate the account name in the far left Navigation Pane in Outlook and right click on it. Select IMAP Folders from the menu. Click the Query button to download a list of folders. Check the contents of the “Subscribed” tab to ensure Outlook is subscribed to all of the folders whose contents you wish to collect. If not, select the folder from the “All” tab, highlight the unsubscribed folder and click the “Subscribe” button.
One of Gmail’s great strengths online is also something of an Achilles’ heel when doing Outlook IMAP collections. When you folder a message in Gmail, the message is only virtually added to a folder. But, when you replicate the folder structure using Outlook, every message in every folder is physically duplicated within the All Mail folder, and any message that populates multiple folders in Gmail online is replicated in the various folders offline. The result is that collecting from Gmail takes longer (as identical items are repeatedly downloaded to the replicated folders in which they reside) and produces a much larger Outlook PST offline than the same volume of e-mail online.
So, you face something of a Hobson’s choice when collecting Gmail. If you want the folder structure replicated, it comes at the cost of significant delay and redundancy. To collect more quickly and efficiently from All Mail alone, you lose the folder structure. So far, this unfortunate trade off appears unavoidable.
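To put a number on the redundancy, you can tally how many copies a folder-replicating collection would hold against how many distinct messages there are. The sketch below is illustrative only: the function name redundancy_report and the sample identifiers are hypothetical, and it assumes you already have a list of message identifiers (such as Message-ID header values) for each folder.

```python
def redundancy_report(folder_to_message_ids):
    """Compare total stored copies with distinct messages across folders.

    `folder_to_message_ids` maps a folder name to the identifiers of the
    messages it holds. Treating the Message-ID as the identity of a message
    is an assumption; review tools commonly deduplicate on content hashes.
    """
    total = sum(len(ids) for ids in folder_to_message_ids.values())
    unique = len({mid for ids in folder_to_message_ids.values() for mid in ids})
    return {
        "total_copies": total,
        "unique_messages": unique,
        "redundant_copies": total - unique,
    }


# Hypothetical account: three messages, two of them foldered.
sample = {
    "All Mail": ["<1@x>", "<2@x>", "<3@x>"],
    "Clients": ["<1@x>", "<2@x>"],
    "Urgent": ["<1@x>"],
}
print(redundancy_report(sample))
```

In this made-up example, three messages become six stored copies, which is the kind of growth that makes the replicated PST so much larger than the mailbox online.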
Message Counts: I like to know how many messages are in each downloaded folder in Outlook; but by default, Outlook displays the total for unread messages. You can change this by right-clicking on each folder and selecting Properties. In the “General” tab of the Properties dialogue box, change the option from “Show number of unread items” to “Show total number of items.”
Disable the Marking of Previewed Items as Read: By default, Outlook is set to show any message that’s previewed for more than 5 seconds as having been read. Since you will want to preserve, as feasible, the read status that reflects the account holder’s actions and not your own, you should disable this feature before examining the contents of the messages. To do so, go to File>Options>Mail>Outlook Panes and click the Reading Pane button. In the Reading Pane dialogue box, uncheck the option for “Mark items as read when viewed in Reading Pane.”
Quality Assurance: If all goes well, the Outlook PST file for the account being collected will swell with Gmail content. You can find the location of the PST file by File>Account Settings, then clicking on the Data Files tab of the Account Settings dialogue box. Now, click Open File Location to pull up the folder. When you’re confident the work is done and the PST faithfully reflects the full contents of the account and folders sought to be preserved, shut down Outlook and make an archival copy of the PST to new media. It’s always a good idea to test a working copy of the PST you’ve created to be sure that it can be read by your e-discovery tool of choice.
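One way to satisfy yourself that the archival copy is bit-for-bit identical to the PST that Outlook wrote is to hash both files and compare the results. This is a minimal sketch rather than a prescribed step in my process: the choice of SHA-256 and the function names sha256_of and copies_match are my assumptions, and any competent hashing utility will serve as well. Shut down Outlook first so the PST is not changing underneath you.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Reading in chunks keeps memory use flat, which matters because a PST
    can run to many gigabytes.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, copy_path):
    """True when the two files have identical SHA-256 digests."""
    return sha256_of(original_path) == sha256_of(copy_path)
```

Recording the digest alongside the archival copy also gives you something to point to later if anyone questions whether the container has changed.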
Two key quality assurance tasks to undertake before wrapping up collection are checking folder message counts and confirming download status. To check message counts, simply ascertain the total number of items in each folder (see Message Counts above) and check each of these against the number of items in each folder in the online Gmail account. Be very careful when logged into the online Gmail account as you can effect permanent changes to the evidence, and you are logged in as the user, potentially prompting others who can view the user’s status to think the user is online. Tiptoe! In a very dynamic Gmail environment, message counts may change rapidly, and you may have to accept some minor variation in the Inbox based on late arrivals. Otherwise, folder message counts should match exactly.
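If you prefer not to tiptoe through the web interface, the server-side totals can also be requested over IMAP with the STATUS command, which reports on a folder without selecting it. The sketch below assumes a logged-in connection object from Python’s imaplib and a dictionary of local totals that you have read off Outlook by hand; the function names are hypothetical, and I offer it as an illustration rather than a tested substitute for the manual comparison.

```python
import imaplib
import re


def parse_message_count(status_line):
    """Extract N from a STATUS reply such as b'"INBOX" (MESSAGES 42)'."""
    match = re.search(rb"\(MESSAGES (\d+)\)", status_line)
    if match is None:
        raise ValueError(f"unexpected STATUS reply: {status_line!r}")
    return int(match.group(1))


def server_count(conn, folder):
    """Ask a logged-in imaplib connection how many messages a folder holds."""
    # The folder name is quoted because Gmail names often contain spaces,
    # e.g. "[Gmail]/All Mail".
    typ, data = conn.status(f'"{folder}"', "(MESSAGES)")
    if typ != "OK":
        raise RuntimeError(f"STATUS failed for {folder}")
    return parse_message_count(data[0])


def count_mismatches(server_counts, local_counts):
    """Return {folder: (server, local)} for each folder whose totals differ.

    A folder missing from one side is reported with None for that side.
    """
    folders = set(server_counts) | set(local_counts)
    return {
        name: (server_counts.get(name), local_counts.get(name))
        for name in sorted(folders)
        if server_counts.get(name) != local_counts.get(name)
    }
```

An empty result from count_mismatches means every folder agrees; anything else is a list of folders to revisit before calling the collection complete.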
A second crucial quality assurance step is to ascertain that all message bodies have been downloaded and that you have no headers lacking message bodies and attachments. One way to accomplish this is to group the contents of your main Outlook pane by availability as well as add an IMAP status column (which will show whether or not the item has been marked for downloading). If you identify either a group of items not downloaded or a group marked for download, you likely have an incomplete collection.
Joe Treese said:
Good technical guidance, pretty thorough. Some IT guys I know probably don’t know some of what’s covered here.
Some cautions and caveats I’d suggest to the “lightly technical” reader:
1) If the custodian is already using Outlook with Gmail, don’t assume that it’s configured to meet all of the preservation objectives that you face (and which this article provides). Make sure that the settings used by the custodian are not in conflict with those covered here, or any others that would threaten the integrity of the ESI preserved in Outlook. If the custodian is using versions of Outlook other than Outlook 2010, you must validate (and modify, if necessary) the steps outlined here. Get a technician involved if your own skills are not on par with Craig’s.
2) Gmail (and all of the Google products) are evolving rapidly and in many dimensions – for example, integration among the Google products continues to grow, (especially with Google+), and Google regularly “sunsets” applications (Reader fans are still grousing). If you’re reading this article sometime in the post-2013 timeframe, make sure that Gmail’s operation hasn’t changed with respect to the features which affect Gmail-to-Outlook preservation.
3) Some existing Gmail ESI (like Tasks) may need to be preserved in your case, and Outlook has features which may mirror (work identically) or simulate (overlaps, but not identically – like the folder versus label distinction raised in the article). Some others (like Contacts) may require use of other features like Google Takeout or “straight” export-to-CSV files to preserve. Much of the power of Gmail & other Google products is the feature-rich flexibility offered: how it’s implemented by the custodian might require preservation steps in addition to (or different than) those outlined above. (Sherri Harris did a podcast (at http://esibytes.com/effectively-interviewing-custodians-to-find-esi/) a while ago on ESIBytes that is an excellent “how-to” and relevant to the topic).
4) To quote Steve Clark’s comment on an earlier post on this blog,
“ALWAYS Document your steps, whether there are several or hundreds.”
A simple checklist mirroring the steps outlined (and customized to your needs, as applicable) isn’t just good documentation – it provides a ready way to capture and implement any changes needed during the actual collection process, which creative custodians will no doubt present.
Those are extremely valuable caveats! Thanks for sharing them. I especially echo the one about possible evolution of the products. Future readers may have a one-click preservation option available (“export to PST”) or will be preserving more simply into the cloud instead of gathering data into local container files. What I detail here is not the only way or even the best way; it’s just one way.
Kathy Williams, assistant said:
Excellent article. I have a question regarding integrating this advice into a more comprehensive e-discovery approach for small firms. In 2009, you wrote a column, “E-Discovery for Everybody: The EDna Challenge” in which you recommended some other low-cost solutions. I’m glad to see that you still recommend Aid4Mail and wonder if dtSearch and Karen’s Hasher are still available and still the best to meet the EDna Challenge. As we are facing end-of-the-year decisions about purchasing software, we would like to know if you have similar recommendations for de-duping, hashing and predictive coding and other technology assisted review tools. At the end of the 2009 column, you “threw down the gauntlet” to the e-discovery industry. Did any rise to the challenge?
Thanks, Kathy. Yes, Aid4Mail is still available from Fookes Software and still a wonderful tool (albeit a tad more costly than it used to be–but what isn’t?). Karen’s Hasher is also still around (http://www.karenware.com/powertools/pthasher.asp) though sadly its contributor, Karen Kenworthy, died much too young in 2011. dtSearch is still very strong and still very reasonably priced. Collectively, I’d guess you’re talking about $300/seat for all three utilities, so still very nominal.
But as good as all these tools are, the landscape since my EDna Challenge has been most changed by the advent of Prooffinder (Prooffinder.com), an amazing tool that costs just $100 for an annual license, with all proceeds going to child literacy. Prooffinder does essentially all the e-discovery tasks that can be accomplished by the tools mentioned above (e-mail parsing, de-duping, hashing and indexing with powerful search features and advanced analytics). It is limited to just 15GB of data per matter, but you can open as many matters as you wish. For the small case, Prooffinder is the ultimate EDna challenger, and I can’t praise it enough.
As you move into higher price points, you might consider fairly-priced programs like Digital Warroom from GGO or Intella from Vound Software. When you have the need for speed and maximum capability, Nuix is the power tool of choice; but, it’s far beyond poor EDna’s reach.
Kathy Williams, assistant said:
Thank you, that’s exactly what I needed to know. I had seen your praise earlier of Prooffinder, but I don’t believe I appreciated its purpose and efficiency.
William Kellermann said:
Watch out for the dreaded Gmail Guillotine (account lock) – https://support.google.com/mail/answer/43692?hl=en
“If we detect abnormal usage that may indicate that your account has been compromised, we may temporarily disable access. It will take between one minute and 24 hours for access to be reinstated, depending on the behavior detected by our system.
Unusual account activity includes, but is not limited to:
Receiving, deleting, or downloading large amounts of mail via POP or IMAP in a short period of time. If you’re getting the error message, ‘Lockdown in Sector 4,’ you should be able to access Gmail again after waiting 24 hours…”
That’s a great heads up, Bill. Thanks. I’ve yet to trigger a Sector 4 alert in collections, but good to know about it should one pop up.
Great article. I use this method pretty routinely when I just need to collect a single email box. I would recommend just one small edit: set the Outgoing mail server to localhost (or another non-existent server).
bill seymour said:
An excellent article – refreshing to come across a professional discussion of a subject.
I am interested in having a searchable backup copy of my business Google Apps gmail account, and I am wrestling with the Gmail ‘labels’ issue you described in your article. I was wondering how you come out on ‘all mail, but no folders’ vs ‘folders mapped from labels, but redundancy’ – I don’t think your article expressed a preferred option.
For my situation (probably using MailStore Home, which appears to be a great tool for storing/searching/using backed up email), I think ‘all mail but no folders’ will work, because I can search the backed up emails.
Thanks for a very useful article.
bill seymour said:
Addendum – I see that MailStore Home lets one target a list of specific labels in Gmail to download and backup, and the result is a folder structure (with the label being handled as a folder/subfolder). So apparently this is a fairly simple ‘mapping’ capability, and I assume would produce the ’email in multiple folders’ result if a Gmail email has multiple labels.
Dear Mr. Seymour:
When I wrote the post, there was not yet a backup capability built into Gmail (or at least I was not yet aware of same). Now, Google makes data tools available that support the creation of a rich archive of a user’s Google content, including Gmail, Contacts, Calendar, Google Drive, etc. It can be found in the Archive section of https://www.google.com/settings/datatools when you are logged into Google. As to whether to preserve labels, which are tantamount to folder structuring, it depends upon whether they may bear materially on the issues reasonably anticipated in litigation. I use them, and so would feel obliged to preserve them notwithstanding the redundancy occasioned in the archive. Such redundancy is not a problem in the tools I use for analysis and review because I can readily deduplicate the items.
If I failed to construct the archive with labels, all is not lost. When you export your mail from Gmail, each message’s labels are preserved in a special X-Gmail-Labels header, in CSV format. Per Google, “While no mail client recognizes this header now, most mail clients allow for extensions to be written that could make use of this data.” I expect my analytical tools also would read the data and allow me to search for same. Put simply, the labeling data will be in the messages; so, no harm, no foul.
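For the curious, pulling the labels back out of an exported message is not hard. Here is a minimal sketch in Python, assuming the header value is a single comma-separated row with quoting applied to any label that itself contains a comma; I have not verified Google’s exact quoting rules, and the function name gmail_labels is just for illustration.

```python
import csv
from email import message_from_string


def gmail_labels(raw_message):
    """Return the labels recorded in a message's X-Gmail-Labels header.

    Returns an empty list when the header is absent or blank.
    """
    msg = message_from_string(raw_message)
    header = msg.get("X-Gmail-Labels", "")
    # Undo header folding so the value reads as one CSV row.
    header = header.replace("\r", "").replace("\n", "")
    if not header.strip():
        return []
    row = next(csv.reader([header], skipinitialspace=True))
    return [label.strip() for label in row if label.strip()]


# Hypothetical exported message for illustration.
raw = (
    "From: custodian@example.com\n"
    'X-Gmail-Labels: Inbox,Important,"Smith, Jones matter"\n'
    "Subject: Lunch\n"
    "\n"
    "See you at noon.\n"
)
print(gmail_labels(raw))
```

With the labels in hand, a tool could rebuild a folder view or simply filter on a label, without storing a second copy of any message.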
bill seymour said:
Thank you for the further explanation. The Gmail labels being preserved in that special header is very interesting.