This is the thirteenth in a series revisiting Ball in Your Court columns and posts from the primordial past of e-discovery, updating and critiquing in places, and hopefully restarting a few conversations.  As always, your comments are gratefully solicited.

Do-It-Yourself Digital Discovery

[Originally published in Law Technology News, May 2006]

Recently, a West Texas firm received a dozen Microsoft Outlook PST files from a client.  Like the dog that caught the car, they weren’t sure what to do next.  Even out on the prairie, they’d heard of online hosting and e-mail analytics, but worried about the cost.  They wondered: Did they really need an e-discovery vendor?  Couldn’t they just do it themselves?

As a computer forensic examiner, I blanch at the thought of lawyers harvesting data and processing e-mail in native formats.  “Guard the chain of custody,” I want to warn.  “Don’t mess up the metadata!  Leave this stuff to the experts!”  But the trial lawyer in me wonders how a solo/small firm practitioner in a run-of-the-mill case is supposed to tell a client, “Sorry, the courts are closed to you because you can’t afford e-discovery experts.”

Most evidence today is electronic, so curtailing discovery of electronic evidence isn’t an option, and trying to stick with paper is a dead end.  We’ve got to deal with electronic evidence in small cases, too.  Sometimes, that means doing it yourself.

The West Texas lawyers sought a way to access and search the Outlook e-mail and attachments in the PSTs.  It had to be quick and easy.  It had to protect the integrity of the evidence.  And it had to be cheap.  They wanted what many lawyers will come to see they need: the tools and techniques to stay in touch with the evidence in smaller cases without working through vendors and experts.

What’s a PST?
Microsoft Outlook is the most popular business e-mail and calendaring client, but don’t confuse Outlook with Outlook Express, a simpler application bundled with Windows.  Outlook Express stores each mail folder’s messages, largely as unencrypted text, in a separate file named for that folder and bearing the extension .DBX.  Outlook stores local message data, attachments, folder structure and other information in an encrypted, often-massive database file with the extension .PST.  Because the PST file structure is complex, proprietary and poorly documented, some programs have trouble interpreting PSTs.

What about Outlook?
Couldn’t they just load the files in Outlook and search?  Many do just that, but there are compelling reasons why Outlook is the wrong choice for an electronic discovery search and review tool, foremost among them that it doesn’t protect the integrity of the evidence: Outlook changes PST files.  Further, Outlook searches are slow, don’t include attachments (but see my concluding comments below) and can’t be run across multiple mail accounts.  I considered Google Desktop (the free, fast and powerful keyword search tool that makes short work of searching files, e-mail and attachments), but it has limited Boolean search capabilities and can’t confine searches to specific PSTs.

I also considered several extraction and search tools, trying to keep the cost under $200.00.  One, a gem called Paraben E-Mail Examiner ($479.00), sometimes gets indigestion from PST files and won’t search attachments.  Another favorite, Aid4Mail Professional from Fookes Software ($1,499.00), quickly extracts e-mail and attachments and outputs them to several production formats, but Aid4Mail has a meager search capability and doesn’t support the 64-bit version of Outlook.  I looked at askSam software ($149.95), but after I studied its FAQ and noodled with a demo, askSam proved unable to access any PST except the one tied to the machine’s default Outlook profile, potentially commingling evidence e-mail with the lawyer’s own e-mail.

The answer (a decade ago) lay with dtSearch Desktop, a $199.00 indexed search application offering a command line tool that extracts the contents of PST files as generic message files (.MSG) indexed by dtSearch.  In testing, once I got past the clunky command line syntax, I saved each custodian’s mail to separate folders and then had dtSearch index the folders.  The interface was wonderfully simple and powerful.  Once you select the indices, you can use nearly any combination of Boolean, proximity, fuzzy or synonym searches.  Search results are instantaneous and essential metadata for messages and attachments are preserved and presented.  It even lets you preview attachments.
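The column doesn’t describe how dtSearch works internally, so what follows is only a toy sketch of the general idea behind an indexed search tool.  You build an inverted index once, mapping each word to the messages and word positions where it appears.  After that, Boolean and proximity queries become quick lookups rather than rescans of every file.  The function names and sample messages are hypothetical; real engines add stemming, fuzzy matching, synonyms and much more.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Inverted index: word -> {doc_id: [word positions within that doc]}."""
    index: dict[str, dict[str, list[int]]] = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index


def search_and(index: dict[str, dict[str, list[int]]], *words: str) -> set[str]:
    """Boolean AND: IDs of documents containing every word."""
    if not words:
        return set()
    hits = [set(index.get(w.lower(), {})) for w in words]
    return set.intersection(*hits)


def search_near(index: dict[str, dict[str, list[int]]], a: str, b: str, distance: int) -> set[str]:
    """Proximity: IDs of documents where words a and b occur within `distance` words."""
    pa = index.get(a.lower(), {})
    pb = index.get(b.lower(), {})
    return {
        doc_id
        for doc_id in pa.keys() & pb.keys()
        if any(abs(i - j) <= distance for i in pa[doc_id] for j in pb[doc_id])
    }


if __name__ == "__main__":
    messages = {
        "msg1": "Please shred the contract before Friday",
        "msg2": "The contract is attached; do not shred anything",
        "msg3": "Lunch on Friday?",
    }
    idx = build_index(messages)
    print(sorted(search_and(idx, "contract", "shred")))  # ['msg1', 'msg2']
    print(sorted(search_near(idx, "shred", "contract", 2)))  # ['msg1']
```

Both sample messages mention “shred” and “contract,” but only the first puts them within two words of each other.  That is the kind of distinction a proximity search draws and a plain keyword search can’t.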

dtSearch (still) lacks key features seen in products designed as e-discovery review tools, like the ability to tag hot documents, de-duplicate and redact privileged content.  But you can copy selected messages and attachments to folders for production or redaction, preserving folder structures as desired.  You can also generate printable search reports showing search results in context.  In short, dtSearch works, but as a do-it-yourself e-mail tool, it’s best suited to low-volume, low-budget review efforts (and today, I wouldn’t use it for those).
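To illustrate what the missing de-duplication involves, here is a minimal sketch of its simplest form: flagging byte-for-byte identical files among the messages and attachments you’ve copied out.  It hashes each file and groups matching digests.  This is an assumption-laden illustration, not how any particular review tool works.  Two copies of the same e-mail saved as separate MSG files won’t necessarily be byte-identical, which is why purpose-built tools typically hash selected message fields rather than whole files.  The folder and the `*.msg` pattern are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large attachments needn't fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder: Path, pattern: str = "*.msg") -> dict[str, list[Path]]:
    """Group files under `folder` matching `pattern` by content hash; keep groups of 2+.

    Note: glob matching is case-sensitive on Linux, case-insensitive on Windows.
    """
    groups: dict[str, list[Path]] = {}
    for path in sorted(folder.rglob(pattern)):
        if path.is_file():
            groups.setdefault(sha256_of(path), []).append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Each returned group holds files with identical contents.  One could treat the first path in a group as the copy to review and the rest as duplicates.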

Wave of the Future?
Any firm handles a fifty-page photocopy job in-house, but a fifty-thousand-page job is going out to a copy shop.  Likewise, e-discovery service providers are essential in bigger cases, but in matters with tight budgets or where the evidence is just e-mail from a handful of custodians, lawyers may need to roll up their sleeves and do it themselves.

SIDEBAR: Tips for Doing It Yourself
If you’d like to try your hand, dtSearch offers a free 30-day demonstration copy.  Practice on your own e-mail or an old machine before tackling real evidence, and if you anticipate the need for computer forensics, leave the evidence machines alone and bring in an expert.

Whether e-mail is stored locally as a PST, in a similar format called an OST or remotely on an Exchange server depends on the sophistication and configuration of the e-mail system.  To find a local PST file on a machine running Windows XP, NT or 2000, look for C:\Documents and Settings\Windows user name\Local Settings\Application Data\Microsoft\Outlook\Outlook.pst (today, the path to Outlook mail would likely be C:\Users\Windows user name\Documents\Outlook Files\Outlook.pst).  Archived e-mail resides in another file typically found in the same directory, called Archive.pst.  Occasionally, users change default filenames or locations, so you may want to use Windows Search to find all files with a PST extension. 
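If Windows Search isn’t available, or you’d rather have a repeatable list, a short script can walk a folder tree and list every file ending in .pst or .ost in any letter case.  This is a sketch, not a forensic tool.  It only reads directory entries and never opens the mail files, but running anything on an evidence machine still touches that system, so the caution about computer forensics in the tips above still applies.  The root folder you pass in (for example, C:\Users) is your choice.

```python
import os
from pathlib import Path

# Extensions for Outlook's local mail stores (PST) and offline caches (OST).
MAIL_EXTENSIONS = {".pst", ".ost"}


def find_mail_stores(root: str) -> list[Path]:
    """Return every .pst/.ost file under `root`, matched case-insensitively."""
    found = []
    # By default, os.walk silently skips folders it can't read (e.g., permission errors).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if Path(name).suffix.lower() in MAIL_EXTENSIONS:
                found.append(Path(dirpath) / name)
    return sorted(found)


if __name__ == "__main__":
    import sys

    for path in find_mail_stores(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```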

When you locate the PST files, record their metadata; that is, write down the filenames, where you found them, file sizes, and the dates they were created, modified and last accessed (right-click on the file and select Properties if you don’t see this information in the directory).  Be sure Outlook isn’t running, then copy the PST files to read-only media like CD-R or DVD-R.  Remember that PSTs for different custodians tend to have the same names (e.g., Outlook.pst and Archive.pst), so use a naming protocol or folder structure to keep track of who’s who.  When dealing with Outlook Express, search for the message files with a DBX extension.
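Writing the metadata down by hand works, but a script can capture it more consistently.  The sketch below assumes you’ve already identified the custodian for each PST.  It reads each file’s name, location, size and created/modified/accessed timestamps and writes them to a CSV log.  It also shows one hypothetical naming protocol (the custodian’s name as a prefix) for keeping identically named PSTs apart.  Run it before you copy anything, with Outlook closed.  Reading a file’s properties this way doesn’t open the file, but support for a true creation time varies by platform, as the comments note.

```python
import csv
import os
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["custodian", "filename", "location", "size_bytes", "created", "modified", "accessed"]


def _utc(ts: float) -> str:
    """Render a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def record_metadata(custodian: str, pst_path: Path) -> dict:
    """Collect the metadata listed above via os.stat, without opening the file."""
    st = os.stat(pst_path)
    # st_birthtime is the creation time on macOS and on Windows (Python 3.12+).
    # Otherwise fall back to st_ctime: creation time on Windows, but the last
    # metadata change on Linux, so treat that column with care there.
    created = getattr(st, "st_birthtime", st.st_ctime)
    return {
        "custodian": custodian,
        "filename": pst_path.name,
        "location": str(pst_path.parent.resolve()),
        "size_bytes": st.st_size,
        "created": _utc(created),
        "modified": _utc(st.st_mtime),
        "accessed": _utc(st.st_atime),
    }


def custodian_filename(custodian: str, pst_path: Path) -> str:
    """Hypothetical naming protocol: prefix the custodian, e.g. Smith_Outlook.pst."""
    return f"{custodian}_{pst_path.name}"


def write_log(rows: list[dict], log_path: Path) -> None:
    """Write one CSV row per PST so the log can travel with the copies."""
    with open(log_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```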

Though dtSearch will index DBX files, PSTs must first be converted to individual messages using the included command line tool, mapitool.exe.  For DOS veterans, it’s old hat, but those new to command line syntax may find it confusing.  To use mapitool, you’ll need to know the paths to mapitool.exe and to the PSTs you’re converting.  Then, open a command line window (Start>Run>Command), and follow the instructions included with mapitool.

When mapitool completes the conversion, point the dtSearch Index Manager to the folder holding the extracted messages and index its contents.  Name the index to correspond with the custodian and repeat the process for each custodian’s PST files.
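Before pointing dtSearch at each folder, it’s worth confirming that every custodian’s extraction actually produced messages.  Here is a minimal sketch that assumes a hypothetical layout: mapitool’s output for each custodian sits in its own subfolder, named for that custodian, under one extraction root.  It counts the .msg files per folder and suggests a matching index name.  The `_mail` suffix is an arbitrary convention for illustration; the indexing itself still happens in the dtSearch Index Manager.

```python
from pathlib import Path


def index_plan(extraction_root: Path) -> list[tuple[str, Path, int]]:
    """For each custodian subfolder: (suggested index name, folder, count of .msg files)."""
    plan = []
    custodian_dirs = sorted(p for p in extraction_root.iterdir() if p.is_dir())
    for folder in custodian_dirs:
        # Count extracted messages anywhere beneath the custodian's folder.
        msg_count = sum(
            1 for f in folder.rglob("*") if f.is_file() and f.suffix.lower() == ".msg"
        )
        plan.append((f"{folder.name}_mail", folder, msg_count))
    return plan


if __name__ == "__main__":
    import sys

    for index_name, folder, count in index_plan(Path(sys.argv[1])):
        flag = "  <-- nothing extracted?" if count == 0 else ""
        print(f"{index_name}: {count} messages in {folder}{flag}")
```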


Today, I’d be unlikely to deploy dtSearch in e-discovery.  Though it remains an excellent program at an excellent price, its enduring lack of tagging and deduplication capabilities makes it a non-starter.  In the nearly ten years since I wrote this, several full-featured, low-cost e-discovery tools have emerged: tools that not only handle e-mail natively but incorporate capabilities purpose-built for e-discovery.  As a value proposition, the leader of the pack is Nuix’s Prooffinder, a $100 marvel from a company dedicating all proceeds of sale to a charity supporting children’s literacy.  So long as the aggregate data volume processed per Prooffinder “case” doesn’t exceed 15GB, Prooffinder will make quick work of Outlook PST or OST files and even an IBM Notes NSF file.  Attachments are indexed for search, and Prooffinder offers fantastically robust search capabilities.  You can tag, preview and export messages as PST container files, PDF images or near-native single messages (MSGs and EMLs).
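Because Prooffinder’s limit applies to the aggregate volume in a case, a quick tally of the source files before you load them can tell you whether a matter fits.  Here is a minimal sketch that assumes the cap is measured against the combined size of the mail files you ingest, and that “15GB” means 15 billion bytes.  If the vendor counts 2^30 bytes per gigabyte, the true ceiling is somewhat higher, so this check errs on the safe side.

```python
from pathlib import Path

# Assumption: decimal gigabytes. If the limit is really 15 * 2**30 bytes,
# this constant sits slightly below the true cap, so the check is conservative.
CASE_LIMIT_BYTES = 15 * 10**9


def total_size(paths: list[Path]) -> int:
    """Combined size in bytes of the given files."""
    return sum(p.stat().st_size for p in paths)


def fits_in_one_case(paths: list[Path], limit_bytes: int = CASE_LIMIT_BYTES) -> bool:
    """True if the files together stay at or under the per-case limit."""
    return total_size(paths) <= limit_bytes
```

For example, PSTs of 6GB, 5GB and 3.5GB total 14.5GB and fit.  Add a 1GB archive and the total reaches 15.5GB, so the matter would need to be split across cases.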

Another splendid and more feature-rich option is Intella from Vound Software.  Intella is priced in versions geared to the size of your data and in all-you-can-ingest versions.  Where Prooffinder can only export a load file formatted as a CSV, Intella can create load files in a range of common formats and can ingest load files from imaged productions.  Intella also offers simple visual analytics to assist in understanding your dataset.  A third moderately priced option is Digital Warroom Pro (DWRP) from GGO, which runs about $1,100 once you secure all you need to make it work.  DWRP can be a challenge to install, and it employs its own peculiar terminology; still, once you get the hang of it, it’s a solid performer for the price.

At LegalTech New York last week, I came across what may be a fourth contender for the title of moderately priced e-mail e-discovery tool.  Though I haven’t put it through its paces, MailXaminer from Systools ($1,600) strikes me as a capable end-to-end tool for indexing, searching, tagging and producing e-mail evidence.

My expectation that lawyers learn to undertake basic e-discovery tasks remains one of the most contentious and pointedly polarizing positions I’ve taken.  It never fails to incense lawyers when I suggest they could, and should, acquire more hands-on e-discovery skills.  Neither does it endear me to service providers and others who profit from doing that which lawyers resolutely refuse to do.  Clearly, mine remains a minority view; but then, it was once thought beneath an attorney to talk on the telephone or touch a keyboard.  Mark my words: someday, solo and small firm lawyers will use tools like these and get their hands dirty with data.  Good technology gets out of the way and doesn’t get between you and the evidence.