What is Native Production for E-Mail?

02 Tuesday Jul 2013

Recently, I’ve weighed in on disputes where the parties were fighting over whether the e-mail production was sufficiently “native” to comply with the court’s orders to produce natively. In one matter, the question was whether Gmail could be produced in a native format, and in another, the parties were at odds about what forms are native to Microsoft Exchange e-mail. In each instance, I saw two answers: the technically correct one and the helpful one.

I am a vocal proponent of native production for e-discovery. Native is complete. Native is functional. Native is inherently searchable. Native costs less. I’ve explored these advantages in other writings and will spare you that here. But when I speak of “native” production in the context of databases, I am using a generic catchall term to describe electronic forms with superior functionality and completeness, notwithstanding the common need in e-discovery to produce less than all of a collection of ESI.

It’s a Database

When we deal with e-mail in e-discovery, we are usually dealing with database content. Microsoft Exchange, an e-mail server application, is a database. Microsoft Outlook, an e-mail client application, is a database. Gmail, a SaaS webmail application, is a database. Lotus Domino, Lotus Notes, Yahoo! Mail, Hotmail and Novell GroupWise—they’re all databases. It’s important to understand this at the outset because if you think of e-mail as a collection of discrete objects (like paper letters in a manila folder), you’re going to have trouble understanding why defining the “native” form of production for e-mail isn’t as simple as many imagine.

Native in Transit: Text per a Protocol

E-mail is one of the oldest computer networking applications. Before people were sharing printers, and long before the internet was a household word, people were sending e-mail across networks. That early e-mail was plain text, also called ASCII text or 7-bit (because you need just seven bits of data, one less than a byte, to represent each ASCII character). In those days, there were no attachments, no pictures, not even simple enhancements like bold, italic or underline.

Early e-mail was something of a free-for-all, implemented differently by different systems. So the fledgling internet community circulated proposals seeking a standard. They stuck with plain text in order that older messaging systems could talk to newer systems. These proposals were called Requests for Comment or RFCs, and they came into widespread use as much by convention as by adoption (the internet being a largely anarchic realm). The RFCs lay out the form an e-mail should adhere to in order to be compatible with e-mail systems.

The RFCs concerning e-mail have gone through several major revisions since the first one circulated in 1973. The latest protocol revision is called RFC 5322 (2008), which made obsolete RFC 2822 (2001) and its predecessor, RFC 822 (1982). Another series of RFCs (RFC 2045-47, RFC 4288-89 and RFC 2049), collectively called Multipurpose Internet Mail Extensions or MIME, address ways to graft text enhancements, foreign language character sets and multimedia content onto plain text emails. These RFCs establish the form of the billions upon billions of e-mail messages that cross the internet.

So, if you asked me to state the native form of an e-mail as it traversed the Internet between mail servers, I’d likely answer, “plain text (7-bit ASCII) adhering to RFC 5322 and MIME.” In my experience, this is the same as saying “.EML format;” and, it can be functionally the same as the MHT format, but only if the content of each message adheres strictly to the RFC and MIME protocols listed above. You can even change the file extension of a properly formatted message from EML to MHT and back in order to open the file in a browser or in a mail client like Outlook 2010. Try it. If you want to see what the native “plain text in transit” format looks like, change the extension from .EML to .TXT and open the file in Windows Notepad.
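To make the “plain text in transit” point concrete, here is a minimal sketch using Python’s standard email library. The addresses, subject and attachment bytes are invented for illustration. The only point is that a message carrying a binary attachment still serializes to 7-bit ASCII text, which is what you would see on opening an .EML file in Notepad.

```python
from email import policy
from email.message import EmailMessage


def build_sample_message() -> EmailMessage:
    """Build a small message with a binary attachment (all values are made up)."""
    msg = EmailMessage(policy=policy.SMTP)
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Quarterly numbers"
    msg.set_content("Bob, the figures are attached.\n")
    # Arbitrary non-text bytes standing in for a real attachment.
    # MIME encodes them as base64 so they survive as plain text.
    msg.add_attachment(
        bytes(range(256)),
        maintype="application",
        subtype="octet-stream",
        filename="figures.bin",
    )
    return msg


def as_transit_text(msg: EmailMessage) -> bytes:
    """Serialize the message as it would be written to an .EML file."""
    return msg.as_bytes()


if __name__ == "__main__":
    raw = as_transit_text(build_sample_message())
    # Every byte is below 128, so decoding as ASCII cannot fail here.
    print(raw.decode("ascii"))
```

Running it prints the headers, the text body and the base64-encoded attachment as one plain text stream.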

The appealing feature of producing e-mail in exactly the same format in which the message traversed the internet is that it’s a form that holds the entire content of the message (header, message bodies and encoded attachments), and it’s a form that’s about as compatible as it gets in the e-mail universe. [1]

Unfortunately, the form of an e-mail in transit is often incomplete in terms of metadata it acquires upon receipt that may have probative or practical value; and the format in transit isn’t native to the most commonly-used e-mail server and client applications, like Microsoft Exchange and Outlook. It’s from these applications–these databases–that e-mail is collected in e-discovery.

Outlook and Exchange

Microsoft Outlook and Microsoft Exchange are database applications that talk to each other using a protocol (machine language) called MAPI, for Messaging Application Programming Interface. Microsoft Exchange is an e-mail server application that supports functions like contact management, calendaring, to do lists and other productivity tools. Microsoft Outlook is an e-mail client application that accesses the contents of a user’s account on the Exchange Server and may synchronize such content with local (i.e., retained by the user) container files supporting offline operation. If you can read your Outlook e-mail without a network connection, you have a local storage file.

Practice Tip (and Pet Peeve): When your client or company runs Exchange Server and someone asks what kind of e-mail system your client or company uses, please don’t say “Outlook.” That’s like saying “iPhone” when asked what cell carrier you use. Outlook can serve as a front-end client to Microsoft Exchange, Lotus Domino and most webmail services; so saying “Outlook” just makes you appear out of your depth (assuming you are someone who’s supposed to know something about the evidence in the case).

Outlook: The native format for data stored locally by Outlook is a file or files with the extension PST or OST. Henceforth, I’m going to speak only of PSTs, but know that either variant may be seen. PSTs are container files. They hold collections of e-mail—typically stored in multiple folders—as well as content supporting other Outlook features. The native PST found locally on the hard drive of a custodian’s machine will hold all of the Outlook content that the custodian can see when not connected to the e-mail server.

Because Outlook is a database application designed for managing messaging, it goes well beyond simply receiving messages and displaying their content. Outlook begins by taking messages apart and using the constituent information to populate various fields in a database. What we see as an e-mail message using Outlook is actually a report queried from a database. The native form of Outlook e-mail carries these fields and adds metadata not present in the transiting message. The added metadata fields include such information as the name of the folder in which the e-mail resides, whether the e-mail was read or flagged and its date and time of receipt. Moreover, because Outlook is designed to “speak” directly to Exchange using their own MAPI protocol, messages between Exchange and Outlook carry MAPI metadata not present in the “generic” RFC 5322 messaging. Whether this MAPI metadata is superfluous or invaluable depends upon what questions may arise concerning the provenance and integrity of the message. Most of the time, you won’t miss it. Now and then, you’ll be lost without it.
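The idea of a message as “a report queried from a database” can be sketched roughly as follows. This is not Outlook’s actual storage schema and it does not use MAPI; the table layout, the column names and the sample message are all assumptions made for illustration. It shows only the general pattern: the transiting message is taken apart into fields, and fields such as folder and read status are added on receipt.

```python
import sqlite3
from email import message_from_string, policy

# A made-up RFC 5322 message, as it might arrive from the network.
RAW_MESSAGE = (
    "From: alice@example.com\r\n"
    "To: bob@example.com\r\n"
    "Subject: Lunch\r\n"
    "Date: Tue, 02 Jul 2013 09:00:00 -0500\r\n"
    "Message-ID: <1@example.com>\r\n"
    "\r\n"
    "Noon at the usual place?\r\n"
)


def open_store() -> sqlite3.Connection:
    """Create an in-memory message store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages (message_id TEXT, sender TEXT, recipient TEXT, "
        "subject TEXT, sent TEXT, body TEXT, folder TEXT, is_read INTEGER)"
    )
    return db


def ingest(db: sqlite3.Connection, raw: str, folder: str, is_read: bool = False) -> None:
    """Take the message apart and store its parts as database fields."""
    msg = message_from_string(raw, policy=policy.default)
    db.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (
            str(msg["Message-ID"]),
            str(msg["From"]),
            str(msg["To"]),
            str(msg["Subject"]),
            str(msg["Date"]),
            msg.get_content().strip(),
            folder,        # metadata added on receipt, absent from the transit form
            int(is_read),  # likewise
        ),
    )


def folder_report(db: sqlite3.Connection, folder: str) -> list:
    """The 'view' of a folder is just a query against the stored fields."""
    return db.execute(
        "SELECT sender, subject, is_read FROM messages WHERE folder = ?",
        (folder,),
    ).fetchall()
```

Note that the folder name and read flag exist only in the store. Export the row back to bare RFC 5322 text and that information is gone.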

Because Microsoft Outlook is so widely used, its PST file format is widely supported by applications designed to view, process and search e-mail. Moreover, the complex structure of a PST is so well understood that many commercial applications can parse PSTs into single message formats or assemble single messages into PSTs. Accordingly, it’s feasible to produce responsive messaging in a PST format while excluding messages that are non-responsive or privileged. It’s also feasible to construct a production PST without calendar content, contacts, to do lists and the like. You’d be hard pressed to find a better form of production for Exchange/Outlook messaging. Here, I’m defining “better” in terms of completeness and functionality, not compatibility with your ESI review tools.

MSGs: There’s little room for debate that the PST or OST container files are the native forms of data storage and interchange for a collection of messages (and other content) from Microsoft Outlook. But is there a native format for individual messages from Outlook, like the RFC 5322 format discussed above? The answer isn’t clear cut. On the one hand, if you were to drag a single message from Outlook to your Windows desktop, Outlook would create that message in its proprietary MSG format. The MSG format holds the complete content of its RFC 5322 cousin plus additional metadata; but it lacks information (like foldering data) that’s contained within a PST. It’s not “native” in the sense that it’s not a format that Outlook uses day-to-day; but it’s an export format that holds more message metadata unique to Outlook. All we can say is that the MSG file is a highly compatible near-native format for individual Outlook messages–more complete than the transiting e-mail and less complete than the native PST. Though it’s encoded in a proprietary Microsoft format (i.e., it’s not plain text), the MSG format is so ubiquitous that, like PSTs, many applications support it as a standard format for moving messages between applications.

Exchange: The native format for data housed in an Exchange server is its database file, prosaically called the Exchange Database and sporting the file extension .EDB. The EDB holds the account content for everyone in the mail domain; so unless the case is the exceedingly rare one that warrants production of all the e-mail, attachments, contacts and calendars for every user, no litigant hands over their EDB.

It may be possible to create an EDB that contains only messaging from selected custodians (and excludes privileged and non-responsive content) such that you could really, truly produce in a native form. But, I’ve never seen it done that way, and I can’t think of anything to commend it over simpler approaches.

So, if you’re not going to produce in the “true” native format of EDB, the desirable alternatives left to you are properly called “near-native,” meaning that they preserve the requisite content and essential functionality of the native form, but aren’t the native form. If an alternate form doesn’t preserve content and functionality, you can call it whatever you want. I lean toward “garbage,” but to each his own.

E-mail is a species of ESI that doesn’t suffer as mightily as, say, Word documents or Excel spreadsheets when produced in non-native forms. If one were meticulous in their text extraction, exacting in their metadata collection and careful in their load file construction, one could produce Exchange content in a way that’s sufficiently complete and utile as to make a departure from the native less problematic—assuming, of course, that one produces the attachments in their native forms. That’s a lot of “ifs,” and what will emerge is sure to be incompatible with e-mail client applications and native review tools.

Litmus Test: Perhaps we have the makings of a litmus test to distinguish functional near-native forms from dysfunctional forms like TIFF images and load files: Can the form produced be imported into common e-mail client or server applications?

You have to admire the simplicity of such a test. If the e-mail produced is so distorted that not even e-mail programs can recognize it as e-mail, that’s a fair and objective indication that the form of production has strayed too far from its native origins.
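A very crude proxy for that test can be sketched in a few lines of Python. The real test is whether a mail client or server will import the production; the sketch below only asks whether a produced file parses as a message carrying the two header fields RFC 5322 requires, an originator (From) and an origination date (Date). The function name and the sample inputs are assumptions for illustration.

```python
from email import message_from_bytes, policy


def looks_like_email(data: bytes) -> bool:
    """Return True if the bytes parse as a message with From and Date headers.

    This is a rough stand-in for "can a mail program recognize it as e-mail",
    not a substitute for actually importing the file into a client.
    """
    msg = message_from_bytes(data, policy=policy.default)
    return msg["From"] is not None and msg["Date"] is not None


if __name__ == "__main__":
    eml = (
        b"From: alice@example.com\r\n"
        b"Date: Tue, 02 Jul 2013 09:00:00 -0500\r\n"
        b"\r\n"
        b"Hello.\r\n"
    )
    # The first bytes of a little-endian TIFF file, padded with zeros.
    tiff = b"II*\x00\x08\x00\x00\x00" + bytes(64)
    print(looks_like_email(eml), looks_like_email(tiff))
```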

Gmail

The question whether it’s feasible to produce Gmail in its native form triggered an order by U.S. Magistrate Judge Mark J. Dinsmore in a case styled, Keaton v. Hannum, 2013 U.S. Dist. LEXIS 60519 (S.D. Ind. Apr. 29, 2013). It’s a seamy, sad suit brought pro se by an attorney named Keaton against both his ex-girlfriend, Christine Zook, and the cops who arrested Keaton for stalking Zook. It got my attention because the court cited a blog post I made three years ago. [2] The Court wrote:

Zook has argued that she cannot produce her Gmail files in a .pst format because no native format exists for Gmail (i.e., Google) email accounts. The Court finds this to be incorrect based on Exhibit 2 provided by Zook in her Opposition Brief. [Dkt. 92 at Ex. 2 (Ball, Craig: Latin: To Bring With You Under Penalty of Punishment, EDD Update (Apr. 17, 2010)).] Exhibit 2 explains that, although Gmail does not support a “Save As” feature to generate a single message format or PST, the messages can be downloaded to Outlook and saved as .eml or .msg files, or, as the author did, generate a PDF Portfolio – “a collection of multiple files in varying format that are housed in a single, viewable and searchable container.” [Id.] In fact, Zook has already compiled most of her archived Gmail emails between her and Keaton in a .pst format when Victim.pst was created. It is not impossible to create a “native” file for Gmail emails.

Id. at 3.

I’m gratified when a court cites my work, and here, I’m especially pleased that the Court took an enlightened approach to “native” forms in the context of e-mail discovery. Of course, one strictly defining “native” to exclude near-native forms might be aghast at the loose lingo; but the more important takeaway from the decision is the need to strive for the most functional and complete forms when true native is out-of-reach or impractical.

Gmail is a giant database in a Google data center someplace (or in many places). I’m sure I don’t know what the native file format for cloud-based Gmail might be. Mere mortals don’t get to peek at the guts of Google. But, I’m also sure that it doesn’t matter, because even if I could name the native file format, I couldn’t obtain that format, nor could I faithfully replicate its functionality locally.[3]

Since I can’t get “true” native, how can I otherwise mirror the completeness and functionality of native Gmail? After all, a litigant doesn’t seek native forms for grins. A litigant seeks native forms to secure the unique benefits native brings, principally functionality and completeness.

There are a range of options for preserving a substantial measure of the functionality and completeness of Gmail. One would be to produce in Gmail.

HUH?!?!

Yes, you could conceivably open a fresh Gmail account for production, populate it with responsive messages and turn over the access credentials for same to the requesting party. That’s probably as close to true native as you can get (though some metadata will change), and it flawlessly mirrors the functionality of the source. Still, it’s not what most people expect or want. It’s certainly not a form they can pull into their favorite e-discovery review tool.

Alternatively, as the Court noted in Keaton v. Hannum, an IMAP[4] capture to a PST format (using Microsoft Outlook or a collection tool) is a practical alternative. The resultant PST won’t look or work exactly like Gmail (i.e., messages won’t thread in the same way and flagging will be different); but it will supply a large measure of the functionality and completeness of the Gmail source. Plus, it’s a form that lends itself to many downstream processing options.
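For readers curious what an IMAP capture involves mechanically, here is a minimal sketch using Python’s standard imaplib. It is not the Outlook-to-PST route described above: it swaps in a simpler technique and saves each message as its own .EML file, since writing a PST requires Outlook or a specialized tool. The function name and output layout are assumptions, and logging in (host, credentials, any provider-specific authentication) is deliberately left to the caller.

```python
import imaplib
import os


def capture_folder(conn: imaplib.IMAP4, folder: str, out_dir: str) -> list:
    """Save every message in one IMAP folder as a numbered .eml file.

    `conn` is assumed to be an already-authenticated IMAP connection.
    Returns the list of file paths written.
    """
    # readonly=True keeps the capture from altering flags on the server.
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot open folder {folder!r}")

    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")

    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for num in data[0].split():
        seq = num.decode("ascii")
        # BODY.PEEK[] asks for the full message text without marking it read.
        status, parts = conn.fetch(seq, "(BODY.PEEK[])")
        if status != "OK" or not isinstance(parts[0], tuple):
            continue
        path = os.path.join(out_dir, f"{seq}.eml")
        with open(path, "wb") as fh:
            fh.write(parts[0][1])  # the raw RFC 5322 bytes
        saved.append(path)
    return saved
```

A caller would typically create the connection with something like imaplib.IMAP4_SSL(host), log in, and then call capture_folder once per folder so the folder structure is preserved in the output directories. Folder names containing spaces need to be quoted when passed to select.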

So, What’s the native form of that e-mail?

Which answer do you want: the technically correct one or the helpful one? No one is a bigger proponent of native production than I am; but I’m finding that litigants can get so caught up in the quest for native that they lose sight of what truly matters.

Where e-mail is concerned, we should be less captivated by the term “native” and more concerned with specifying the actual form or forms that are best suited to supporting what we need and want to do with the data. That means understanding the differences between the forms (e.g., what information they convey and their compatibility with review tools), not just demanding native like it’s a brand name.

When I seek “native” for a Word document or an Excel spreadsheet, it’s because I recognize that the entire native file—and only the native file—supports the level of completeness and functionality I need, a level that can’t be fairly replicated in any other form. But when I seek native production of e-mail, I don’t expect to receive the entire “true” native file. I understand that responsive and privileged messages must be segregated from the broader collection and that there are a variety of near native forms in which the responsive subset can be produced so as to closely mirror the completeness and functionality of the source.

When it comes to e-mail, what matters most is getting all the important information within and about the message in a fielded form that doesn’t completely destroy its character as an e-mail message.

So let’s not get too literal about native forms when it comes to e-mail. Don’t seek native to prove a point. Seek native to prove your case.

____________

Postscript: When I publish an article extolling the virtues of native production, I usually get a comment or two saying, “TIFF and load files are good enough.” I can’t always tell if the commentator means “good enough to fairly serve the legitimate needs of the case” or “good enough for those sleazy bastards on the other side.” I suspect they mean both. Either way, it might surprise readers to know that, when it comes to e-mail, I agree with the first assessment…with a few provisos.

First, TIFF and load file productions can be good enough for production of e-mail if no one minds paying more than necessary. It generally costs more to extract text and convert messages to images than it does to leave it in a native or near-native form. But that’s only part of the extra expense. TIFF images of messages are MUCH larger files than their native or near native counterparts. With so many service providers charging for ingestion, processing, hosting and storage of ESI on a per-gigabyte basis, those bigger files continue to chew away at both sides’ bottom lines, month-after-month.

Second, TIFF and load file productions are good enough for those who only have tools to review TIFF and load file productions. There’s no point in giving light bulbs to those without electricity. On the other hand, just because you don’t pay your light bill, must I sit in the dark?

Third, because e-mails and attachments have the unique ability to be encoded entirely in plain text, a load file can carry the complete contents of a message and its attachments as RFC 5322-compliant text accompanied by MAPI metadata fields. It’s one of the few instances where it’s possible to furnish a load file that simply and genuinely compensates for most of the shortcomings of TIFF productions. Yet, it’s not done.
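As a rough illustration of that third point, the sketch below writes a load file in which each row carries a few fielded values plus the complete RFC 5322 text of the message. Everything specific here is an assumption: the field names, the document ID, and the use of ordinary CSV rather than the delimiters a given review platform expects. The Folder column stands in for the kind of receipt-side metadata that would come from the mail store rather than from the message text.

```python
import csv
import io
from email import message_from_bytes, policy

# Hypothetical field names; real load file specifications vary.
FIELDS = ["BegDoc", "From", "To", "Subject", "DateSent", "Folder", "RFC5322Text"]


def load_file_rows(items):
    """Yield one row per (doc_id, folder, raw_bytes) item.

    Assumes each raw message is 7-bit ASCII, as a strictly
    RFC 5322/MIME-compliant message would be.
    """
    for doc_id, folder, raw in items:
        msg = message_from_bytes(raw, policy=policy.default)
        yield {
            "BegDoc": doc_id,
            "From": str(msg.get("From", "")),
            "To": str(msg.get("To", "")),
            "Subject": str(msg.get("Subject", "")),
            "DateSent": str(msg.get("Date", "")),
            "Folder": folder,
            # The whole message, headers and encoded attachments included.
            "RFC5322Text": raw.decode("ascii"),
        }


def write_load_file(items, out) -> None:
    """Write the rows to an open text stream as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in load_file_rows(items):
        writer.writerow(row)


if __name__ == "__main__":
    sample = (
        b"From: alice@example.com\r\nTo: bob@example.com\r\n"
        b"Subject: Lunch\r\nDate: Tue, 02 Jul 2013 09:00:00 -0500\r\n"
        b"\r\nNoon at the usual place?\r\n"
    )
    out = io.StringIO(newline="")
    write_load_file([("ABC0001", "Drafts", sample)], out)
    print(out.getvalue())
```

Because the last column holds the unaltered message text, a recipient could write that field back out as an .EML file and open it in a mail client.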

Finally, TIFF and load file productions are good enough for requesting parties who just don’t care. A lot of requesting parties fall into that category, and they’re not looking to change. They just want to get the e-mail, and they don’t give a flip about cost, completeness, utility, metadata, efficiency, authentication or any of the rest. If both sides and the court are content not to care, TIFF and load files really are good enough.

[1] There’s even an established format for storing multiple RFC 5322 messages in a container format called mbox. The mbox format was described in 2005 in RFC 4155, and though it reflects a simple, reliable way to group e-mails in a sequence for storage, it lacks the innate ability to memorialize mail features we now take for granted, like message foldering. A common workaround is to create a single mbox file named to correspond to each folder whose contents it holds (e.g., Inbox.mbox).
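The one-mbox-per-folder workaround can be sketched with Python’s standard mailbox module. The function name and the input shape (pairs of folder name and raw message bytes) are assumptions for illustration, and the sketch assumes folder names are safe to use as file names.

```python
import mailbox
import os


def write_folder_mboxes(messages, out_dir: str) -> dict:
    """Append each message to an mbox file named after its folder.

    `messages` is an iterable of (folder_name, raw_bytes) pairs.
    Returns a mapping of folder name to the mbox path written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for folder, raw in messages:
        path = os.path.join(out_dir, f"{folder}.mbox")
        box = mailbox.mbox(path)  # created if it does not exist yet
        try:
            box.add(raw)          # appends the message to the container
        finally:
            box.close()           # flushes pending changes to disk
        paths[folder] = path
    return paths
```

Foldering survives only in the file names, which is exactly the limitation noted above.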

[2] With a tip of the hat to Josh Gilliland, the blogger behind Bow Tie Law, who brought the Keaton decision to my attention.

[3] It was once possible to create complete, offline replications of Gmail using a technology called Gears; however, Google discontinued support of Gears some time ago. Gears’ successor, called “Gmail Offline for Chrome,” limits its offline collection to just a month’s worth of Gmail, making it a complete non-starter for e-discovery. Moreover, neither of these approaches employs true native forms as each was designed to support a different computing environment.

[4] IMAP (for Internet Message Access Protocol) is another way that e-mail client and server applications can talk to one another. The latest version of IMAP is described in RFC 3501. IMAP is not a form of e-mail storage; it is a means by which the structure (i.e., foldering) of webmail collections can be replicated in local mail client applications like Microsoft Outlook. Another way that mail clients communicate with mail servers is the Post Office Protocol or POP; however, POP is limited in important ways, including in its inability to collect messages stored outside a user’s Inbox. Further, POP does not replicate foldering. Outlook “talks” to Exchange servers using MAPI and to other servers and webmail services using IMAP (or via POP, if IMAP is not supported).

22 thoughts on “What is Native Production for E-Mail?”

Pingback: What is Native Production for E-Mail? | @ComplexD
David Tobin said:

July 3, 2013 at 8:09 AM

nice post

Sandy Serkes said:

July 3, 2013 at 8:52 AM

Craig, if you extend your notion that all email storage and servers are essentially databases, you will soon see that all content is essentially also in databases of some sort. File folders on a server, desktop or in the cloud are all databases. Any mechanism for holding content is at heart a database. That is, there is the root content inside the file (sometimes called text, but that may not always be true. It could be images or video, for example.) Next there is the metadata about the file itself (creation date, last edit and so on), then there is the database application-supplied information (who has access to the file, how the file is stored relative to other files), and then there may be yet another layer of permissions into the systems or servers at all (logins, remote access, etc.). This is important to note because when people create a special, separate litigation database, they are swapping the original (the native) database storage and metadata for another. It is for this reason that at Valora we keep litigation content in 2 ways: 1) the most “document-like” rendition (sometimes TIF or PDF, sometimes native, sometimes converted to something readable at all), and the more “database-style,” where the content is separated out from images, file metadata, system metadata, user-supplied notations (essentially user-created metadata) and attribute detection (tagging) which we supply based upon assessment of all those parts. When required we aggregate the appropriate parts into something viewable, searchable, minable, etc. The more prolific (volume) we get and the more device and format independent we get (complexity), the more we will need flexible storage and assessment capabilities that can disaggregate a file or a document into its root components: content, metadata (file, application & system) and derived attributes & forecasts.

- David Tobin said:
  
  July 3, 2013 at 9:16 AM
  
  Sandy – I get what you say in theory. . . .everything is a db, but an email is a different animal, a single email contains many fields, to, from, cc, date sent, date received, etc., etc. – whereas a word doc just has some meta data, but generally isn’t as useful – for example, TIFFS of word docs are much more useful than TIFF of emails,
  
  - Sandy Serkes said:
    
    July 3, 2013 at 9:54 AM
    
    I would argue that the Word doc has just as much useful metadata as an email, except that it takes some effort to generate or extrapolate it. Examples: languages present, responsiveness to various issues, topic trending, and similarity to other content elsewhere. Yes, email has these same attribute “fields,” but emails are often shorter and communication-ish (and texts & tweets even more so), whereas the Word doc often has much more purposeful content, intent and tone. The TIFF aspect is merely just a snapshot of momentary appearance in either case. Consider the generated website that assembles content and layout on the fly in response to user-request, construction rules and context. Now how will your TIFF help you? It’s all about the separated content, metadata, rules and derivable attributes, particularly as they change over time.
    
  - craigball said:
    
    July 3, 2013 at 4:06 PM
    
    I tend to take the opposite tack when comparing e-mail and Word documents as fodder for TIFF productions. TIFF images accompanied by load files holding extracted text and metadata can be made complete in terms of the informational payload of the messages. I don’t care for TIFF productions for the reasons outlined in the Postscript, but at least it can be used in ways that don’t conceal content.
    
    In contrast, TIFFs and load files fail rather spectacularly when Word documents contain collaborative content, comments and tracked changes. These items may be termed “application metadata” for conversational convenience, but they reflect *content* contributed by the user. They are where many evidentiary bodies are buried.
    
    I haven’t seen an imaged production that handles such content well. In fact, most producing parties still deal with such content by disingenuously pretending it wasn’t there. That reprehensible practice needs to be nipped in the bud.
    
  - David Tobin said:
    
    July 3, 2013 at 4:20 PM
    
    Ok, but all I’m saying is you lose more by TIFFing an email over a word doc. Because PST is a db, with a review tool I can query all emails from john to bob that mention fred in the body from June 2012 thru August 2012. If the email was tiff’d and I didn’t get a PST that’s not happening. Yes, agree it’s always better to go native.
    
  - craigball said:
    
    July 3, 2013 at 4:40 PM
    
    David: Agree, if all one receives is a naked TIFF. However, any lawyer smarter than a stump should demand production of essential metadata values and searchable text (derived directly from the electronic source) as fielded data in a load file. Naked TIFF productions are the 21st century equivalent of copying only one side of the documents then shuffling the copies before production.
    
    All I’m saying is that Word documents with collaborative content don’t even work when you get the load file.
    
- Jim Monty said:
  
  July 27, 2014 at 7:53 PM
  
  Perhaps a better, more general term to use to describe email storage files such as PST files and EDB files is “data store.” A database is a kind of data store, but in today’s world, the word “database” strongly connotes a relational database management system (RDBMS) such as Microsoft SQL Server, Oracle Database, and IBM DB2. The Wikipedia article titled Data store (https://en.wikipedia.org/wiki/Data_store) includes email storage systems among its examples of different types of data stores, along with paper files, spreadsheets, file systems, databases, and several other examples.
  
Josh Headley, D4 LLC said:

July 3, 2013 at 9:23 AM

I’ve always chuckled (silently and, hopefully, not visibly) during arguments about what “native format” really means. It boils down to whether you want something that’s technically correct but potentially useless, or whether you’re OK with bending the definition to “near-native” and getting something that’s helpful or at least usable towards the goal of trying the case.

Before an e-mail in Exchange land gets written into the EDB database, it snakes through edge servers, transport servers, and ultimately gets written to a transaction file on a mailbox server waiting to be committed to the database. Yet another layer of “native” before it even becomes “native” inside the EDB.

Heck, true native format is just a giant string of 1’s and 0’s, right? Let’s produce that way. It’s as native as it gets without getting all sub-atomic.

To your point, Craig, this could go on forever .. thanks for pointing out all the details and suggesting a path towards reasonableness.

- craigball said:
  
  July 3, 2013 at 3:33 PM
  
  Josh:
  
  Though I think you were just making a point, your quip about the “true native format” being ones and zeroes affords me the chance to make a point about the difference between the structure of a file and the manner in which that data is encoded on media. They’re easy to confuse and conflate, but they’re different.
  
  It’s the *structure and content* of the file being discussed in this post, not the mechanism by which the data is encoded. A file’s structure is independent of the storage schema that records its contents. You can write a numeric value in Roman numerals, in ones and zeroes (Base 2 or Binary), in 0-9 (Base 10 or Decimal), in 0-9/A-F (Base 16 or Hexadecimal) or a host of other ways; yet, it remains exactly the same numeric value.
  
  But when you change the structure of the file, it’s a different file. It may carry the identical complement of information, but it must be processed differently. Functionality and completeness of the contents of a file are not impacted by the manner in which it is stored on the media; but, they are often greatly impacted by the form/structure of the file.
  
jimshook said:

July 3, 2013 at 10:56 AM

Great work, Craig. Thank you.

I’m interested in your experience on how often “special” metadata really turns out to be important, i.e. the unread flag, folder name, etc. There are many battles fought over those issues but, as with other metadata, the actual cases where the information is important seem infrequent.

- craigball said:
  
  July 3, 2013 at 3:52 PM
  
  Dear Jim:
  
  Thank you.
  
  I pretty much summed my experience up when I wrote, “Most of the time, you won’t miss it. Now and then, you’ll be lost without it.” Your examples were curious to me. I regard “read flags” as marginally important; but, I think of folder information as routinely important. I want to know if a message was found in Deleted Items or Drafts, don’t you?
  
  MAPI metadata becomes crucially important when the integrity of messages is questioned. Sad to say, but people do forge e-mails. Having “special” metadata like Message_ID data, MAPI date values, server transit routings and other header content can spell the difference between owning Facebook or losing it.
  
  The cost to re-collect such data and the risk that it may change or disappear suggests to me that I’d rather collect and produce much of it routinely than have my case hinge on it and be left sputtering about how sure I am that the metadata would have supported my position, if I’d only collected it.
  
  But I certainly agree that, for the most part, it’s the message bodies and attachments that are properly the focus of e-discovery. And, yes, the import of meta-information is infrequent. Major automobile collisions are also infrequent–I’ve never been in one in forty years of driving–but, I still always wear my seat belt.
  
Rick Stieghorst said:

July 3, 2013 at 12:45 PM

Excellent article as usual Craig, thank you! I was REALLY hoping you would tackle the Lotus/NSF monster in as much detail as you did the others. While I believe it is experiencing a slow death in the marketplace, it is still quite heavily used, and just as heavily misunderstood.

- craigball said:
  
  July 3, 2013 at 2:15 PM
  
  Dear Rick:
  
  Your wish is my command, and I agree with you that there is much confusion about Domino and Notes in the e-discovery marketplace; but, that confusion is hard won. Many have expended a lot of effort trying *not* to understand Notes! I’ll start working on a Notes blog post ASAP.
  
  The funny thing about Notes is that, where Exchange is a mail application that happens to include a database, Notes is a database application that happens to do e-mail (more on that to come). The short answers to the same questions about native forms are pretty easy when it comes to Notes: The native file form for both Domino server data and the Notes local synchronization storage files are both the same: NSF. But, if there’s a major EDD service provider set up to produce in NSF, you could knock me over with a feather. Until the late introduction of EDD tools that process Notes mail natively (that is, via the Notes API), the longtime practice has been to rather blindly convert Notes messaging to PSTs or some more familiar form. In my experience, many who did this had little appreciation of what metadata they were leaving behind or what they were changing.
  
  Of course, when it comes to messages traversing the Internet via SMTP (Simple Mail Transfer Protocol), it doesn’t matter if it comes out of Notes, Exchange, Eudora, Gmail, Groupwise or a host of other applications, it’s going to transit in the same plain text (7-bit ASCII) MIME-compliant form I described in the post above.
  
  It would be easier to write a useful post on Notes if I knew what readers really want to know. Any suggestions?
  
Pingback: The Many Faces of Mike McBride » Blog Archive » This Week’s Links (weekly)
William Kellermann said:

August 2, 2013 at 12:20 PM

Great post. Some thoughts:

True native format is binary strings. Everything else is a rendering based on certain standards, some of which are true standards and some of which are proprietary to the developer of the hardware and software systems that laid down the binary. With the right tools and education, some can ‘read’ binary, some HEX, etc. I agree with Craig that utility is paramount. But the answer is tempered by the legal standard of ‘as maintained in the ordinary course of business or reasonably useable form.’ Following official standards, like NIST or ISO, as well as the proprietary developer path is usually the best first course. So, if we want or need ESI in a format supported by Windows file systems, and the source is Microsoft Exchange, via Outlook, the default format to save a message to the file system is MSG, because that is what Outlook and Windows will write by default.

On a separate note, when capturing Gmail via IMAP, especially if aliasing a user’s account, be careful that you don’t trigger Google’s security guillotine. Google will lock down a Gmail account for a minimum of 24 hours if it suspects malicious activity, which downloading all Gmail in a short period of time to a previously unknown device will trigger. See the March 29, 2013 help post here: https://support.google.com/mail/answer/43692?hl=en

Pingback: Good Questions! | Ball in your Court
Pingback: Preserving Gmail for Dummies | Ball in your Court
Pingback: QuickLINK » QuickLINK #15
Pingback: QuickLINK » QuickLINK #16
Pingback: Native or Not? Rethinking Public E-Mail Corpora for E-Discovery (Redux, 2013→2025) | Ball in your Court
