Poring over Requests for Production this morning, I was gratified to see the client sought native forms of electronically-stored information; but the request said only, “All documents shall be Bates stamped and provided in native format.” Is that sufficient? To me, specifying forms of production is best done via an agreed ESI production protocol, but failing that, requesting parties should supply more detail than simply asking for “native format.” I believe requests need to lay out the forms sought for particularized types of ESI and specify the essential ancillary metadata to be produced in load files.
Requesting native forms in discovery demands a few adaptations versus the way hard copy documents were sought in years past. Take that request, “All documents shall be Bates stamped and provided in native format.” If a document is supplied natively and not printed out or “flattened” to a static TIFF, where do you “stamp” the Bates number? The solution is simple (in the file name and load file), but not obvious to lawyers unschooled in e-discovery.
Specifying more than “native format” in the request is sensible because much ESI doesn’t lend itself to production in its “true” native forms. The “true” native form of email is typically a database of multiple user accounts holding messages, calendars, contacts, to-do lists, etc. An opponent need not (and won’t) produce such a massive, undifferentiated blob of data. So the better practice is to specify preferred near-native forms be produced; that is, forms that preserve the integrity and utility of the evidence and support the granularity needed for discovery of only relevant, non-privileged material. As well, providing a load file specification ensures you obtain metadata values that only the producing party can supply (like Bates numbers, originating hash values, source paths and custodians). Too, you want that metadata in a structure suited to your needs and tools.
Native productions are more utile and cost-effective, but only to requesting parties prepared to reap their superior utility and savings. One reason producing parties have gotten away with producing inefficient and unsearchable static image formats (TIFFs) for so long is that TIFF images can be viewed in a browser; hence, recipients of TIFF productions can read documents page-by-page without review software. Yet, that easy access comes at a perilous cost. TIFF productions are many times larger in byte volume than native production of the same material, making it significantly more costly for requesting parties to ingest and host the evidence. Moreover, TIFF images tend not to work well for common formats like spreadsheets and PowerPoint presentations, and don’t work at all for, e.g., video and sound files. Finally, evidence produced as TIFF images gets shorn of metadata and searchable electronic content, requiring that the stripped metadata and searchable content be produced separately and reconstructed using software to comprise, at best, a degraded “TIFF Plus” facsimile of the evidence.
For these reasons and more, requests for production must either follow the entry of an agreed or court-ordered production protocol or include useful and practical instructions about the forms of production right in the body of the Request.
To simplify my client’s task, I drafted an Appendix to be grafted onto the Requests for Production and suggested my client take out “All documents shall be Bates stamped and provided in native format” and substitute the phrase: “All production should be produced in accordance with the instructions contained in Appendix A to this Request.” It’s not perfect, but it should get the job done.
The Appendix I supplied reads as follows, and I don’t offer it as a paragon of legal draftsmanship. Each time I create something like this, it’s a struggle deciding what details to omit versus supplying all features of a full-fledged production protocol. I’ve kept it to about 1,000 words, and a tad verbose at that. It’s for you to decide if it adds substantial value over simply asking for “native format.” Tell me what you think in the comments. If you’d like a Microsoft Word version of Appendix A to play with, you can download it from this link: http://craigball.com/Request_for_Native_Production-Appendix_A.docx
Appendix A: Forms of Production
I. Definitions
“Electronically Stored Information” or “ESI” includes communications, presentations, writings, drawings, graphs, charts, photographs, posts, video and sound recordings, images, and other data or data compilations existing in electronic form on any medium including, but not limited to: (i) e-mail, texting, social media or other means of electronic communications; (ii) word processing files (e.g., Microsoft Word); (iii) computer presentations (e.g., Microsoft PowerPoint); (iv) spreadsheets (e.g., Microsoft Excel); (v) database content and (vi) media files (e.g., jpg, wav).
“Metadata” means and refers to (i) structured (fielded) information embedded in a native file which describes the characteristics, origins, usage, and/or validity of the electronic file; (ii) information generated automatically by operation of a computer or other information technology system when a native file is created, modified, transmitted, deleted, or otherwise manipulated by a user of such system; (iii) information, such as Bates numbers, created during the course of processing documents or ESI for production; and (iv) information collected during the course of collecting documents or ESI, such as the name of the media device, or the custodian or non-custodial data source from which it was collected.
“Native Format” means and refers to the format of ESI in which it was generated and/or as used by the producing party in the usual course of its business and in its regularly conducted activities. For example, the native format of an Excel workbook is a .xls or .xlsx file and the native format of a Microsoft Word document is a .doc or .docx file.
“Near-Native Format” means and refers to a form of ESI production that preserves the functionality, searchability and integrity of a Native Format item when it is infeasible or unduly burdensome to produce the item in Native Format. For example, an MBOX is a suitable near-native format for production of Gmail, an Excel spreadsheet is a suitable near-native format for production of Google Sheets, and EML and MSG files are suitable near-native formats for production of e-mail messages. Static images are not near-native formats for production of any form except Hard Copy Documents.
II. Production
1. Responsive electronically stored information (ESI) shall be produced in its Native Format with Metadata.
2. If it is infeasible to produce an item of responsive ESI in its Native Format, it may be produced in a Near-Native Format with options for same set out in the table below:
Source ESI | Native or Near-Native Form or Forms Sought |
Microsoft Word documents | .DOC, .DOCX |
Microsoft Excel Spreadsheets | .XLS, .XLSX |
Microsoft PowerPoint Presentations | .PPT, .PPTX |
Microsoft Access Databases | .MDB, .ACCDB |
WordPerfect documents | .WPD |
Adobe Acrobat Documents | .PDF |
Photographs | .JPG, .PDF |
E-mail | Messages should be produced in a form or forms that readily support import into standard e-mail client programs; that is, the form of production should adhere to the conventions set out in RFC 5322 (the internet e-mail standard). For Microsoft Exchange or Outlook messaging, .PST format will suffice. Single message production formats like .MSG or .EML may be furnished, if source foldering data is preserved and produced. If your workflow requires that attachments be extracted and produced separately from transmitting messages, attachments should be produced in their native forms with parent/child relationships to the message and container(s) preserved and produced in a delimited text file. |
Social Media | Social media content should be collected using industry standard practices incorporating reasonable methods of authentication, including but not limited to MD5 hash values. Social media and webpages should be produced as HTML faithful to the content and appearance of the native source, or as JPG images with searchable, document-level files containing textual content and delimited metadata (including “likes” and comments). |
3. Paper (Hard-Copy) documents or items requiring redaction shall be produced in static image formats scanned at 300 dpi, e.g., single-page Group IV TIFF or multipage PDF images. If an item uses color to convey information and not merely for aesthetic reasons, the producing party shall not produce the item in a form that does not display color. The full content of each document will be extracted directly from the native source where feasible or, where infeasible, by optical character recognition (OCR) or other suitable method to a searchable text file produced with the corresponding page image(s) or embedded within the image file. Redactions shall be logged along with other information items withheld on claims of privilege.
4. Each item produced shall be identified by naming the item to correspond to a Bates number according to the following protocol:
i. The first three (3) characters of the filename will reflect a unique alphanumeric designation identifying the party making production.
ii. The next eight (8) characters will be a unique, consecutive numeric value assigned to the item by the producing party. This value shall be padded with leading zeroes as needed to preserve its length.
iii. The final six (6) characters are reserved to a sequence consistently beginning with a dash (-) or underscore (_) followed by a five-digit number reflecting pagination of the item when printed to paper or converted to an image format for use in proceedings or when attached as exhibits to pleadings.
iv. This format of the Bates identifier must remain consistent across all productions. The number of digits in the numeric portion and characters in the alphanumeric portion of the identifier should not change in subsequent productions, nor should spaces, hyphens, or other separators be added or deleted except as set out above.
5. If a response to discovery requires production of discoverable electronic information contained in a database, you may produce standard reports; that is, reports that can be generated in the ordinary course of business and without specialized programming. All such reports shall be produced in a delimited electronic format preserving field and record structures and names. If the request cannot be fully answered by production of standard reports, Producing Party should advise the Requesting Party of same so the parties may meet and confer regarding further programmatic database productions.
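By way of illustration only (this is not part of the Appendix text), the naming convention in paragraph 4 can be sketched in Python. This is a minimal sketch under my reading of that paragraph: a three-character party prefix, an eight-digit zero-padded item number, and an optional six-character page suffix used only when an item is printed or imaged. The prefix "ABC" and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed layout: 3 alphanumeric characters, 8 digits, then an optional
# "-" or "_" followed by a five-digit page number.
BATES_RE = re.compile(r"([A-Za-z0-9]{3})(\d{8})(?:([-_])(\d{5}))?")


def bates_id(prefix: str, item: int, page: Optional[int] = None, sep: str = "-") -> str:
    """Build an identifier such as ABC00000123 or ABC00000123-00004."""
    if not re.fullmatch(r"[A-Za-z0-9]{3}", prefix):
        raise ValueError("prefix must be exactly three alphanumeric characters")
    if not 0 <= item <= 99_999_999:
        raise ValueError("item number must fit in eight digits")
    if sep not in ("-", "_"):
        raise ValueError("separator must be a dash or an underscore")
    ident = f"{prefix}{item:08d}"  # leading zeroes preserve the fixed length
    if page is not None:
        if not 1 <= page <= 99_999:
            raise ValueError("page number must fit in five digits")
        ident += f"{sep}{page:05d}"
    return ident


def native_filename(prefix: str, item: int, extension: str) -> str:
    """Name a native file after its Bates number, keeping its extension."""
    return f"{bates_id(prefix, item)}.{extension.lstrip('.')}"


def parse_bates(ident: str) -> Tuple[str, int, Optional[int]]:
    """Split an identifier into (prefix, item number, page or None)."""
    match = BATES_RE.fullmatch(ident)
    if match is None:
        raise ValueError(f"not a valid Bates identifier: {ident!r}")
    prefix, item, _sep, page = match.groups()
    return prefix, int(item), int(page) if page else None
```

Under these assumptions, a native spreadsheet produced as item 123 by party "ABC" would be named with native_filename("ABC", 123, ".xlsx"), and page 4 of the same item, once imaged for use as an exhibit, would be bates_id("ABC", 123, page=4).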
III. Load Files
Producing party shall furnish a delimited load file in industry-standard Opticon and Concordance formats supplying the metadata field values listed below for each item produced (to the extent the values exist and as applicable):
FIELD | DEFINITION |
CUSTODIAN | Name of person or source from which data was collected. Where redundant names occur, individuals should be distinguished by an initial which is kept constant throughout productions (e.g., Smith, John A. and Smith, John B.) |
ALL_CUSTODIANS | If deduplication employed, name(s) of any person(s) from whom the identical item was collected and deduplicated. |
BEGBATES | Beginning Bates Number (production number) |
ENDBATES | End Bates Number (production number) |
BEGATTACH | First Bates number of first attachment in family range |
ENDATTACH | Last Bates number of last attachment in family range (i.e. Bates number of the last page of the last attachment). |
ATTACHCOUNT | Number of attachments to an e-mail. |
ATTACHNAMES | Name of each individual attachment, separated by semi-colons. |
PARENTBATES | BEGBATES number for the parent email of a family (will not be populated for documents that are not part of a family) |
ATTACHBATES | Bates number from the first page of each attachment |
PGCOUNT | Number of pages in the document |
FILENAME | Original filename at the point of collection, without extension of native file |
FILEEXTENSION | File extension of native file |
FILESIZE | File Size |
FILEPATH | File source path for all electronically collected documents and emails, which includes location, folder name, file name, and file source extension. |
NATIVEFILELINK | For documents provided in native format only |
TEXTPATH | File path for OCR or Extracted Text files |
FROM | Sender |
TO | Recipient |
CC | Additional Recipients |
BCC | Blind Additional Recipients |
SUBJECT | Subject line of e-mail. |
DATESENT (mm/dd/yyyy hh:mm:ss AM) | Date Sent |
EMAILDATSORT (mm/dd/yyyy hh:mm:ss AM) | Sent Date of the parent email (physically top email in a chain, i.e. immediate/direct parent email) |
MSGID | Email system identifier assigned by the host email system. |
IRTID | E-mail In-Reply-To ID assigned by the host e-mail system. |
CONVERSATIONID | E-mail thread identifier. |
HASHVALUE | MD5 Hash Value of production item |
TITLE | Title provided by user within the document |
AUTHOR | Creator of a document |
DATECRTD (mm/dd/yyyy hh:mm:ss AM) | Creation date |
LASTMODD (mm/dd/yyyy hh:mm:ss AM) | Last Modified Date |
The chart above describes the metadata fields to be produced in generic, commonly used terms. You should adapt these to the specific types of electronic files you are producing to the extent such metadata fields exist in the original ESI and can be extracted as part of the electronic data discovery process. Any ambiguity about a metadata field should be discussed with the Requesting Party prior to processing and production.
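To make the load file idea concrete, here is a minimal Python sketch of writing a Concordance-style delimited (.DAT) file for a handful of the fields above. It is an illustration, not a specification: it assumes the delimiters commonly used as Concordance defaults (ASCII 20 between fields, þ around each value, ® standing in for line breaks inside a value), and the field subset, sample values and function names are hypothetical. It does not attempt the Opticon image cross-reference file.

```python
import hashlib

FIELD_SEP = "\x14"      # ASCII 20, assumed field delimiter
QUOTE = "\u00fe"        # þ (ASCII 254), assumed text qualifier
NEWLINE_SUB = "\u00ae"  # ® (ASCII 174), replaces line breaks inside a value

# A small, illustrative subset of the fields listed in the chart above.
FIELDS = [
    "BEGBATES", "ENDBATES", "CUSTODIAN", "FILENAME",
    "FILEEXTENSION", "FILESIZE", "HASHVALUE", "NATIVEFILELINK",
]


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of a production item as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def dat_line(values) -> str:
    """Qualify and join one row of values for the load file."""
    cleaned = (
        str(v).replace("\r\n", NEWLINE_SUB).replace("\n", NEWLINE_SUB)
        for v in values
    )
    return FIELD_SEP.join(f"{QUOTE}{v}{QUOTE}" for v in cleaned)


def build_dat(records) -> str:
    """Build the load file text: one header row, then one row per item."""
    lines = [dat_line(FIELDS)]
    for rec in records:
        # Fields with no value for an item are left empty, not omitted.
        lines.append(dat_line(rec.get(name, "") for name in FIELDS))
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    content = b"example native file bytes"  # stand-in for a real file
    record = {
        "BEGBATES": "ABC00000001",
        "ENDBATES": "ABC00000001",
        "CUSTODIAN": "Smith, John A.",
        "FILENAME": "Budget",
        "FILEEXTENSION": "xlsx",
        "FILESIZE": len(content),
        "HASHVALUE": md5_hex(content),
        "NATIVEFILELINK": r"NATIVES\ABC00000001.xlsx",
    }
    print(build_dat([record]))
```

Writing every field for every item, even when empty, keeps the columns aligned so a review platform can map them once for the whole production.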
ruthweiss7504 said:
Craig, Great post on the production protocol. Can you provide some guidance on a production protocol and ESI protocol for mobile data?
craigball said:
My prior articles on that here and at my website craigball.com proved unavailing? There isn’t a single form of production suited to all types of mobile data any more than there is one for all data on a laptop or desktop. For messaging, Relativity offers an open-source single message format; but the bigger question is what’s your review tool? It’s a situation where the ability to review the data will dictate the forms you seek as much as anything else. “A delimited format” is about all I can suggest be sought without greater specificity as to the data under scrutiny.
ruthweiss7504 said:
Typically it’s RelOne, but we are also using Everlaw and CS Disco.
craigball said:
And the data sought?
ruthweiss7504 said:
Text messages (SMS and iM) and photos. iPhones and Androids.
craigball said:
My first instinct is to ask what the requesting party asked for in terms of forms, but I know that’s unhelpful. For texts, I can live with any delimited format that preserves the metadata and the rich content. That can range from JSON to a CSV or TSV. If I have the delimited data, I can present them as text bubbles in a conversation, when needed.
Photos in JPG with EXIF intact will suffice in almost all cases.
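As a rough illustration of presenting delimited text messages as a conversation, here is a minimal Python sketch. It assumes the messages arrive as a JSON array, and the field names (sent, sender, direction, body) and sample values are hypothetical; a real export will use whatever schema the collection tool emits.

```python
import json
from datetime import datetime

# Hypothetical sample data, deliberately out of chronological order.
SAMPLE = """[
  {"sent": "2022-08-01T09:17:00", "sender": "Me", "direction": "out", "body": "No problem"},
  {"sent": "2022-08-01T09:15:00", "sender": "+15550100", "direction": "in", "body": "Running late"}
]"""


def render_thread(raw_json: str, width: int = 72) -> str:
    """Sort messages by sent time and lay them out like a chat thread.

    Incoming messages sit on the left; outgoing ones are pushed right.
    """
    messages = sorted(json.loads(raw_json), key=lambda m: m["sent"])
    lines = []
    for m in messages:
        stamp = datetime.fromisoformat(m["sent"]).strftime("%m/%d/%Y %H:%M")
        bubble = f"[{stamp}] {m['sender']}: {m['body']}"
        lines.append(bubble.rjust(width) if m["direction"] == "out" else bubble)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_thread(SAMPLE))
```

Because the delimited data keeps the timestamp, sender and direction as separate fields, the same records can be sorted, filtered or searched before they are ever drawn as bubbles.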