Five years ago, I wrote The Luddite Litigator’s Guide to Databases in E-Discovery to accompany a lecture on the subject at the 2010 Georgetown Advanced E-Discovery Institute. When I went looking for source material for the article, I was struck by how little there was. Databases hold most of what we seek in discovery; yet, no one had written anything practical about discovering structured data. My Luddite Litigator’s Guide was a start, but far from a comprehensive treatment as it lacked the takeaway lawyers crave most: exemplar language and forms.
The curse of legal writing is that we are less prone to create than emulate. We borrow language from forms as though it were enchanted incantations. In fact, there are precious few magic words that must appear in pleadings and discovery requests, a point made often and expertly by Bryan Garner, whose thoughtful work I commend to you as a path to better legal writing.
I loathe the practice of law from forms, but I bow to its power. If we hope to get lawyers to use more efficient and precise prose in their discovery requests, we can’t just harangue them to do it; we’ve “got to put the hay down where the goats can get it.” To that end, here is some language to consider when seeking information about databases and when serving notice of the deposition of corporate designees (e.g., per Rule 30(b)(6) in Federal civil practice):
For each database or system that holds potentially responsive information, we seek the following information to prepare to question the designated person(s) who, with reasonable particularity, can testify on your behalf about information known to or reasonably available to you concerning:
- The standard reporting capabilities of the database or system, including the nature, purpose, structure, appearance, format and electronic searchability of the information conveyed within each standard report (or template) that can be generated by the database or system or by any overlay reporting application;
- The enhanced reporting capabilities of the database or system, including the nature, purpose, structure, appearance, format and electronic searchability of the information conveyed within each enhanced or custom report (or template) that can be generated by the database or system or by any overlay reporting application;
- The flat file and structured export capabilities of each database or system, particularly the ability to export to fielded/delimited or structured formats in a manner that faithfully reflects the content, integrity and functionality of the source data;
- Other export and reporting capabilities of each database or system (including any overlay reporting application) and how they may or may not be employed to faithfully reflect the content, integrity and functionality of the source data for use in this litigation;
- The structure of the database or system to the extent necessary to identify data within potentially responsive fields, records and entities, including field and table names, definitions, constraints and relationships, as well as field codes and field code/value translation or lookup tables;
- The query language, syntax, capabilities and constraints of the database or system (including any overlay reporting application) as they may bear on the ability to identify, extract and export potentially responsive data from each database or system;
- The user experience and interface, including datasets, functionality and options available for use by persons involved with the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT;
- The operational history of the database or system to the extent that it may bear on the content, integrity, accuracy, currency or completeness of potentially responsive data;
- The nature, location and content of any training, user or administrator manuals or guides that address the manner in which the database or system has been administered, queried or its contents reviewed by persons involved with the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT;
- The nature, location and contents of any schema, schema documentation (such as an entity relationship diagram or data dictionary) or the like for any database or system that may reasonably be expected to contain information relating to the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT;
- The capacity and use of any database or system to log reports or exports generated by, or queries run against, the database or system where such reports, exports or queries may bear on the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT;
- The identity and roles of current or former employees or contractors serving as database or system administrators for databases or systems that may reasonably be expected to contain (or have contained) information relating to the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT; and
- The cost, burden, complexity, facility and ease with which the information within databases and systems holding potentially responsive data relating to the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT may be identified, preserved, searched, extracted and produced in a manner that faithfully reflects the content, integrity and functionality of the source data.
Yes, this is the dread “discovery about discovery”; but it’s a necessary precursor to devising query and production strategies for databases. If you don’t know what the database holds or the ways in which relevant and responsive data can be extracted, you are at the mercy of opponents who will give you data in unusable forms or give you nothing at all.
Remember, these are not magic words. I just made them up, and there’s plenty of room for improvement. If you borrow this language, please take time to understand it, and particularly strive to know why you are asking for what you demand. Supplying the information requires effort that should be expended in support of a genuine and articulable need for the information. If you don’t need the information or know what you plan to do with it, don’t ask for it.
These few questions were geared to the feasibility of extracting data from databases so that it stays utile and complete. Enterprise databases support a raft of standardized reporting capabilities: “screens” or “reports” run to support routine business processes and decisionmaking. An insurance carrier may call a particular report the “Claims File”; but it is not a discrete “file” at all. It’s a predefined template or report that presents a collection of data extracted from the database in a consistent way. Lots of what we think of as sites or documents are really reports from databases. Your Facebook page? It’s a report. Your e-mail from Microsoft Outlook? Also a report.
In addition to supplying a range of standard reports, enterprise databases can be queried using enhanced reporting capabilities (“custom reports”) and using overlay reporting tools–commercial software “sold separately” and able to interrogate the database in order to produce specialized reporting or support data analytics. A simple example is presentation software that generates handsome charts and graphics based on data in the database. The presentation software didn’t come with the database. It’s something they bought (or built) to “bolt on” for enhanced/overlay reporting.
Databases are constructed to enforce specified field property requirements or “constraints.” These may include:
- Field size: limiting the number of characters that can populate the field or permitting a variable length entry for memos;
- Data type: text, currency, integer numbers, date/time, e-mail address and masks for phone numbers, Social Security numbers, Zip codes, etc.;
- Unique fields: Primary keys must be unique. You typically wouldn’t want to assign the same case number to different matters or the same Social Security number to different people;
- Group or member lists: Often fields may only be populated with data from a limited group of options (e.g., U.S. states, salutations, departments and account numbers);
- Validation rules: To promote data integrity, you may want to limit the range of values ascribed to a field to only those that make sense. A field for a person’s age shouldn’t accept negative values or (so far) values in excess of 125. A time field should not accept “25:00pm” and a date field designed for use by Americans should guard against European date notation. Credit card numbers must conform to specific rules, as must Zip codes and phone numbers; and
- Required data: The absence of certain information may destroy the utility of the record, so certain fields are made mandatory (e.g., a car rental database may require input of a valid driver’s license number).
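To make these constraints concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The rentals table, its field names and the sample rows are all hypothetical, made up for illustration; a real enterprise database will declare its constraints in its own dialect and may enforce some of them in application code instead.

```python
import sqlite3

# Hypothetical car rental table; every name here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rentals (
        rental_id      INTEGER PRIMARY KEY,                               -- unique field
        license_number TEXT NOT NULL,                                     -- required data
        renter_state   TEXT CHECK (renter_state IN ('TX', 'LA', 'OK')),   -- member list
        renter_age     INTEGER CHECK (renter_age BETWEEN 0 AND 125),      -- validation rule
        notes          TEXT CHECK (length(notes) <= 200)                  -- field size
    )
""")


def try_insert(row):
    """Attempt an insert and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO rentals VALUES (?, ?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert((1, "D1234567", "TX", 34, "weekend rental")),  # satisfies every rule
    try_insert((1, "D7654321", "LA", 40, None)),              # duplicate primary key
    try_insert((2, None, "TX", 40, None)),                    # missing required field
    try_insert((3, "D0000001", "TX", -5, None)),              # age fails validation
    try_insert((4, "D0000002", "ZZ", 30, None)),              # state not in member list
]

for outcome in results:
    print(outcome)
```

The point for discovery is that these rules shape what the data can and cannot contain. A field that permits blanks, or that accepts any text at all, tells you something about how far the contents can be trusted.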
Databases are queried using a “query language.” Users needn’t dirty their hands with query languages because queries are often executed “under the hood” by the use of those aforementioned standardized screens, reports and templates. Think of these as pre-programmed, pushbutton queries. There is usually more (and often much more) that can be gleaned from a database than what the standardized reports supply, and some of this goes to the integrity of the data itself. In that case, understanding the query language is key to fashioning a query that extracts what you need to know, both within the data and about the data.
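The difference between a pushbutton report and a purpose-built query can be sketched in a few lines. This example assumes SQL as the query language and uses Python’s built-in sqlite3 module; the claims table, the open_claims_report view and the sample records are hypothetical, and a real system may use a different query language altogether.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical claims table with a few made-up records.
    CREATE TABLE claims (
        claim_id INTEGER PRIMARY KEY,
        claimant TEXT,
        status   TEXT,
        amount   REAL,
        adjuster TEXT
    );
    INSERT INTO claims VALUES
        (1, 'Acme Corp', 'OPEN',   1200.0, 'Lee'),
        (2, 'B. Jones',  'CLOSED',  450.0, NULL),
        (3, 'C. Smith',  'OPEN',   9800.0, 'Lee'),
        (4, 'D. Brown',  'DENIED',    0.0, NULL);

    -- The "pushbutton" report: a stored, predefined query.
    CREATE VIEW open_claims_report AS
        SELECT claim_id, claimant, amount
        FROM claims
        WHERE status = 'OPEN';
""")

# What an end user sees when running the standard report.
standard_report = conn.execute(
    "SELECT * FROM open_claims_report ORDER BY claim_id"
).fetchall()

# An ad hoc query about the data itself: per status, how many records exist
# and how many lack an assigned adjuster. The standard report cannot say.
ad_hoc = conn.execute("""
    SELECT status,
           COUNT(*)              AS total,
           SUM(adjuster IS NULL) AS missing_adjuster
    FROM claims
    GROUP BY status
    ORDER BY status
""").fetchall()

print(standard_report)
print(ad_hoc)
```

The standard report shows only open claims and only three fields. The ad hoc query reaches the closed and denied claims the report never displays, and it surfaces a gap in the data (records with no adjuster) that bears on completeness.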
As important as learning what the database can produce is understanding what the database does or does not display to end users. These are the user experience (UX) and user interface (UI). Screen shots may be worth a thousand words when it comes to understanding what the user saw or what the user might have done to pursue further intelligence.
Enterprise and commercial databases tend to be big and expensive. Accordingly, most are well documented in manuals designed for administrators and end users. When a producing party objects that running a query is burdensome, the manuals may make clear that what you seek is no big deal to obtain.
In simplest terms, a database’s schema is how it works. It may be the system’s logical schema, detailing how the database is designed in terms of its table structures, attributes, fields, relationships, joins and views. Or, it could be its physical schema, setting out the hardware and software implementation of the database on machines, storage devices and networks. The schema of a database is rarely a trade secret or proprietary data; although, you may hear that objection raised to frustrate discovery. The schema is more like a database map, typically supplied as a table or diagram.
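Many database systems can describe their own logical schema on request, which is one reason schema documentation is seldom hard to produce. Here is a minimal sketch using Python’s built-in sqlite3 module and SQLite’s catalog table and PRAGMA statements. The two tables and the describe_schema helper are hypothetical, and other database products expose the same kind of information through their own catalogs and tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical two-table database, invented for illustration.
    CREATE TABLE adjusters (
        adjuster_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE claims (
        claim_id    INTEGER PRIMARY KEY,
        status_code TEXT NOT NULL,
        adjuster_id INTEGER REFERENCES adjusters(adjuster_id)
    );
""")


def describe_schema(connection):
    """Build a simple 'map' of tables, fields and relationships."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = connection.execute(f"PRAGMA table_info({table})").fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        links = connection.execute(f"PRAGMA foreign_key_list({table})").fetchall()
        schema[table] = {
            # (field name, declared type, required?, part of primary key?)
            "fields": [(c[1], c[2], bool(c[3]), bool(c[5])) for c in columns],
            # (local field, referenced table, referenced field)
            "relationships": [(fk[3], fk[2], fk[4]) for fk in links],
        }
    return schema


schema = describe_schema(conn)
for table, details in schema.items():
    print(table, details)
```

The result is a rough, machine-generated data dictionary: table names, field names and types, which fields are required, and how the tables relate to one another.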
One feature that sets databases apart from many other forms of ESI is the critical importance of the fielding of data. Preserving the fielded character of data is essential to preserving its utility and searchability. I wrote about this recently in “The Virtues of Fielding.” “Fielding data” means that information is stored in locations dedicated to holding just that information. Fielding data serves to separate and identify information so you can search, sort and cull using just that information. It’s a capability we take for granted in databases but that is often crippled or eradicated when data is produced in e-discovery. Be sure that you consider the form of production, and ensure that the fielded character of the data produced will not be lost, whether supplied as a standard report or as a delimited export.
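A short sketch shows what is at stake. It uses Python’s built-in csv module to compare a delimited export, which keeps the fields, with a flattened dump of the same records, which loses them. The records, field names and helper functions are hypothetical, made up for illustration.

```python
import csv
import io

# Hypothetical records as they might sit in a claims database.
FIELDS = ["claim_id", "claimant", "status", "amount"]
records = [
    {"claim_id": 1, "claimant": "Smith, Jane", "status": "OPEN", "amount": 1200.50},
    {"claim_id": 2, "claimant": "Jones, Bob", "status": "CLOSED", "amount": 450.00},
]


def export_delimited(rows):
    """Write rows as comma-delimited text with a header row naming each field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def flatten_to_text(rows):
    """Dump rows as undifferentiated text, the way an image or printout would."""
    return "\n".join(" ".join(str(value) for value in row.values()) for row in rows)


exported = export_delimited(records)
flattened = flatten_to_text(records)

# The delimited export can be reloaded and filtered on a single field.
reloaded = list(csv.DictReader(io.StringIO(exported)))
open_claimants = [row["claimant"] for row in reloaded if row["status"] == "OPEN"]

print(exported)
print(flattened)
print(open_claimants)
```

In the delimited export, the comma inside a claimant’s name is quoted so it is not mistaken for a field boundary, and the recipient can still sort or filter by status. In the flattened text, nothing marks where one field ends and the next begins.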
Seeking discovery from databases is a key capability in modern litigation, and it’s not easy for the technically challenged (although it’s probably a whole lot easier than your opponent claims). Getting the proper data in usable forms demands careful thought, tenacity and more-than-a-little homework. Still, anyone can do it, alone with a modicum of effort, or aided by a little expert assistance.
Happily, since I published my Luddite Litigator’s Guide to Databases, others have waded in and produced more practical scholarship. Here are links to two recent, thoughtful publications on the topic:
Requests for Production of Databases: Documents v. Data, by Christine Webber and Jeff Kerr