Databases in Discovery

01 Sunday Nov 2015

Posted by craigball in Computer Forensics, E-Discovery

Five years ago, I wrote The Luddite Litigator’s Guide to Databases in E-Discovery to accompany a lecture on the subject at the 2010 Georgetown Advanced E-Discovery Institute. When I went looking for source material for the article, I was struck by how little there was. Databases hold most of what we seek in discovery; yet, no one had written anything practical about discovering structured data. My Luddite Litigator’s Guide was a start, but far from a comprehensive treatment as it lacked the takeaway lawyers crave most: exemplar language and forms.

The curse of legal writing is that we are less prone to create than emulate. We borrow language from forms as though it were enchanted incantations. In fact, there are precious few magic words that must appear in pleadings and discovery requests, a point made often and expertly by Bryan Garner, whose thoughtful work I commend to you as a path to better legal writing.

I loathe the practice of law from forms, but I bow to its power. If we hope to get lawyers to use more efficient and precise prose in their discovery requests, we can’t just harangue them to do it; we’ve “got to put the hay down where the goats can get it.” To that end, here is some language to consider when seeking information about databases and when serving notice of the deposition of corporate designees (e.g., per Rule 30(b)(6) in Federal civil practice):

For each database or system that holds potentially responsive information, we seek the following information to prepare to question the designated person(s) who, with reasonable particularity, can testify on your behalf about information known to or reasonably available to you concerning:

The standard reporting capabilities of the database or system, including the nature, purpose, structure, appearance, format and electronic searchability of the information conveyed within each standard report (or template) that can be generated by the database or system or by any overlay reporting application;

The enhanced reporting capabilities of the database or system, including the nature, purpose, structure, appearance, format and electronic searchability of the information conveyed within each enhanced or custom report (or template) that can be generated by the database or system or by any overlay reporting application;

The flat file and structured export capabilities of each database or system, particularly the ability to export to fielded/delimited or structured formats in a manner that faithfully reflects the content, integrity and functionality of the source data;

Other export and reporting capabilities of each database or system (including any overlay reporting application) and how they may or may not be employed to faithfully reflect the content, integrity and functionality of the source data for use in this litigation;

The structure of the database or system to the extent necessary to identify data within potentially responsive fields, records and entities, including field and table names, definitions, constraints and relationships, as well as field codes and field code/value translation or lookup tables;

The query language, syntax, capabilities and constraints of the database or system (including any overlay reporting application) as they may bear on the ability to identify, extract and export potentially responsive data from each database or system;

The user experience and interface, including datasets, functionality and options available for use by persons involved with the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT;

The operational history of the database or system to the extent that it may bear on the content, integrity, accuracy, currency or completeness of potentially responsive data;

The nature, location and content of any training, user or administrator manuals or guides that address the manner in which the database or system has been administered, queried or its contents reviewed by persons involved with the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT;

The nature, location and contents of any schema, schema documentation (such as an entity relationship diagram or data dictionary) or the like for any database or system that may reasonably be expected to contain information relating to the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT;

The capacity and use of any database or system to log reports or exports generated by, or queries run against, the database or system where such reports, exports or queries may bear on the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT;

The identity and roles of current or former employees or contractors serving as database or system administrators for databases or systems that may reasonably be expected to contain (or have contained) information relating to the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT; and

The cost, burden, complexity, facility and ease with which the information within databases and systems holding potentially responsive data relating to the PROVIDE APPROPRIATE LANGUAGE RE THE ACTIVITIES PERTINENT TO THE MATTERS MADE THE BASIS OF THE SUIT may be identified, preserved, searched, extracted and produced in a manner that faithfully reflects the content, integrity and functionality of the source data.

Yes, this is the dread “discovery about discovery;” but, it’s a necessary precursor to devising query and production strategies for databases. If you don’t know what the database holds or the ways in which relevant and responsive data can be extracted, you are at the mercy of opponents who will give you data in unusable forms or give you nothing at all.

Remember, these are not magic words. I just made them up, and there’s plenty of room for improvement. If you borrow this language, please take time to understand it, and particularly strive to know why you are asking for what you demand. Supplying the information requires effort that should be expended in support of a genuine and articulable need for the information. If you don’t need the information or know what you plan to do with it, don’t ask for it.

These few questions were geared to the feasibility of extracting data from databases so that it stays utile and complete. Enterprise databases support a raft of standardized reporting capabilities: “screens” or “reports” run to support routine business processes and decisionmaking. An insurance carrier may call a particular report the “Claims File;” but, it is not a discrete “file” at all. It’s a predefined template or report that presents a collection of data extracted from the database in a consistent way. Lots of what we think of as sites or documents are really reports from databases. Your Facebook page? It’s a report. Your e-mail from Microsoft Outlook? Also a report.
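
To make the "it's a report, not a file" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the claims and payments tables, their columns, and the claims_file view are invented, and a real carrier's system will look nothing like this. What it shows is only that a "Claims File" can be a saved query that assembles its contents each time someone asks for it.

```python
import sqlite3

# Hypothetical schema: table, column and view names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (
        claim_id INTEGER PRIMARY KEY,
        insured  TEXT NOT NULL,
        status   TEXT NOT NULL
    );
    CREATE TABLE payments (
        payment_id INTEGER PRIMARY KEY,
        claim_id   INTEGER NOT NULL REFERENCES claims(claim_id),
        amount     REAL NOT NULL
    );
    -- The "Claims File" is a saved query (a view), not a stored document.
    CREATE VIEW claims_file AS
        SELECT c.claim_id, c.insured, c.status,
               COALESCE(SUM(p.amount), 0) AS total_paid
        FROM claims AS c
        LEFT JOIN payments AS p ON p.claim_id = c.claim_id
        GROUP BY c.claim_id, c.insured, c.status;
""")
conn.executemany("INSERT INTO claims VALUES (?, ?, ?)",
                 [(1, "A. Jones", "open"), (2, "B. Smith", "closed")])
conn.executemany("INSERT INTO payments (claim_id, amount) VALUES (?, ?)",
                 [(1, 500.0), (1, 250.0)])


def claims_file_report(connection):
    """Run the saved query; the 'report' is assembled when it is requested."""
    return connection.execute(
        "SELECT claim_id, insured, status, total_paid "
        "FROM claims_file ORDER BY claim_id"
    ).fetchall()


report_rows = claims_file_report(conn)
for row in report_rows:
    print(row)
```

Add a payment row and run the report again, and the "file" changes without anyone having edited a document. That is worth remembering when a request or a response speaks of the report as though it were a static record.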

In addition to supplying a range of standard reports, enterprise databases can be queried using enhanced reporting capabilities (“custom reports”) and using overlay reporting tools–commercial software “sold separately” and able to interrogate the database in order to produce specialized reporting or support data analytics. A simple example is presentation software that generates handsome charts and graphics based on data in the database. The presentation software didn’t come with the database. It’s something they bought (or built) to “bolt on” for enhanced/overlay reporting.

Databases are constructed to enforce specified field property requirements or “constraints.” These may include:

Field size: limiting the number of characters that can populate the field or permitting a variable length entry for memos;
Data type: text, currency, integer numbers, date/time, e-mail address and masks for phone numbers, Social Security numbers, Zip codes, etc.;
Unique fields: Primary keys must be unique. You typically wouldn’t want to assign the same case number to different matters or the same Social Security number to two people;
Group or member lists: Often fields may only be populated with data from a limited group of options (e.g., U.S. states, salutations, departments and account numbers);
Validation rules: To promote data integrity, you may want to limit the range of values ascribed to a field to only those that make sense. A field for a person’s age shouldn’t accept negative values or (so far) values in excess of 125. A time field should not accept “25:00pm” and a date field designed for use by Americans should guard against European date notation. Credit card numbers must conform to specific rules, as must Zip codes and phone numbers; and
Required data: The absence of certain information may destroy the utility of the record, so certain fields are made mandatory (e.g., a car rental database may require input of a valid driver’s license number).
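
For readers who want to see constraints at work, the following is a rough sketch in Python using the built-in sqlite3 module. The renters table, its columns and the three-state member list are all hypothetical, chosen to echo the car rental example above. Other database products express the same ideas with different syntax, so treat this as an illustration rather than a model of any particular system.

```python
import sqlite3

# Hypothetical table: names and rules are invented to mirror the list above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE renters (
        renter_id      INTEGER PRIMARY KEY,                    -- unique field
        license_number TEXT NOT NULL UNIQUE,                   -- required, no duplicates
        state          TEXT NOT NULL
                       CHECK (state IN ('TX', 'LA', 'NY')),    -- member list (abbreviated)
        zip_code       TEXT CHECK (length(zip_code) = 5),      -- field size
        age            INTEGER CHECK (age BETWEEN 0 AND 125)   -- validation rule
    )
""")


def try_insert(row):
    """Attempt an insert; return None on success or the database's error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO renters (license_number, state, zip_code, age) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    "valid": try_insert(("TX1234567", "TX", "78701", 42)),
    "duplicate license": try_insert(("TX1234567", "TX", "78701", 30)),
    "missing license": try_insert((None, "TX", "78701", 30)),
    "bad age": try_insert(("LA7654321", "LA", "70112", -5)),
    "unknown state": try_insert(("NY0000001", "ZZ", "10001", 30)),
}
row_count = conn.execute("SELECT COUNT(*) FROM renters").fetchone()[0]

for label, error in results.items():
    print(label, "->", error or "accepted")
```

Only the first record is stored; the database itself refuses the other four. Knowing which constraints a system enforces tells you which kinds of bad data could not have been entered, and which kinds could.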

Databases are queried using a “query language.” Users needn’t dirty their hands with query languages because queries are often executed “under the hood” by the use of those aforementioned standardized screens, reports and templates. Think of these as pre-programmed, pushbutton queries. There is usually more (and often much more) that can be gleaned from a database than what the standardized reports supply, and some of this goes to the integrity of the data itself. In that case, understanding the query language is key to fashioning a query that extracts what you need to know, both within the data and about the data.
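
Here is a small sketch of the difference between asking a question within the data and asking one about the data. It uses SQL (the most common query language) by way of Python's built-in sqlite3 module. The complaints table and its contents are made up for the example, and the queries are assumptions about what a requesting party might want, not anything a standard report is known to supply.

```python
import sqlite3

# Hypothetical data: an invented complaints table with a few gaps in it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE complaints ("
    "id INTEGER PRIMARY KEY, product TEXT, received TEXT, severity TEXT)"
)
conn.executemany(
    "INSERT INTO complaints (product, received, severity) VALUES (?, ?, ?)",
    [
        ("Widget", "2014-03-01", "high"),
        ("Widget", "2014-07-15", None),   # severity never entered
        ("Gadget", "2015-01-20", "low"),
        ("Widget", None, "high"),         # received date never entered
    ],
)

# A question WITHIN the data: which high-severity Widget complaints
# were received on or after January 1, 2014?
within_data = conn.execute(
    "SELECT id, received FROM complaints "
    "WHERE product = ? AND severity = ? AND received >= ? ORDER BY id",
    ("Widget", "high", "2014-01-01"),
).fetchall()

# A question ABOUT the data: how many records exist, and how many
# are missing a received date or a severity?
about_data = conn.execute(
    "SELECT COUNT(*), SUM(received IS NULL), SUM(severity IS NULL) "
    "FROM complaints"
).fetchone()

print(within_data)
print(about_data)
```

Notice that the first query silently skips the record with no received date. The second query is how you learn that such records exist, which is the sort of thing a pushbutton report rarely reveals.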

As important as learning what the database can produce is understanding what the database does or does not display to end users. These are the user experience (UX) and user interface (UI). Screen shots may be worth a thousand words when it comes to understanding what the user saw or what the user might have done to pursue further intelligence.

Enterprise and commercial databases tend to be big and expensive. Accordingly, most are well documented in manuals designed for administrators and end users. When a producing party objects that running a query is burdensome, the manuals may make clear that what you seek is no big deal to obtain.

In simplest terms, a database’s schema is how it works. It may be the system’s logical schema, detailing how the database is designed in terms of its table structures, attributes, fields, relationships, joins and views. Or, it could be its physical schema, setting out the hardware and software implementation of the database on machines, storage devices and networks. The schema of a database is rarely a trade secret or proprietary data; although, you may hear that objection raised to frustrate discovery. The schema is more like a database map, typically supplied as a table or diagram.
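
To suggest how little effort a basic logical schema can take to produce, here is a sketch that asks a SQLite database to describe itself, using Python's built-in sqlite3 module. The departments and employees tables are invented, and describe_schema is a hypothetical helper written for this example. Enterprise systems expose the same kind of information through their own catalogs and tools, so the specific commands here apply to SQLite only.

```python
import sqlite3

# Hypothetical two-table database, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES departments(dept_id)
    );
""")


def describe_schema(connection):
    """Return tables, their fields, and their relationships as plain data."""
    schema = {}
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: cid, name, type, notnull, default, pk
        columns = [
            {"name": c[1], "type": c[2],
             "required": bool(c[3]), "primary_key": bool(c[5])}
            for c in connection.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: id, seq, table, from, to, ...
        relationships = [
            {"column": fk[3], "references": f"{fk[2]}.{fk[4]}"}
            for fk in connection.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "relationships": relationships}
    return schema


schema = describe_schema(conn)
for table, details in schema.items():
    print(table, details)
```

The output is, in effect, a crude data dictionary: field names, data types, which fields are required, and which fields point to other tables.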

One feature that sets databases apart from many other forms of ESI is the critical importance of the fielding of data. Preserving the fielded character of data is essential to preserving its utility and searchability. I wrote about this recently in “The Virtues of Fielding.” “Fielding data” means that information is stored in locations dedicated to holding just that information. Fielding data serves to separate and identify information so you can search, sort and cull using just that information. It’s a capability we take for granted in databases but that is often crippled or eradicated when data is produced in e-discovery. Be sure that you consider the form of production, and ensure that the fielded character of the data produced will not be lost, whether supplied as a standard report or as a delimited export.
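
As one illustration of a delimited export that keeps its fielding, the sketch below writes query results to comma-separated text with a header row and quoted values, then reads the text back to confirm each value landed in its own field. It relies on Python's built-in sqlite3 and csv modules. The notes table is invented, and export_delimited is a hypothetical helper, not a feature of any database product.

```python
import csv
import io
import sqlite3

# Hypothetical table holding free-text notes with troublesome characters.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    "note_id INTEGER PRIMARY KEY, author TEXT, entered TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO notes (author, entered, body) VALUES (?, ?, ?)",
    [
        ("R. Lee", "2015-10-02", 'Customer said "call back", then hung up'),
        ("M. Cruz", "2015-10-05", "Line one\nline two"),
    ],
)


def export_delimited(connection, query):
    """Run a query and return its results as quoted, comma-delimited text."""
    cursor = connection.execute(query)
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerow([col[0] for col in cursor.description])  # field names first
    writer.writerows(cursor)
    return buffer.getvalue()


exported = export_delimited(
    conn, "SELECT note_id, author, entered, body FROM notes ORDER BY note_id")

# Read the export back in to confirm every value is still in its own field.
reloaded = list(csv.DictReader(io.StringIO(exported, newline="")))
for record in reloaded:
    print(record)
```

The commas, quotation marks and line break inside the note text survive because the export quotes each field. A careless export, or a report printed to an image, would scramble or flatten them.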

Seeking discovery from databases is a key capability in modern litigation, and it’s not easy for the technically challenged (although it’s probably a whole lot easier than your opponent claims). Getting the proper data in usable forms demands careful thought, tenacity and more-than-a-little homework. Still, anyone can do it, alone with a modicum of effort, or aided by a little expert assistance.

Happily, since I published my Luddite Litigator’s Guide to Databases, others have waded in and produced more practical scholarship. Here are links to two recent, thoughtful publications on the topic:

Requests for Production of Databases: Documents v. Data, by Christine Webber and Jeff Kerr

The Sedona Conference Database Principles Addressing the Preservation & Production of Databases & Database Information in Civil Litigation

11 thoughts on “Databases in Discovery”

jimshook said:

November 2, 2015 at 8:36 AM

Craig, great stuff as always. What is your experience on the practical ability to obtain information about the underlying databases and suggest new queries or reports? The thinking is old fashioned, but we have all heard arguments that a new query or report actually creates a new document and thus should not be permitted. The counter is to request the database itself (or at least a relevant extract of tables / fields) and run queries on your own. Either way, the complexity of databases can make it difficult for judges to understand. How has this played out for you?

- craigball said:
  
  November 2, 2015 at 9:56 AM
  
  Thanks, Jim.
  
  I’ve had a mixed experience re: getting this additional information re: databases when the database isn’t right on point but may contain relevant information. More times than not, a compromise is arrived at that is sufficient to meet the needs of the requesting party. But, in those cases where a database is central to the issues before the court (e.g., adverse reaction reports in pharma cases), I have had almost universal success in obtaining further information about the database, schema, manuals, etc. This has included running custom queries/exports because, once you get to the right person, they understand that running a query against data is rarely a big deal. It’s a database, after all. That’s what it’s for. The only people who fret are the lawyers, and they’re worried about creating a precedent where they would have to depart from their usual path of providing as little relevant information as possible. That may sound harsh, but it’s so true.
  
  Not surprisingly, as a Special Master, I’ve met less resistance to discovery from databases; but, that’s because I can compel a party to bring a truly knowledgeable person to the fore with little fuss and bother. The hurdle is to get beyond the lawyers, who have repeatedly told the court that the data is wholly inaccessible or can only be accessed at enormous cost and burden. Though sometimes they are simply lying to the court, more often they have heard the words they sought to hear when checking with IT. IT is prone to say whatever they think will make the lawyers go away ASAP (they’re smart that way), and wrap what they say in enough technobabble to make the lawyers anxious to flee with their prize: a representation that it (mostly) can’t be done. Perhaps my experience is atypical; but when it comes to databases, the lawyer saying “it can’t be done” is frequently going to end up exposed as incompetent or dishonest. Far too often, that’s really what the fight is about: the lawyer saving face.
  
  As to the issue you raise about “creating,” rather than simply producing, documents, I touched on that in my post of Aug 17, 2013:
  
  “Databases pose an issue little discussed; viz., Is a “document” discoverable if it doesn’t exist but must be created? When requesting parties seek “documents,” there is a risk that the producing party may elect to interpret the request as encompassing only items in esse to the exclusion of items in posse. Databases (e.g., Documentum, Oracle) often hold documents in esse, but most hold data that does not constitute a document in esse until and unless a query is executed to create a report. Do not assume that a responding party will feel obliged to generate documents from databases absent an explicit demand for same.”
  
  I think I was insufficiently dismissive of the argument. A database is a file cabinet. Contending that the production required creation of a document (i.e., a subset of data in esse) is tantamount to contending that no one need produce items from a file cabinet because the production of something less than the entire contents of a cabinet is a new document and, by the way, the composition of the production is protected as attorney work product because a lawyer’s thinking is revealed by the determinations made to assess relevance and privilege! I hope no judge would fall for that hogwash, but I wouldn’t put it past cretinous counsel to make the argument.
  
  The information is discoverable (if relevant and not privileged). That the information is assembled into what some call a document for convenience of review shouldn’t serve to change the discoverable nature of the data. That’s my story…and I’m sticking to it.
  
Jeff Kerr said:

November 2, 2015 at 9:04 AM

Thanks for the great post, Craig.

One thing that I’ve always found useful in conducting database discovery is requesting “audit trail” fields for each type of data sought. For example, whenever there is any question about the validity of the data (e.g., whether it was altered after the fact, as in a supervisor changing an employee’s clock-in or clock-out time), the audit trail data can be very powerful evidence.

I have a functional definition of what “audit trail” data means. Most often, for each item/row/record in the database, the audit trail consists of fields showing when the record was created, which user created it, when it was last modified (if at all), and who modified it (if anyone).

Some databases also have advanced features like versioning of records, so that an older version of a particular record can be restored to the form it was in before it was modified. While I’m not sure if this qualifies as an audit trail, it can provide similar information.

Finally, from personal experience, I’ve found that modern databases are increasingly easy to back up and to restore from backups. I’d venture to guess that most enterprise DBs are backed up extensively. If any information is missing from the live DB, chances are that restoring a backup might not be as difficult as opposing counsel claims. As always, knowing the precise DB being used (Oracle, PostgreSQL, MySQL, Salesforce) and the software that runs any client applications provides a great way to conduct an independent evaluation of counsel’s claims.
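
The audit trail fields described above can be sketched as follows, using Python's built-in sqlite3 module. The timeclock table, the field names (created_at, created_by, modified_at, modified_by) and the sample rows are all assumptions made for illustration; real systems name and store these fields in their own ways.

```python
import sqlite3

# Hypothetical timeclock table with four audit trail fields on every record.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE timeclock (
        entry_id    INTEGER PRIMARY KEY,
        employee    TEXT NOT NULL,
        clock_in    TEXT NOT NULL,
        created_at  TEXT NOT NULL,
        created_by  TEXT NOT NULL,
        modified_at TEXT,   -- empty if the record was never changed
        modified_by TEXT
    )
""")
conn.executemany(
    "INSERT INTO timeclock (employee, clock_in, created_at, created_by, "
    "modified_at, modified_by) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("jdoe", "2015-06-01 08:58", "2015-06-01 08:58", "jdoe", None, None),
        ("jdoe", "2015-06-02 09:00", "2015-06-02 08:41", "jdoe",
         "2015-06-05 17:12", "supervisor1"),
    ],
)

# Find entries changed after the fact by someone other than their creator.
altered = conn.execute("""
    SELECT entry_id, employee, created_by, modified_by, modified_at
    FROM timeclock
    WHERE modified_at IS NOT NULL AND modified_by <> created_by
    ORDER BY entry_id
""").fetchall()

print(altered)
```

If the production omits those four fields, the second entry looks like any other clock-in. With them, it shows a supervisor's edit three days later.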

- craigball said:
  
  November 2, 2015 at 10:01 AM
  
  Thanks, Jeff. Useful tips. The transaction audit trail you reference may be called “transaction journaling” in databaseland. Updates are logged and these changes recorded until committed. The log or “journal” permits rollback of transactions in the event that something like a power failure interrupted the transaction before it was complete.
  
ESIDence said:

November 2, 2015 at 12:22 PM

Sage advice, as always.

It may be implicit, but I’ll suggest that the information discussed in this post takes on higher value when linked to custodian interviews and depositions. Custodians are often creative types who “repurpose” database fields in ways that are transparent to the database administrator and the related schemas. For example, the use of “embedded foreign keys” is a technique whereby custodians (“users”) use a comment field or other editable field to create a “pointer” to a record in a functionally related file or database, outside of either database’s IT-defined structure (physical or logical). Often, the “target” information is experiential (e.g., a customer-service rep’s personal notes and impressions about a particular customer) or derivative (such as the shared-server location of a spreadsheet containing both extracted data and “crunched” (derived) information which is not contained in the database itself), provided to colleagues outside of the “official” purposes of the database. The health care industry is rife with this technique, but industries and functions where the custodian population is largely comprised of “knowledge workers” (like engineers, or financial analysts) provide fertile ground for similar techniques.

The point is, custodians often are the ONLY source of information about how ESI is used (and its relevance in a specific matter). Probing the issue (whether in “friendly” interviews of a party’s own custodians, or in depositions of opposing parties’ “users”) is particularly valuable where a database has already been determined to contain responsive information.

Every enterprise has custodians, but don’t overlook the cagey “CUSS-todians” whose responsive information might be hidden in plain sight.

- craigball said:
  
  November 2, 2015 at 12:39 PM
  
  Good point about pointers; but, I wish it came down to such esoterica. I’m grappling with lawyers who try to pretend databases don’t exist, even when the whole company runs on databases.
  
  I participated in a Meet & Confer this morning where the producing party’s counsel claimed that he did not regard the RFPs as implicating a need to discuss databases because the requests spoke of “electronic software used for” and “computer digital systems and/or networks relating in any way to” typical database functions. Granted, I would have liked for the clients to have used different language; but, really, anyone but a lawyer trying to be obtuse about ESI would have known that the information sought resided in databases.
  
  My hope is that linked files would be retrieved from the key custodian’s collections and shares–or that the pointers will show up in the data and alert me to what’s missing, even if the responsive targets have to be retrieved on an ad hoc basis.
  
Jim Smith said:

November 22, 2015 at 10:06 PM

Craig, I just want to ensure that you have my current email address. If I need to do anything to ensure I get your emails to my current address, let me know. I am very proud that you are my friend. Look forward to seeing you in NOLA.

