The core challenge of discovery is identifying information that is responsive but not privileged, and doing so without undue burden or expense. There are multiple ways to approach the task, none optimal.
The most labor-intensive method is called “linear human review,” in which lawyers (for the most part) look at everything and cull responsive and privileged items. It sufficed in the pre-digital era, when much effort and many resources were devoted to recordkeeping, which ensured that information had a “place.” Despite being costly, slow and error-prone, linear review was all we had, so it became the gold standard for identifying responsive and privileged information.
With the advent of personal computing, the internet and mobile devices, virtually all information today takes digital electronic forms that may be searched electronically. Digitized textual content, whether obtained by applying optical character recognition (OCR) to hard copy or by utilizing native electronic sources, makes it possible to find potentially responsive or privileged material by comparing text strings within documents to search terms expected to coincide with responsive or privileged content. Moreover, digital data always corresponds to a complement of digital metadata, viz. information that describes data’s location, nature and characteristics and that aids in the search, organization, interpretation and use of data.
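To make the idea concrete, here is a minimal sketch of culling by metadata and then matching text strings against search terms. Everything in it is illustrative: the `Item` fields, the `cull` function, the custodian names, dates and terms are assumptions made for the example, not the workings of any particular review platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    # Hypothetical record: three metadata fields plus extracted text.
    doc_id: str
    custodian: str
    sent: date
    text: str


def cull(items, custodians, start, end, terms):
    """Return IDs of items that pass the metadata filter and contain any term."""
    hits = []
    for item in items:
        # Metadata culling: drop items outside the custodian list or date range.
        if item.custodian not in custodians:
            continue
        if not (start <= item.sent <= end):
            continue
        # Text search: case-insensitive substring match against each term.
        lowered = item.text.lower()
        if any(term.lower() in lowered for term in terms):
            hits.append(item.doc_id)
    return hits


items = [
    Item("D1", "smith", date(2019, 3, 1), "Draft merger agreement attached."),
    Item("D2", "smith", date(2021, 6, 9), "Merger closing checklist."),
    Item("D3", "jones", date(2019, 4, 2), "Lunch on Friday?"),
    Item("D4", "lee", date(2019, 5, 5), "Merger talking points."),
]

potentially_responsive = cull(
    items,
    custodians={"smith", "jones"},
    start=date(2019, 1, 1),
    end=date(2019, 12, 31),
    terms=["merger"],
)
```

In this toy set only D1 survives: D2 falls outside the date range, D3 contains no search term, and D4 belongs to a custodian who was not selected. Note that a plain substring match is crude; it would also hit "mergers" or "submerge" style variants, which is one reason search terms need testing.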
As data volumes grew, text search and metadata culling became the new touchstones by which information was deemed potentially responsive and potentially privileged, usually as a precursor to manual assessment. Search terms, either by themselves or in logical phrases called Boolean queries, were deployed against the text within each document or more commonly against a concordance index built from extracted text. Items not making the keyword cut for responsiveness tended to be deemed not discoverable and afforded no further consideration.
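A concordance index can be pictured as a lookup table from each word to the documents that contain it, with Boolean queries resolved as set operations on those lists. The sketch below is a simplified illustration under that assumption; the function names (`build_index`, `query_and`, `query_or`, `query_not`) and the sample documents are hypothetical, and commercial tools add stemming, proximity operators, noise-word lists and much more.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def query_and(index, *terms):
    """Documents containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def query_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def query_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())


docs = {
    "D1": "Merger agreement draft attached",
    "D2": "Lunch on Friday?",
    "D3": "Counsel advises against the merger",
}
index = build_index(docs)

# merger AND agreement
both = query_and(index, "merger", "agreement")
# merger AND NOT counsel (a crude privilege screen)
screened = query_and(index, "merger") & query_not(index, docs, "counsel")
# merger OR lunch
either = query_or(index, "merger", "lunch")
```

The sketch also shows the weakness described above: D2 never mentions "merger," so a responsiveness query built on that term passes over it no matter what the message actually concerns.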








