About seven years ago, I e-mailed a hypothetical to colleagues seeking their advice about how to process and review an unprepossessing volume of ESI for production in e-discovery. The intelligence they shared became fodder for The EDna Challenge and, I like to think, helped promote lower cost e-discovery options in the marketplace.
Since then, three of the contributors (Browning Marean, Ross Kodner and Dave Kleiman) have passed on and nearly all have moved on. But, the challenge remains challenging. The good news is that Edna now has five times more money to spend.
Next week’s ACEDS Conference seeks a re-examination of EDna options circa 2016 on a bigger budget; so, may I please impose on you, Dear Reader, to share your suggestions drawing on the much broader array of options available today? Please add your suggestions as comments, and it’s fine to toot your own horn so long as you don’t exceed the budget, all in, and meet all the requirements of the challenge.
Here’s the updated challenge:
Your old friend, Edna, called with a question. She has a small law firm. A client is about to send her a Zip file on a thumb drive containing collected ESI in a construction dispute. It will be PSTs for six people, another four MBOX takeouts from Gmail and a mixed bag of word processed documents, spreadsheets, PowerPoint documents, PDFs and “not a lot” of scanned paper documents (sans OCR or load files) for all ten custodians. There may also be some video, photographs and web content. “Nothing too hinky,” she promises. She thinks it will comprise less than 50,000 documents in all, but it could grow to 100,000 items or more. The contents will unzip to about 10-12 GB in all.
She’s determined to conduct a paperless privilege and responsiveness review of the material in-house, sharing the task with an associate and legal assistant. Everyone has a high-end, big screen desktop PC running Windows 8.1 with MS Office 2016 and Adobe Acrobat 11 Pro installed. The office’s network file server has loads of available storage space. She doesn’t own a review tool. She’s willing to spend up to $5,000.00 ALL IN (for software, vendor services, SaaS, whatever, exclusive of the cost of her time and staff time), but she won’t spend a penny more. You can’t loan her your systems or software. You can’t talk her out of it. Pricing must mirror real-world availability, not a special deal.
Edna’s solution must support:
- Efficient workflow
- Robust search
- Ability to process relevant metadata
- Simple document tagging and production identification
- Effective tracked deduplication
Review may take up to 90 days, and the case may not conclude for up to two years. All review, hosting and production costs must be borne by the budget.
How should Edna spend that $5,000.00?
James O. Moses said:
This is a thing of beauty and a great service.
I’ll be following it.
Robert Hilson said:
Craig, big fan of the Edna challenge, and looking forward to seeing the responses… Just out of curiosity, would be interested to know the following:
1) What is the estimated value of the case?
2) What is Edna’s hourly billing rate? How many hours a week is she billing?
3) Does Edna plan to eat the cost of discovery or pass it through to the client?
1. It’s an all-or-nothing case, with a counterclaim. If the plaintiff got everything sought, it could be $250,000.00 or more. If the case is won on summary judgment, the plaintiff would go hence without day.
2. EDna didn’t volunteer what she charges or the fee arrangement. She is, after all, a friend seeking advice on how to do a task within a specific budget. She didn’t seek guidance on how to bill her clients or whether she will front the cost in a contingent fee arrangement or pass it through to her client. Here again, she only sought advice on e-discovery options.
3. Yes. 😉
Chris Lauer said:
End to end, self-service eDiscovery tool that includes ECA, robust searching and culling capabilities, full data processing, OCR processing and import, custom/tracked deduplication, review platform and full production capabilities.
Bundled cost per GB to include all of the above with the ability to near-line/archive data post review to control hosting fees. Total estimated costs for life of case including 6 months live data hosting, 18 months near-line storage and 4 hours of forensic time to convert files to PST files – $4,600 excluding provider assisted hours (activities that can be performed by end user, but requested of provider).
Andy Wilson said:
Edna, buy a license to Logikcull.com. Got you covered. And you’ll probably be done in 3 days, not 3 months. Your client will LOVE you.
Okay, now that that’s squared away I have a challenge for Edna: instead of billing me $300/hour to read emails for 90 days, or two years as it could be, can you offer a flat rate not to exceed $10k? I’m concerned that $300/hour X 8 hours/day X 60 days will bankrupt my small muffin top business. For $144,000 I could buy a lot more muffin tops to accelerate the growth of my business. Got it? Kthx bye!
You don’t make any case for Logikcull meeting the challenge. Without some breakdown in cost per the parameters of the (very specific) challenge, how can EDna do business with your company? Please flesh it out.
Insofar as the cap on her legal costs, I can’t speak to how she sets up things with her clients. It may be on a contingent fee, value/matter basis, hybrid or straight hourly. EDna’s longtime client keeps coming back to her for representation; so, she must be competitive or otherwise proving her mettle again-and-again. She obviously cares about keeping discovery costs down.
Brad Jenkins (@BJenkins99) said:
Thanks for sharing and we applaud your “EDna” initiative. My company (CloudNine) feels the same way about high-value, low-cost access to eDiscovery tools and we have an offering that provides a capability and cost that may be attractive to EDna.
With our Simplified eDiscovery Automation software, EDna has access to a complete eDiscovery platform for processing, review, and production. She can upload her data for automated native processing, review her data in CloudNine’s integrated review tool, and produce her data in almost any format. Retail costs for EDna with CloudNine, based on 10 – 12 GBs are as follows:
$50 per GB for native processing x 12 GBs = $600
$25 per GB per Mo for hosting x 12 GBs x 3 months during review = $900
$5 per GB per Mo for near ready hosting x 12 GBs x 21 months = $1,260
$25 per GB for self-exports x 12 GBs = $300
$100 per GB for Tiff processing x 12 GBs = $1,200
$200 per hour x 2 hours of tech time to convert MBOX files to pst = $400
Total = $4,660
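For anyone who wants to rerun these numbers against a different data volume or hosting period, the breakdown above can be sketched as a short script. This is only an illustration: the rates and quantities are the ones quoted in the comment, while the names `LINE_ITEMS`, `line_cost` and `total_cost` are hypothetical and not part of any vendor tool.

```python
# Line items as quoted in the comment above:
# (label, unit price in USD, quantity multipliers).
LINE_ITEMS = [
    ("Native processing", 50, (12,)),         # per GB x 12 GB
    ("Hosting during review", 25, (12, 3)),   # per GB per month x 12 GB x 3 months
    ("Near ready hosting", 5, (12, 21)),      # per GB per month x 12 GB x 21 months
    ("Self-exports", 25, (12,)),              # per GB x 12 GB
    ("TIFF processing", 100, (12,)),          # per GB x 12 GB
    ("MBOX-to-PST tech time", 200, (2,)),     # per hour x 2 hours
]
BUDGET = 5_000  # EDna's all-in cap in USD


def line_cost(unit_price, multipliers):
    """Multiply a unit price by each quantity (GB, months, hours)."""
    cost = unit_price
    for quantity in multipliers:
        cost *= quantity
    return cost


def total_cost(items):
    """Sum every line item."""
    return sum(line_cost(price, mult) for _, price, mult in items)


if __name__ == "__main__":
    for label, price, mult in LINE_ITEMS:
        print(f"{label:<24} ${line_cost(price, mult):>6,}")
    total = total_cost(LINE_ITEMS)
    print(f"{'Total':<24} ${total:>6,}  (headroom: ${BUDGET - total:,})")
```

Changing the 12 GB figure or the month counts in `LINE_ITEMS` shows how quickly the headroom under the $5,000 cap shrinks if the collection grows.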
To support EDna (and all users), we offer unlimited training and support. There are no user fees. We also offer all users (to include EDna) a no-risk free trial (http://www.cloudninediscovery.com/landing/ediscovery-processing/). As our registration and set up are all self-service, users (and EDna) can get started immediately.
Thanks again for your education efforts, and we hope our offering helps EDna achieve her eDiscovery objectives.
Thanks for the helpful breakout. Can’t help but hear Samuel L. Jackson’s voice in Pulp Fiction: “Look at the big brains on Brad!”
Matthew Golab said:
G’Day from Australia.
That is quite some challenge Craig. I’ll have a go, however without familiarity with the US market, my ideas are probably quite limited. Something that you don’t elaborate on is whether you are doing productions – you mention production identification. Also whether you need to move from native review to say PDF or TIFF (not to agitate you about TIFFs!) or whether you would need the capability to redact.
My initial thoughts are that they purchase DTSearch ($199 is the cheapest option) so that they can index and search and, to an extent, review with hit highlighting. The issue they would face, though, is that I don’t think it would be a straightforward exercise to then annotate each doc and export or do something with those that are responsive – although you don’t say that the scope actually includes production. You could have multiple instances of DTSearch; you’d just select the Network version to have concurrent access to the same indices.
Once they have a dataset that is responsive to the keywords or other filtering, they could also use filelist (by Jam-Software) or MD5Deep (open source commandline) to calculate MD5s, however to do this and join it back to the DTSearch indices would require quite a lot of technical expertise.
My idea being that deduplication is run against the filtered dataset rather than at the start.
As I’m typing this I think it just wouldn’t work.
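To make the hashing idea concrete, here is a minimal sketch of tracked deduplication by MD5 using only the Python standard library. It is an illustration, not a description of how filelist, MD5Deep or DTSearch work: the function names are hypothetical, it only handles loose files in a folder, and messages inside PST or MBOX containers would first have to be extracted by some other tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(root):
    """Map each MD5 hash to the sorted list of files under root with that content."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[md5_of_file(path)].append(path)
    return dict(groups)


def dedupe_report(root):
    """Return (unique, suppressed).

    unique is the first file seen for each hash; suppressed maps every
    duplicate to the file kept in its place, so nothing is silently dropped.
    """
    unique, suppressed = [], {}
    for paths in group_duplicates(root).values():
        keeper, *dupes = paths
        unique.append(keeper)
        for dupe in dupes:
            suppressed[dupe] = keeper
    return unique, suppressed
```

Calling `dedupe_report` on the folder of search hits would give a list of files to review plus a record of which duplicates each one stands in for. Joining that record back to a search index remains the hard part.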
2) Nuix Proof finder
Proof finder costs $100, and can handle up to 15GB in a single case – although you can have as many ‘up to 15GB’ cases as you want in a 12 month period, and you then simply pay another $100 if you need to continue after the initial 12 months. You can share the nuix case between instances of Proof finder just not concurrently.
Proof finder will be able to handle the extraction, indexing, identification of non-searchable (image based scans, encrypted files, corrupted), deduplication, as well as searching and review.
What I can’t remember though is whether you can annotate in Proof finder with tags, I think you can but I may be wrong (I’m frequently wrong).
Another consideration against Proof finder is that I don’t think you can generate a production in the full blown sense.
3) Pinpoint harvester
This tool can achieve a similar result to Nuix Proof finder; however, I don’t know about the annotating capabilities, as it’s really designed for collection and legal holds and then deduplication and filtering, after which you pass the data on to another tool for further processing and review. I also don’t know the street retail price of harvester.
4) Microsoft Sharepoint and its ediscovery capabilities – or Office 365 and the cloud based ediscovery equivalent
This option is way out there and probably won’t be feasible. Assuming that Edna has an in-house IT team with Sharepoint capability, then I think it would be possible to bring the data into Sharepoint and then use the ediscovery capabilities to search and possibly annotate.
This won’t handle deduplication, though, and I think, similar to the shortcomings of DTSearch, it would require a lot of specific expertise to make it work for deduplication and other nitty gritty things.
I don’t know the cost though of this so it probably wouldn’t be a viable alternative.
Similar in concept to Nuix Proof finder, I think Intella could meet the criteria, what I don’t know though is whether the $99 PI model can annotate and do productions.
In conclusion, I think option 2) and then option 5) would meet the criteria.
The $5K cap is the hardest part, and also my relative ignorance of the other options in the US market.
Peter Mercer said:
Intella P.I. can export to Concordance, Summation, Ringtail and Relativity. It can also redact and has options for OCR.
John Martin said:
Lower cost ediscovery should be extended to lower cost representation in the scope of overall case cost to the client. The challenge should be holistic from that standpoint. After all, Edna is a “vendor” too and as Andy correctly points out – the vast majority of the cost.
The technology shouldn’t be limited to a ceiling, but rather valued based on the time (therefore cost) reduction in lawyer time. The tech is worth as much, if not more than the legal time so long as using it reduces the overall time and spend by the client.
It’s basic “cost vs price”.
And it should rain lemonade and lollipops, and people should smile and exercise more. No doubt your friend EDna agrees; but, she doesn’t have time to hear you tell her how she is doing everything all wrong. She asked a specific question of her tech-savvy friend and won’t benefit from the deflection to what her question should have been. She seeks a response to the question she posed rather than a recasting of same. Ceilings on expenditures are unfortunate facts of life in a budget-centric world.
David Golden said:
What is most intriguing is how a zip file of potentially responsive documents suddenly appears from heaven. We’re looking at the need to perform a defensible collection from 10 custodians of email and electronic documents that spans both PCs and the cloud. What percent of lawyers have the expertise to do this? Outsourcing this step alone is likely to consume the entire $5K budget. Perhaps the topic of affordable preservation and collection merits its own EDNA challenge.
Indeed, preservation and collection could merit a different challenge. This is not that challenge. The collection was not heaven-sent, but came from the client. Perhaps it came from in-house IT or from outside IT. Perhaps it proves defensible; perhaps not. It is, however, data that EDna must review; and, she cannot begin to assess its character and completeness until and unless she can get eyes on the data. Your friend, EDna, didn’t call to ask what you thought about the adequacy of her client’s collection ability or strategy; and, you were too courteous to offer unsolicited advice on same.
Chris Dix said:
Was the collection by the client a “targeted” collection, where some basic filtering and culling has already been performed, or are the PST and MBOX files just copies of entire mailboxes for the custodians?
Although EDna hasn’t seen the data supplied, because she hasn’t a review mechanism (hence the challenge), her instructions to her client were such that she expects the data to be unfiltered save for discrimination by custodian and source. Still, she doesn’t understand why your recommendation would change dependent upon whether the collected data was targeted or not.
Richard Stieghorst said:
I unfortunately no longer have a solution to EDna’s needs. However, one thing of note that seems to be missing from the technology EDna has available to her is her internet speed. A 4Mbps up/down connection is limiting, both for transferring the items she needs to review and for using a possible on-line solution. As a vendor who formerly offered an on-line solution (we just do data collection and computer forensics now), it was always a first line question I would ask. Nothing worse than a client being frustrated with your review tool, only to find that it is their own system slowing them down. Just a note on that. Thanks for continuing to do what you do Craig!
GREG BUCKLES said:
Glad to see you update the EDna Challenge! I took your new parameters and plugged them into a recent RFI/RFP for a low volume proactive ECA purchase engagement that went out to 19 providers. Wrote up the rough analysis in a blog. Bottom line is that 3 months hosting keeps it affordable with more than 70% of my respondents.
Jon Kerry-Tyerman said:
Craig, thanks for the challenge!
Everlaw’s cloud-based platform can provide everything that EDna requests, and more, far under her budget. For a flat fee of $450/month, Everlaw includes 20GB of cloud hosting for a single matter, unlimited users, unlimited processing (including all imaging and OCR), unlimited production into any standard protocol, unlimited training and support, and unlimited features (visual search, workflow automation, email threading, deduplication and near-dupe detection, predictive coding, case-building tools, foreign-language detection and translation, etc.; more details here: http://everlaw.com/features/). For the first 90 days, this totals $1,350.
If she’d like to suspend the case into a near-line state for the remaining 21 months, EDna would pay $450 for the first year and $45/month thereafter (9 months), for a total of $855.
The grand total would be $2,205, less than half of her budget!
Peter Mercer said:
Solution 1. I would suggest Intella P.I. $99 and our 1/2 day training course $595 on Intella. To me undertaking training on the software is a must. Trying to learn on the job while under a deadline is crazy. After the training course Edna’s team would be set to begin the case.
Solution 2. Intella 100 @$2995 and 2 X Intella Viewer @$595 (Viewer allows the others in the office to also review the case). 1 X Intella Training. @ $595. This would offer room for the case to grow over the 2 years and is a perpetual license.
Both Intella P.I. and Intella 100 can be used on other cases.
Again want to emphasize how important having training on the software is (regardless of what software). A small amount of training in advance can save many tears.
Thanks for the emphasis on a training component, Peter, and for addressing the need for multiple simultaneous reviewers. Solution 2 also takes into account the potential that the dataset expands to 12GB, considering that Intella P.I. is capped at 10GB. Intella’s ability to generate a standard load file is a significant differentiator at this price point for a standalone application.
Richard Clark said:
I appreciate how these exercises are a pulse on the market. For years, this has been a large challenge in the industry as many platforms are built to address large data concerns, but the tens of thousands of small cases suffer as they don’t “scale down” very well economically. We built CVFox to address these case types.
CVFox is a locally hosted SaaS platform. The simplified workflow it gives Edna to process and manage this project is as follows:
Drag and drop data intake, CVFox will unpack the Zip file, process the nested data and host in our Tier 3 data center. Processing specs are configurable, but we will de-duplicate/de-NIST as needed, create the db (Postgres), OCR, index in both Elastic Search and CAAT™, image, and make available with RBA logins. CVFox can process all data presented through the automated process without any additional charges and all duplicates are tracked and easily managed.
For the investigation and search, there will be data analytics including concept clustering, near duplicate, email threading and traditional key word search however she would like. Edna’s team will quickly be able to cluster similar documents, use dashboards, perform key word searches and organize key data important to the construction claims. The data can get organized with labels and review sets created, document routing workflows and simplified customizable coding templates. Once the review is completed Edna can run productions easily and securely download them to her firm.
The charges are as follows to include full access to all options in CVFox:
$200/GB of the non-expanded data volumes* with unlimited access for six months. (Non-expanded means the total native data size within the Zip file(s), regardless of the expansion on our servers.) I am assuming that the 12 GB is the native data volume, so the total would be $2,400.
After six months, we charge 7.5% of the original invoice per month ($180), but since this case will be completed in 90 days, the case can become inactive at 3.75% ($90/mo). The inactive total would be $1,620 for 18 months. If they need to reactivate the case, the charge would simply be $180 for that month.
*Should the assumption of the 12GB’s be based off of the fully expanded data volumes, we would be much lower as we only charge on the native data volumes for easy predictability.
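Because the monthly charges here are percentages of the original invoice, the two-year figure is easier to check in code than by eye. The sketch below is one reading of the quote, assuming the first six months are covered by the per-GB fee and the remaining 18 months are billed at the inactive rate unless a month is reactivated. The function name `cvfox_estimate` is hypothetical, and the combined total it produces is not a figure stated in the comment.

```python
from decimal import Decimal

PER_GB = Decimal("200")           # USD per GB of native (non-expanded) data
NATIVE_GB = Decimal("12")         # assumed native volume from the challenge
ACTIVE_RATE = Decimal("0.075")    # 7.5% of the original invoice per active month
INACTIVE_RATE = Decimal("0.0375") # 3.75% of the original invoice per inactive month
INCLUDED_MONTHS = 6               # months of access covered by the per-GB fee
CASE_MONTHS = 24                  # the challenge allows up to two years


def cvfox_estimate(reactivated_months=0):
    """Estimate the life-of-case cost under the assumptions stated above."""
    initial = PER_GB * NATIVE_GB
    billable_months = CASE_MONTHS - INCLUDED_MONTHS
    inactive_months = billable_months - reactivated_months
    return (
        initial
        + inactive_months * initial * INACTIVE_RATE
        + reactivated_months * initial * ACTIVE_RATE
    )


if __name__ == "__main__":
    print(f"No reactivation:       ${cvfox_estimate():,.2f}")
    print(f"One reactivated month: ${cvfox_estimate(1):,.2f}")
```

On this reading, the $2,400 invoice plus the $1,620 inactive total comes to $4,020, and each reactivated month adds a further $90 over the inactive rate.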
I trust this information is helpful in Edna’s evaluations of the current legal solutions available on the market.
Rob Robinson (@ComplexD) said:
On #Periscope: Video Recording of the ACEDS 2016 EDna Challenge Session with Craig Ball, Tania Mabrey, and Tom O’Connor. https://www.periscope.tv/w/1OdJrQnAzBAGX
Anith Mathai said:
Thanks for the update on the EDna challenge. We actually met a few real-life EDnas about a year ago, which inspired us to work on GoldFynch (www.goldfynch.com), a cloud based e-discovery service, targeted at small and solo practitioners.
To answer this updated challenge, EDna would have to proceed to the GoldFynch website and sign up for an account. Then add the “Nestling” case to her account, since her unzipped file size is within the 15GB limit.
The cost of the “Nestling” case is a flat fee of $50 / month. This fee is all inclusive, including processing, format conversions, OCR, productions, multiple users, and support. For 90 days, her total bill would be $150. If the case was active for 2 years, her bill would be $1200. Case archival isn’t implemented yet, but it is planned, and could reduce EDna’s cost during the inactive part of the 2 years.
As far as workflow goes, it is pretty straightforward for EDna. She has to drag and drop the zip file into her GoldFynch case, at which point the data will immediately start uploading and processing. Our platform will take care of unzipping, file type detection, file conversion / imaging, indexing, and any OCR that might be needed. Depending on the number of files that needed OCR, which is usually the processing bottleneck, this process could take a few hours. But EDna can start reviewing processed files immediately. EDna can search and filter through the files as she wishes, or go file-by-file through the whole set. GoldFynch supports tagging, coding, specialized date searching, as well as “key people, places, things” extraction, which uses language processing to give a quick overview of important words and phrases.
Once complete, EDna can produce the documents in native formats (for say, the spreadsheets), PDF or a combination of both. She can also apply redactions to files and Bates stamps as part of the production process.
As Richard Stieghorst pointed out, upload speed would be an issue. Hopefully she has a fast enough upload connection. If not, it would take 3 hours for 12GB to upload on a 10 Mbps upload line.
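The upload estimate is simple enough to check for any line speed. The sketch below assumes decimal units (1 GB = 8,000 megabits) and ignores protocol overhead and contention, so real transfers will run somewhat longer. The function name `upload_hours` is just for illustration.

```python
def upload_hours(size_gb, uplink_mbps):
    """Hours to move size_gb (decimal gigabytes) over an uplink of uplink_mbps megabits/s."""
    megabits = size_gb * 8 * 1000   # GB -> megabits, decimal units
    seconds = megabits / uplink_mbps
    return seconds / 3600


if __name__ == "__main__":
    # 4 Mbps is the figure raised earlier in the thread; 10 Mbps is the one used above.
    for mbps in (4, 10, 100):
        print(f"{mbps:>3} Mbps: {upload_hours(12, mbps):.1f} hours for 12 GB")
```

At 10 Mbps the 12 GB set works out to a little under three hours, while at the 4 Mbps mentioned earlier in the thread it is closer to seven.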
We also built GoldFynch by observing lawyers and paralegals use it and have designed it to be very intuitive. The quick start manual and online training videos should be sufficient. If not, she can contact us with specific questions. Our cloud monitoring and alert system will notify us immediately of any processing or other errors, so we can be proactive and ensure things are running smoothly. Users have a direct line to us and can contact us at any time. We often implement features and file format support in response to direct user requests, sometimes in as little as a few days.
One caveat, if EDna has health information or data under HIPAA, there would be a marginally extra charge on our system, but still within the $5000 limit.
Excellent! Could you please address the TIFF and load file generation capabilities? I’m not a fan of dumbing down the data; but, there are still many Luddites out there who demand such things using a variety of largely specious arguments. “E-discovery for Everybody” is the watchword, even the dinosaurs.
Anith Mathai said:
With regard to TIFF and load file generation: so far, our user base has been happy with just the PDF and native productions (and a mix of the two), but we know that we must eventually implement the common images + text files + load file formats, for increased acceptance. We have these scheduled a couple of months down the line… hopefully in time for EDna to use them for her productions if needed. However, if it is an urgent need for EDna, we could always move it up the feature release list.
As EDna is not a real person, she feels constrained not to demand same as an urgent need. It is, unfortunately, a real world need that must be met for the pitiably large cadre of persons out there still laboring under the yoke of TIFF+. However, even the imaginary EDna feels strongly that the ability to generate compliant load files is not an option that can be deferred, even for PDF and native productions. How do you deliver unaltered system metadata values without employing a load file?
Hi Craig, we are working on adding automated generation of load files into GoldFynch. For the moment we can always generate one manually for EDna, at no extra charge.