Ball in your Court

~ Musings on e-discovery & forensics.

Category Archives: Computer Forensics

Query the Quintessential Quintet

20 Monday Jan 2014

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts

≈ 1 Comment

On Wednesday, February 5, 2014 at 9:00am, I’m moderating a plenary session at LegalTech New York where the panelists are a veritable Mount Olympus of e-discovery leaders from the federal bench: John Facciola, James Francis, Andrew Peck, Lee Rosenthal and Shira Scheindlin.  I can hardly imagine a more quintessential quintet of rare knowledge and eloquence!  Kudos to ALM educational coordinator, Judy Kelly, for deftly getting them all to commit.

The judges will be discussing some of what you might expect, e.g., proposed Rules amendments, predictive coding, Rule 502 and expectations of lawyer technical competence.  We will also be exploring a few fresh issues, like the impact all those little screens are having on everyone in and out of court.

There’s still time to add topics and questions of interest to you to the program; so, if you have questions you’d pose or topics you’d explore, please share them here as a comment (or e-mail them to me: craig at ball dot net), and I’ll try to work them in.  Hope to see you in New York!

Thanks. Can You Do Me a Favor Please?

19 Sunday Jan 2014

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts, Personal, Uncategorized

≈ 5 Comments

Sorry to take your time asking for help, so I’ll be quick about it.

But first, thank you.  Thanks to you, dear reader, this blog and its 85 posts reached 100,000 views a few days ago.  That’s nothing compared to the millions of page views others see, but it’s very gratifying to me because I launched this blog without saying a word to anyone.  Somehow, you just found it.  Ball in Your Court is an outlet born of frustration with the two-month publication lag attendant to my former print column and the sudden shuttering of an American Lawyer Media blog where I’d previously posted.  I wanted a place where no one could pull the plug but you or me.  This blog is a very personal connection to you.

The favor I ask is this:  if you like the content here or find it of some value, please share it with someone you think might be interested.  If you have a blog or site with a blogroll, please consider adding Ball in Your Court to your blogroll.  I will try to earn my place on your page and in your day.  Thanks.

Revisiting ‘How Many Documents in a Gigabyte?’

15 Wednesday Jan 2014

Posted by craigball in Computer Forensics, E-Discovery

≈ 6 Comments

I once wrote a column titled “Page Equivalency and Other Fables.”  It lambasted lawyers who larded their burden arguments with bogus page equivalencies like, “everyone knows a gigabyte of data equates to a pile of printed pages that would reach from Uranus to Earth.”  We still see wacky page equivalencies, and “from Uranus” still aptly describes their provenance.

Back in 2007, I wrote, “It’s comforting to quantify electronically stored information as some number of pieces of paper or bankers’ boxes.  Paper and lawyers are old friends.  But you can’t reliably equate a volume of data with a number of pages unless you know the composition of the data.  Even then, it’s a leap of faith.”

So, I’m happy to point you to some notable work by my friend, John Tredennick.  I’ve known John since the emerging technology was fire and watched with awe and admiration as John transitioned from old-school trial lawyer to visionary forensic technology entrepreneur running e-discovery service provider, Catalyst.  John is as close to a Renaissance man as anyone I know in e-discovery, and when John speaks, I listen.

Lately, John Tredennick shared some revealing metrics on the Catalyst blog looking at the relationship between data and document volumes, an update to his 2011 article called, How Many Documents in a Gigabyte?  John again examines document volumes seen in the data that Catalyst receives and processes for its customers and, crucially, parses the data by file type.  As the results bear out, the forms of the data still make an enormous difference in terms of data volume.  Even as between documents we think of as being “the same” (like Word .doc and .docx formats), the differences are striking.

For example, John’s data suggests that there are almost 60% more documents in a gigabyte of Word files in the .docx format (7,085) than in a gigabyte of files stored in the predecessor .doc format (4,472).  This makes sense because the newer .docx format incorporates zip compression, and text is highly compressible data.

[One exercise I require of the law students in my E-discovery class is to look at the file header of a Word .docx file to note its binary signature, PK, characteristic of a zip-compressed file and short for Phil Katz, author of the zip compression algorithm.  For grins, you can change the file extension of a .docx file to .zip and open it to see what a Word document really looks like under the hood.  Hint: it’s in XML].
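
For readers who would rather script that exercise than open a hex editor, here is a minimal sketch in Python 3.9 or later. It assumes nothing beyond the standard library. Because I can't ship a real Word file inside a blog post, the helper build_stand_in_docx() is a hypothetical stand-in that packs two tiny XML parts into a zip container in memory; it is not a valid Word document, only something with the same outer wrapper.

```python
import io
import zipfile

# The four bytes that open a non-empty zip archive: "PK" followed by 0x03 0x04.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """Return True when the data begins with the zip local-file-header signature."""
    return data[:4] == ZIP_LOCAL_HEADER


def list_parts(data: bytes) -> list[str]:
    """List the names of the files packed inside a zip container."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def build_stand_in_docx() -> bytes:
    """Build a tiny zip in memory that mimics the outer layout of a .docx.

    Hypothetical stand-in for illustration only; not a valid Word document.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<w:document>Hello</w:document>")
    return buffer.getvalue()


sample = build_stand_in_docx()
print(looks_like_zip(sample))  # does the file start with the PK signature?
print(list_parts(sample))      # the XML parts inside the container
```

To try it on a real document, read the file's bytes (for example, open("example.docx", "rb").read(), where the file name is a placeholder) and pass them to the same two functions.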

John reports a similar discrepancy between new and old Excel spreadsheet formats (1,883 .xlsx files per gigabyte versus 1,307 for .xls).  Here again, the .xlsx format builds in zip compression.
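
The arithmetic behind those comparisons is easy to check. The sketch below uses only the per-gigabyte counts quoted above; the choice of a binary gigabyte (1024 cubed bytes) is my assumption, since the post doesn't say which definition was used, and the function names are made up for illustration. It works out to roughly 58% more .docx than .doc files per gigabyte and roughly 44% more .xlsx than .xls files.

```python
# Documents per gigabyte as quoted in this post from John's figures.
DOCS_PER_GB = {
    ".doc": 4472,
    ".docx": 7085,
    ".xls": 1307,
    ".xlsx": 1883,
}

# Assumption: a binary gigabyte. A decimal gigabyte would shift the
# average sizes slightly but leave the percentages unchanged.
BYTES_PER_GB = 1024 ** 3


def percent_more(newer_count: int, older_count: int) -> float:
    """How many more files per gigabyte the newer format yields, in percent."""
    return (newer_count - older_count) / older_count * 100


def average_kb(docs_per_gb: int) -> float:
    """Average file size, in kilobytes, implied by a documents-per-gigabyte count."""
    return BYTES_PER_GB / docs_per_gb / 1024


for older, newer in ((".doc", ".docx"), (".xls", ".xlsx")):
    gain = percent_more(DOCS_PER_GB[newer], DOCS_PER_GB[older])
    print(
        f"{newer}: {gain:.1f}% more files per GB than {older} "
        f"(average {average_kb(DOCS_PER_GB[newer]):.0f} KB "
        f"versus {average_kb(DOCS_PER_GB[older]):.0f} KB)"
    )
```

Note that these are averages over whatever mix of files happened to be processed, so they describe that data set and nothing more.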

But the results are reversed when it comes to PowerPoint presentations, with John finding that there are marginally fewer of the newer .pptx files in a gigabyte (505) than the older .ppt format files (580).  This makes sense to me because Microsoft moved away from the .ppt format as its default with Office 2007, about seven years ago.  Since then, presenters have gotten better about adding visual enhancements to deadly-dull PowerPoints, and they tend to add ‘fatter’ components like video clips.  The biggest factor is that pictures in common image formats (e.g., .jpg images) have always been stored compressed, which leaves them highly incompressible.  Compressing data that’s already compressed yields little or no savings and can even increase its size slightly.
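
The point about compressibility can be demonstrated in a few lines. This is a rough sketch, not a measurement of real Office files: it uses Python's zlib module (the same DEFLATE family of compression that zip typically uses), a repeated English sentence as the "text," and seeded pseudo-random bytes as my stand-in for already-compressed image data. That last substitution is an assumption made so the example is self-contained; real .jpg data behaves similarly but not identically.

```python
import random
import zlib

# Highly redundant text, standing in for the body of a word-processed document.
text = ("The quick brown fox jumps over the lazy dog. " * 2000).encode("utf-8")

# Assumption: seeded pseudo-random bytes stand in for already-compressed
# data such as a .jpg image, which likewise has little redundancy left.
rng = random.Random(2014)
incompressible = bytes(rng.getrandbits(8) for _ in range(len(text)))


def deflated_size(data: bytes) -> int:
    """Size in bytes after one pass of zlib (DEFLATE) compression."""
    return len(zlib.compress(data))


for label, data in (("text", text), ("image stand-in", incompressible)):
    print(f"{label}: {len(data)} bytes before, {deflated_size(data)} bytes after")
```

The text shrinks dramatically, while the stand-in data comes out no smaller than it went in, because the compressor has to add its own framing around content it cannot squeeze.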

Wisely, John speaks only of document volumes and makes no effort to project page equivalencies, not even by extrapolating some postulated ‘average-pages-per-file type.’  Anything like that would be as insupportable today as it was when I wrote about it in 2007.  Also, when you look at John’s post, note that there is no data supplied concerning TIFF images.  I’m not sure why, but I can promise you this: TIFF images are MUCH fatter files, costing far more in terms of storage space and ingestion costs than their native counterparts.  Had John added TIFF to the mix, I’m confident his weighted averages would have been much different…and far less useful–much like TIFF images as a form of production. 😉

Transparency of Process No Peril to Work Product

16 Monday Dec 2013

Posted by craigball in Computer Forensics, E-Discovery, Uncategorized

≈ 13 Comments

I’m rarely moved to criticize the work of other commentators because, even when I don’t share their views, I applaud the airing of the issues their efforts bring.  But sometimes a proposition is just so blatantly ill-advised, so prone to unfairly tilt the litigation playing field, that any reader and every writer should stop and say, “Wait a second….”  One such article, currently running in the New York Law Journal and called No Disclosure: Why Search Terms Are Worthy of Court’s Protection, charges that judges who require disclosure of search terms “discount or misunderstand” what the authors term the “protected nature of key aspects of the e-discovery process,” namely filtering of data by use of search terms.  The authors think that disclosure of search terms used to exclude data from disclosure compromises the work product privilege and argue that judges should “recognize that a search term is more than a collection of words, rather, the culmination of an attorney’s interaction with the facts of the case.”

Espousing the sanctity of work product privilege to an audience of litigators is like saying, “I support our troops.”  It’s mom, baseball and apple pie.  It’s also popular to paint judges as addled abusers of discretion.  But let’s not let jingoism displace judgment.  Search terms are precisely what the authors claim they are not: search terms are a collection of words.  They are lexical filters.  Nothing more.

Search terms deserve no more protection from disclosure than date ranges, file types and other mechanical means employed to exclude data from scrutiny.  Search terms strip out information that will never see the light of day nor benefit from the application of lawyer judgment as to its relevance.  In that sense, search terms are anathema to the core principles of work product and warrant more, not less, scrutiny. Continue reading →

It’s the Parties’ Data, Stupid!

03 Tuesday Dec 2013

Posted by craigball in Computer Forensics, E-Discovery

≈ 4 Comments

As the curtain comes down on 2013, I’m reflecting on where the weeks went.  This was the year of fights about forms; months spent endeavoring to persuade courts, opponents (and even my clients) that lawyers and judges have been peering into the wrong end of the telescope when it comes to forms of production. We must stop focusing on the feeble forms lawyers use for review, and concentrate on the robust forms that parties use for everything else.

In discovery and disclosure we seek information from parties and third-parties.  We want the data used and created by, for and about parties and third-parties relating to the actions they took or didn’t take.  We don’t pursue discovery/disclosure against the lawyers in the case.  If we tried, our efforts would be confounded by claims of attorney-client privilege and attorney work product.  Apart from pro se lawyers with fools for clients, attorneys aren’t parties, and attorneys aren’t witnesses.  The forms your opposing counsel uses for review shouldn’t matter.  Discovery and disclosure is party-centric, not attorney-centric.

Ask parties about the forms of ESI they use daily and it’s doubtful you’ll hear a peep about TIFF images or load files.  Parties don’t use that junk; only Luddite lawyers do.  Clients use spreadsheet programs, word processors, mail and messaging applications and databases, to name a few.  When they create, communicate and collaborate, they do it using forms geared to native applications with file extensions like .XLSX, .DOCX, .PPTX, .MSG, etc.  They choose and use functional and complete native and near-native forms.  Those are the forms witnesses consult to reconstruct events and refresh their memories.  Those are the forms witnesses recognize at deposition and in trial. Continue reading →

Collecting Gmail for Preservation

29 Tuesday Oct 2013

Posted by craigball in Computer Forensics, E-Discovery

≈ 15 Comments

I’m surprised how frequently I’m engaged to collect the contents of Gmail accounts in e-discovery, especially when the account is being collected solely for preservation, and there’s no compelling reason to entrust the task to a neutral.  I appreciate that hiring an expert offers greater assurance that the task will be approached with skill and experience, as well as that integrity of process can be supported by the testimony of someone unconnected with the client or law firm.  But, though collecting and validating the complete contents of a Gmail account can be tricky and tedious, it’s not all that difficult to do.  Happily, unless you do something really dumb, it’s unlikely that even a botched Gmail collection effort will harm the contents of the account.

For those seeking a low-cost, defensible mechanism to preserve Gmail content, this (long, dry) post lays out a detailed methodology for collection and preservation of the contents of a Gmail webmail account in the static form of a standard Outlook PST container file.  I will address various technical considerations, but few legal ones.  Whether or not the methods described in this post are legally sufficient in your case or compliant with Gmail’s terms of service is not my call, and I offer no opinions about same.

[NOTE TO READERS 10/14/14: When I wrote this post, there was not yet a backup capability built into Gmail.  Google now makes data tools available that support the creation of a rich archive of a user’s Google content, including Gmail, Contacts, Calendar and Google Drive.  You can find it in the Archive section of https://www.google.com/settings/datatools when logged into Google and can read more about it here.]

Continue reading →

4 Sale: Fixer Upper in Potemkin Village

13 Friday Sep 2013

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts

≈ 5 Comments

This morning, as I so often do, I met with some nice folks touting a new e-discovery product.  As we talked, I couldn’t help but recall Lover Come Back, a goofy Mad Men-era flick about an ad executive who mounts a glitzy campaign for a product that doesn’t exist.   The movie starred Rock Hudson, Doris Day and Tony Randall, and was fun; the product briefing less so.

Without offering sufficient detail to identify the product, let me say that it’s one of those that come on the scene before every ILTA or LegalTech, with catchy names, slick brochures and ambitious development timelines.  These upstarts claim to offer groundbreaking features and pricing that always turn out to be much the same groundbreaking features and pricing offered by last year’s new kid on the block.  Names we recognize from other products and vendors attach themselves to these ventures, and it all seems like an honest-to-goodness business save for one teeny tiny wrinkle: the promised product doesn’t exist.

Behind the scenes of this powerful end-to-end dynamo are people using a competitor’s tool and painstakingly positioning the output so that it seems like the product really delivers.  It’s not meant to deceive because beneath the marketing lies a heartfelt intent to build the product as soon as enough people commit to buy it and cash begins to flow.   In this field of dreams, if they come, we will build it.

I don’t know.  Maybe this is how great products are born nowadays.  Perhaps it’s all about hype, and it doesn’t matter if the product follows the deal or the deal follows the product.  But, I don’t think a product pitch should recall Empress Catherine II admiring the false fronts of Disneyesque villages erected by her lover, Potemkin, or of late, the photos of thriving businesses placed in vacant storefronts to downplay economic doldrums to those attending the 2013 G8 Summit in Enniskillen, Northern Ireland.

Vendors: I like to look at your products, I really do.  I ask this of you in return.  If you are going to show me something, it should exist now, not “maybe in the next release.”  If you claim your product can do something, it should be able to do it, and not only in a contrived demo against a handful of sanitized Enron documents.  Your pricing should be clear and reflect real world experience, not the costs paid by those who don’t need you to actually do anything.  And if you can’t direct me to a satisfied customer who regularly uses your product, don’t tell me it’s because you’re guarding client confidentiality.  Instead, please change my litter, fill my water bottle and put pellets in my dish, so I can get back to being a guinea pig.

Come and Take It: Free Corpus to Test E-Discovery Tools

28 Sunday Jul 2013

Posted by craigball in Computer Forensics, E-Discovery

≈ 6 Comments

I just returned from Santa Fe where I spoke on a panel with Judges Paul Grimm and Rebecca Pallmeyer at the always excellent ALI Current Developments in Employment Law program.  I opened our sessions with a presentation I call “Spoiled and Deluded: The Shakespearean Tragedy of Search in E-Discovery.”  The presentation addresses the discontinuity between what lawyers believe their search tools can accomplish and the practical limits of same.

While I was explaining the role of stop words in indexed search and lamenting what I call the “to be or not to be” problem (i.e., the inability of some text indexing tools to find that most famous of English language phrases because its constituent words are often omitted by text parsers), Judge Pallmeyer stopped me and said, “Is that true?”

When a federal district judge pointedly asks you if what you are telling the audience is true, it’s an opportune time to catch your breath and collect your thoughts before responding.

“Yes, Judge,” I answered, “It’s true.”  

She countered, incredulously, “But surely I can find ‘to be or not to be’ if I put it in quotes, right?”

“No, Your Honor,” I replied.  “If it’s been excluded from the index, no search will find what’s not there to be found.” Continue reading →
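
To make the exchange concrete, here is a toy illustration in Python of why quotation marks can't rescue the phrase. Everything in it is hypothetical: the stop-word list, the two sample documents and the function names are mine, and no commercial indexing tool is being modeled. It simply builds a positional index that skips stop words, then runs the same phrase search against an index built with and without the stop-word list.

```python
import re

# Hypothetical stop-word list; real tools ship their own, often longer, lists.
STOP_WORDS = {"a", "an", "and", "be", "is", "not", "of", "or", "the", "to"}

DOCUMENTS = {
    "hamlet": "To be, or not to be, that is the question.",
    "memo": "The question of liability remains open.",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(documents, stop_words):
    """Map each indexed term to {doc_id: [word positions]}, skipping stop words."""
    index = {}
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text)):
            if term in stop_words:
                continue  # stop words never reach the index
            index.setdefault(term, {}).setdefault(doc_id, []).append(position)
    return index


def phrase_search(index, phrase):
    """Return the documents where every term of the phrase appears consecutively."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    hits = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(
                start + offset in index.get(term, {}).get(doc_id, [])
                for offset, term in enumerate(terms)
            ):
                hits.add(doc_id)
                break
    return hits


with_stops = build_index(DOCUMENTS, STOP_WORDS)
without_stops = build_index(DOCUMENTS, set())

print(phrase_search(with_stops, "to be or not to be"))     # nothing to find
print(phrase_search(without_stops, "to be or not to be"))  # finds the Hamlet line
print(phrase_search(with_stops, "question"))               # ordinary terms still work
```

The search logic is identical in both cases; the only difference is what was allowed into the index in the first place.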

Proof Finder Hits Philanthropic Goal

25 Tuesday Jun 2013

Posted by craigball in Computer Forensics, E-Discovery, General Technology Posts

≈ Comments Off on Proof Finder Hits Philanthropic Goal

When I was a boy, in that innocent time before poisoned Pixy Stix, Halloween was magical.  We planned our costumes for months and mapped routes to maximize candy yields.  But it wasn’t all Batman and Casper and treats.  We also turned our milk cartons into piggy banks and cried “Trick or Treat for UNICEF” at every door.  A few pennies collected with Chuckles and Charms bought a month’s worth of milk for a hungry child.  Then as now, so little could do so much to aid needy children a world away.  I’m reminded of that as I share the wonderful news that Nuix has reached its goal to raise $100,000 for charity by selling licenses for Proof Finder.

My friend Eddie Sheehy, CEO of Nuix, announced today that, “To date, Proof Finder sales have helped Room to Read and local communities build schools in Nepal and Sri Lanka, publish local-language school books and provide support for 30 girls to complete secondary education. With the funds raised since March 2013, Room to Read will establish two libraries in Delhi, India and provide a full year of secondary school education for 20 girls in India.” Continue reading →

The Real Voyage of E-Discovery

25 Saturday May 2013

Posted by craigball in Computer Forensics, E-Discovery

≈ 1 Comment

The real voyage of discovery consists not in seeking new landscapes, but in having new eyes. – Marcel Proust

E-discovery education is lawyers and judges teaching lawyers and judges the law of discovery, but little of the “e.”  This closed loop is unhealthy because it reinforces the misperception that understanding what makes digital different doesn’t matter.

But, of course it does.  

It’s human nature to set the standards for competence so that you meet them. No one wants to define themselves out of a job.  As a result, the trial bar keeps telling itself that grasping the bits and bytes of information technology is someone else’s problem…or not a problem.  “The top lawyers and judges out there don’t know that stuff, so it can’t be something a lawyer or judge needs to know.”  That’s the view through old eyes.

I dump on lawyers for ducking the obligation to be competent in a world teeming with electronic evidence.  But I recognize that even the brave souls who try to cultivate new eyes for digital evidence are confounded by the paucity of e-discovery instruction affording equal stature to the technology.  Where do lawyers learn the very thing that makes e-discovery so daunting for them?  Where do they learn it in the unique context of trial practice and put their newfound skills into practice?

Right now, there’s probably only one answer to those questions: the Georgetown E-Discovery Training Academy, a weeklong program offered in early June, with the next Academy starting on June 2nd. Continue reading →
