Though each merits its own post, I’ve lumped two short topics together.  The first concerns a modest e-discovery headache: the cost, friction and static posed by GIF logos in e-mail. The second is a much uglier vulnerability hoppin’ down the bunny trail toward you right now; but rejoice, because you may still have time to avert disaster.

Is that Logo Worth It?
On any given day in my lab (the crowded place I work, hemmed in by LCD screens and hundreds of terabytes of digital storage), I am monitoring the progress of digital evidence being duplicated, ingested, pulled apart, indexed and searched for matters in which I serve as ESI Special Master.  As I watch about 500GB of e-mail containers devolving into millions of constituent messages and attachments, I’m struck by the flood of GIF images spilling out of the PSTs.

Just three-fifths of the way through the data, I see 1,371,516 messages have been processed, and these messages have thrown off 1,262,552 GIF images.  The great majority of these images will prove to be logos in the signature blocks of the messages, and they account for 29% of the item count extracted from the data set so far.

Most of the GIF logos I see in this data are just 2.2KB; so, despite their numerosity, they account for only about ten percent of the volume of single messages (extracted MSGs) sans attachments.  “TEN PERCENT,” you protest, “That’s not even a tip!”  Their impact is smaller still when attachments are considered, and compression algorithms and deduplication further ameliorate their impact.
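For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and rests on two simplifying assumptions of mine: that every GIF is the typical 2.2KB, and that the 29% figure covers all of the GIFs counted so far. Treat the results as ballpark estimates, not measurements.

```python
# Figures quoted in the post
messages_processed = 1_371_516
gif_images = 1_262_552
gif_share_of_items = 0.29   # "29% of the item count"
typical_gif_kb = 2.2        # "just 2.2KB"

# Assumptions (not from the data): every GIF is the typical size,
# the 29% share covers all GIFs, and 1 GB = 1,000,000 KB.
gifs_per_message = gif_images / messages_processed
implied_total_items = gif_images / gif_share_of_items
gif_volume_gb = gif_images * typical_gif_kb / 1_000_000

print(f"GIFs per message:       {gifs_per_message:.2f}")
print(f"Implied total items:    {implied_total_items:,.0f}")
print(f"Approximate GIF volume: {gif_volume_gb:.2f} GB")
```

Under those assumptions, there is roughly one GIF for nearly every message processed, yet the GIFs together weigh in at under 3GB. The burden lies in the item count far more than in the bytes.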

I grant that in the hierarchy of e-discovery headaches, no one’s losing sleep over these colorful, corporate vanity plates; but, perhaps their use should be curtailed because they impose hidden costs.

First, they add byte volume and expense in all phases of e-discovery.  Greater volume slows collection, demands bigger transfer media and drives up the cost to ingest, process, host and produce messaging.  GIF logos slow page refreshes in review tools, waste reviewer time, impede the progress of review, and–by adding to the lag between views–dull mental acuity, making a tedious task even more tedious.

Numerosity carries its own insidious consequences.  E-discovery review platforms are databases and, like any database, performance is inversely proportional to the number of discrete items that must be tracked and indexed.   Adding millions of useless GIFs to the mix can’t help.  Even de-duped and screened from reviewers seeing only extracted text, GIF images must be preserved and tracked, adding overhead.

Another stealthy way that GIF logos complicate e-discovery is by the injection of color imagery into evidence without conveying relevant information.  Color in e-discovery is a frequent source of contention.  When color is used to convey relevant information, it should be routinely preserved and faithfully reflected in the production.  Whether it’s color coding, highlighting, photographs or emphasis by use of color, there should be no question that these efforts to employ color to convey meaning can’t be degraded in discovery.  Either native forms preserving the color component or, for old-fashioned static productions, color formats are obligatory.

But when outmoded approaches are used, segregating colored items for special handling often hinges on an automated color detection process that flags the presence of color in the source evidence.  If that flag is triggered by a GIF logo, color detection becomes an unreliable means to distinguish relevant color from vanity usages.

My technically astute vendor colleagues may counter that there are programmatic methods to minimize the static and friction of corporate logos.  Indeed; but these, too, aren’t free.  Like the stray bit of litter, disposable plastic bag or water bottle, a logo GIF seems pretty innocuous by itself.  But, multiply the digital detritus by millions and billions, and the hidden overhead may surprise you.  Are those color logos worth it?  Perhaps, but just know that your corporate conceit isn’t “free.”
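To make "programmatic methods" a little more concrete, here is a minimal sketch of one possible approach: flag small images whose exact bytes recur across many messages, on the theory that a signature logo is the same tiny file attached over and over. This is an illustration only, not a description of how any particular vendor or review platform works. The function names, the 5,000-byte size cap and the repeat threshold are all hypothetical choices of mine.

```python
import hashlib
from collections import Counter

def find_probable_logos(images, max_bytes=5_000, min_repeats=3):
    """Return SHA-256 digests of small images that recur often.

    images: list of (item_id, raw_bytes) pairs.
    max_bytes and min_repeats are illustrative thresholds, not standards.
    """
    counts = Counter()
    for _item_id, data in images:
        if len(data) <= max_bytes:
            counts[hashlib.sha256(data).hexdigest()] += 1
    return {digest for digest, n in counts.items() if n >= min_repeats}

def split_items(images, logo_digests):
    """Separate item IDs into probable logos and everything else."""
    logos, others = [], []
    for item_id, data in images:
        if hashlib.sha256(data).hexdigest() in logo_digests:
            logos.append(item_id)
        else:
            others.append(item_id)
    return logos, others

# Made-up sample data: three copies of one "logo" and one unique image.
sample = [
    ("msg1-img", b"GIF89a-acme-logo"),
    ("msg2-img", b"GIF89a-acme-logo"),
    ("msg3-img", b"GIF89a-acme-logo"),
    ("msg4-img", b"GIF89a-color-coded-chart"),
]

logo_digests = find_probable_logos(sample)
logos, others = split_items(sample, logo_digests)
print("Probable logos:", logos)
print("Needs a closer look:", others)
```

Note that the sketch only flags items; it deletes nothing. The flagged GIFs must still be preserved and tracked, and someone must still pick thresholds and validate the results, which is exactly the sort of overhead that isn't free.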

Protecting the E-Discovery Index from Hackers
Though probably only a weirdo like me frets about corpulent GIF logos, every e-discovery lawyer and vendor should worry about the security of the indexes routinely compiled in e-discovery efforts.  Well-managed IT environments guard against employees having access to data above their pay grade by use of privileges, multifactor authentication and system segregation.  After all, it wouldn’t do if everyone could peruse the boss’ e-mail.  If you think otherwise, consider the plight of Sony Pictures Entertainment co-chairman Amy Pascal as her jabs at stars and racial humor about the President come to light from the Sony hack.

One of the challenges faced by hackers is that there is so much stuff worth stealing that it takes considerable time to download it all.  Plus, the good stuff is siloed in so many places, it’s a chore to find and fetch it.  Wouldn’t hackers love it if all of the really sensitive data could be decrypted and neatly packaged for them in a concise, easy-to-access format to be spirited away simply and in a fraction of the time otherwise required?  Better still, wouldn’t hackers be thrilled if we placed that lovely Easter egg where it’s less secure against breach than the sources of ESI it consolidates?

I question whether that’s what we do in e-discovery when we process the extracted text of sensitive e-mail and other decrypted data into unencrypted indices hosted with single factor authenticators or (often worse) held behind law firm firewalls.  I’m not the first to raise concerns about lawyers being the weakest link in the cybersecurity chain; but, law firm vulnerability may be compounded by our tendency to consolidate ESI of widely varying sensitivity into a monolithic index that can be stolen more easily and quickly than the same data on the client’s systems.

Is it a concern that can be put to rest?  Certainly!  We can and will do better; but, only if we see the risk and deal with it.  So, know how the index will be protected and treat it like what it is: a more sensitive asset and much more desirable target for cyber snoops.