Though each merits its own post, I’ve lumped two short topics together. The first concerns a modest e-discovery headache: the cost, friction and static posed by GIF logos in e-mail. The second is a much uglier vulnerability hoppin’ down the bunny trail toward you right now; but rejoice, because you may still have time to avert disaster.
Is that Logo Worth It?
On any given day in my lab (the crowded place I work, hemmed in by LCD screens and hundreds of terabytes of digital storage), I am monitoring the progress of digital evidence being duplicated, ingested, pulled apart, indexed and searched for matters in which I serve as ESI Special Master. As I watch about 500GB of e-mail containers devolving into millions of constituent messages and attachments, I’m struck by the flood of GIF images spilling out of the PSTs.
Just three-fifths of the way through the data, 1,371,516 messages have been processed, and these messages have thrown off 1,262,552 GIF images. The great majority of these images will prove to be logos in the messages’ signature blocks, and they account for 29% of the items extracted from the data set so far.
Most of the GIF logos I see in this data are just 2.2KB; so, despite their numerosity, they account for only about ten percent of the volume of single messages (extracted MSGs) sans attachments. “TEN PERCENT,” you protest, “that’s not even a tip!” Their impact is smaller still when attachments are considered, and compression algorithms and deduplication further ameliorate it.
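For readers who want to check the arithmetic, here is a rough Python sketch using the counts reported above. It assumes every GIF is 2.2KB and that a kilobyte is 1,024 bytes; the post says only that most of the logos are that size, so treat the result as an order-of-magnitude estimate. The constant and function names are illustrative.

```python
# Back-of-envelope arithmetic using the counts reported in the post.
# Assumption: every GIF is treated as 2.2 KB, although only *most* logos
# are that size, so the volume figure is a rough estimate.
MESSAGES_PROCESSED = 1_371_516
GIFS_EXTRACTED = 1_262_552
TYPICAL_GIF_KB = 2.2


def gifs_per_message(gifs: int, messages: int) -> float:
    """Average number of GIF images thrown off per processed message."""
    return gifs / messages


def aggregate_bytes(count: int, size_kb: float, kb: int = 1024) -> float:
    """Total bytes if `count` files each occupy `size_kb` kilobytes."""
    return count * size_kb * kb


if __name__ == "__main__":
    ratio = gifs_per_message(GIFS_EXTRACTED, MESSAGES_PROCESSED)
    total = aggregate_bytes(GIFS_EXTRACTED, TYPICAL_GIF_KB)
    print(f"GIFs per message: {ratio:.2f}")
    print(f"Estimated GIF volume: {total / 1024**3:.2f} GiB")
```

Run as a script, it prints about 0.92 GIFs per processed message and an estimated 2.65 GiB of GIF data, before any compression or deduplication.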
I grant that in the hierarchy of e-discovery headaches, no one’s losing sleep over these colorful, corporate vanity plates; but perhaps their use should be curtailed because they impose hidden costs.
First, they add byte volume and expense to every phase of e-discovery. Greater volume slows collection, demands bigger transfer media and drives up the cost to ingest, process, host and produce messaging. GIF logos slow page refreshes in review tools, waste reviewer time and impede the progress of review; and, by adding to the lag between views, they dull mental acuity, making a tedious task even more tedious.
Numerosity carries its own insidious consequences. E-discovery review platforms are databases and, like any database, their performance suffers as the number of discrete items that must be tracked and indexed grows. Adding millions of useless GIFs to the mix can’t help. Even when de-duplicated and hidden from reviewers who see only extracted text, GIF images must still be preserved and tracked, adding overhead.
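The point that de-duplication does not make a GIF disappear can be illustrated with a toy content-addressed store in Python. It is a hypothetical sketch, not how any particular review platform is built: it assumes items are de-duplicated by SHA-256 hash, so an identical logo is stored once, but every message it came from must still be recorded so the item can be produced in context.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """Hypothetical toy store: each unique payload is kept once, but every
    occurrence (e.g., message ID plus item path) is still recorded."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # SHA-256 digest -> payload
        self.occurrences: dict[str, list[str]] = defaultdict(list)  # digest -> sources

    def add(self, source_id: str, payload: bytes) -> str:
        digest = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, payload)      # stored only the first time
        self.occurrences[digest].append(source_id)  # tracked every time
        return digest

    def unique_bytes(self) -> int:
        """Bytes actually stored after de-duplication."""
        return sum(len(b) for b in self.blobs.values())

    def tracked_rows(self) -> int:
        """Occurrence records the system must still track and index."""
        return sum(len(v) for v in self.occurrences.values())


if __name__ == "__main__":
    store = DedupStore()
    logo = b"GIF89a" + bytes(2000)  # stand-in for one corporate logo
    for i in range(1000):
        store.add(f"msg-{i:04d}/image001.gif", logo)
    print(store.unique_bytes(), store.tracked_rows())
```

Here one 2,006-byte logo extracted from 1,000 messages occupies a single stored blob, yet the store still carries 1,000 occurrence records to track, index and back up.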
Another stealthy way GIF logos complicate e-discovery is by injecting color imagery into evidence without conveying relevant information. Color in e-discovery is a frequent source of contention. When color is used to convey relevant information, it should be routinely preserved and faithfully reflected in the production. Color coding, highlighting, photographs, emphasis by use of color: there should be no question that color used to convey meaning must not be degraded in discovery, and that either native forms preserving the color component or, for old-fashioned static productions, color image formats are obligatory.
But when outmoded approaches are used, segregating colored items for special handling often hinges on an automated color detection process that flags the presence of color in the source evidence. If that flag is triggered by a GIF logo, color detection becomes an unreliable means to distinguish relevant color from vanity usages.
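A simplified Python sketch shows how a logo can defeat a naive color flag. It assumes a page is available as a list of RGB pixels and that a pixel counts as color when its channels differ by more than a small tolerance; the function names, the tolerance and the 1% area threshold are hypothetical, not drawn from any vendor’s tool.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def is_colored(px: Pixel, tolerance: int = 16) -> bool:
    """Gray pixels have (nearly) equal R, G and B; anything else counts as color."""
    return max(px) - min(px) > tolerance


def naive_color_flag(pixels: Iterable[Pixel]) -> bool:
    """Flags a page as color if ANY pixel is colored, so one small logo trips it."""
    return any(is_colored(p) for p in pixels)


def area_color_flag(pixels: Iterable[Pixel], min_fraction: float = 0.01) -> bool:
    """Hypothetical mitigation: flag only if colored pixels cover a minimum share of the page."""
    pixels = list(pixels)
    if not pixels:
        return False
    colored = sum(1 for p in pixels if is_colored(p))
    return colored / len(pixels) >= min_fraction


if __name__ == "__main__":
    white, red, yellow = (255, 255, 255), (200, 16, 46), (255, 255, 0)
    page_with_logo = [white] * 500_000 + [red] * 800       # text page plus a small logo
    highlighted = [white] * 500_000 + [yellow] * 20_000    # page with real highlighting
    print(naive_color_flag(page_with_logo), area_color_flag(page_with_logo))
    print(naive_color_flag(highlighted), area_color_flag(highlighted))
```

The area threshold is only one possible mitigation, and it has its own blind spot: a single highlighted word may cover less than 1% of a page and go unflagged, which is exactly the relevant color the process exists to catch.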
My technically astute vendor colleagues may counter that there are programmatic methods to minimize the static and friction of corporate logos. Indeed; but these, too, aren’t free. Like a stray bit of litter, a disposable plastic bag or a water bottle, a logo GIF seems pretty innocuous by itself. But multiply the digital detritus by millions and billions, and the hidden overhead may surprise you. Are those color logos worth it? Perhaps, but just know that your corporate conceit isn’t “free.”
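One programmatic approach, sketched here in Python under stated assumptions, is to tag (not delete) small image attachments whose exact content recurs across many distinct messages. The Item class, the function name and the thresholds (10KB, 50 messages, the listed extensions) are hypothetical choices for illustration; a real workflow would tune them against the collection and still let a human confirm the tag.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Item:
    parent_id: str  # message the item was extracted from (hypothetical field)
    filename: str
    data: bytes


def flag_probable_logos(
    items: list[Item],
    max_bytes: int = 10_240,
    min_repeats: int = 50,
    exts: tuple[str, ...] = (".gif", ".png", ".jpg", ".jpeg"),
) -> list[int]:
    """Return indexes of items that look like signature-block logos: small image
    files whose exact content recurs across many distinct parent messages.
    Thresholds are illustrative assumptions, not vetted defaults."""
    digests = [hashlib.sha256(it.data).hexdigest() for it in items]

    # Count how many distinct parent messages carry each identical payload.
    parents_per_digest: dict[str, set[str]] = {}
    for it, d in zip(items, digests):
        parents_per_digest.setdefault(d, set()).add(it.parent_id)

    flagged = []
    for i, (it, d) in enumerate(zip(items, digests)):
        small = len(it.data) <= max_bytes
        image = it.filename.lower().endswith(exts)
        common = len(parents_per_digest[d]) >= min_repeats
        if small and image and common:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    logo = b"GIF89a" + bytes(2000)
    items = [Item(f"msg-{i}", "image001.gif", logo) for i in range(60)]
    items.append(Item("msg-x", "q3_chart.gif", b"GIF89a" + bytes(5000) + b"unique"))
    print(len(flag_probable_logos(items)), "items tagged as probable logos")
```

Note that even this sketch has to read and hash every item, which is part of the cost such programmatic methods carry.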
Protecting the E-Discovery Index from Hackers
Though probably only a weirdo like me frets about corpulent GIF logos, every e-discovery lawyer and vendor should worry about the security of the indexes routinely compiled in e-discovery efforts. Well-managed IT environments guard against employees having access to data above their pay grade through access privileges, multifactor authentication and system segregation. After all, it wouldn’t do if everyone could peruse the boss’ e-mail. If you think otherwise, consider the plight of Sony Pictures Entertainment co-chairman Amy Pascal as her jabs at stars and racial humor about the President come to light from the Sony hack.
One of the challenges faced by hackers is that there is so much stuff worth stealing that it takes considerable time to download it all. Plus, the good stuff is siloed in so many places, it’s a chore to find and fetch it. Wouldn’t hackers love it if all of the really sensitive data could be decrypted and neatly packaged for them in a concise, easy-to-access format to be spirited away simply and in a fraction of the time otherwise required? Better still, wouldn’t hackers be thrilled if we placed that lovely Easter egg where it’s less secure against breach than the sources of ESI it consolidates?
I question whether that’s what we do in e-discovery when we process the extracted text of sensitive e-mail and other decrypted data into unencrypted indices protected only by single-factor authentication or (often worse) held behind law firm firewalls. I’m not the first to raise concerns about lawyers being the weakest link in the cybersecurity chain; but law firm vulnerability may be compounded by our tendency to consolidate ESI of widely varying sensitivity into a monolithic index that can be stolen more easily and quickly than the same data on the client’s systems.
Is it a concern that can be put to rest? Certainly! We can and will do better, but only if we see the risk and deal with it. So, know how the index will be protected, and treat it as what it is: a more sensitive asset and a much more desirable target for cyber snoops.
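To see why a consolidated index is such an attractive target, consider a toy inverted index in Python. Real review platforms use far more sophisticated search engines; this sketch assumes only that the index holds decrypted extracted text in searchable form. The function name, document IDs and message text are invented for illustration.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Toy inverted index: maps each lowercased word to the IDs of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(doc_id)
    return index


if __name__ == "__main__":
    # Hypothetical extracted text from three different custodians' mail.
    extracted_text = {
        "hr/0001.msg": "Salary review for the executive team is attached.",
        "legal/0002.msg": "Privileged: draft settlement terms.",
        "exec/0003.msg": "Keep the settlement figure confidential.",
    }
    idx = build_index(extracted_text)
    print(sorted(idx["settlement"]))
```

One lookup returns every matching message, regardless of which custodian or system it came from; that convenience for reviewers is the same convenience a thief would enjoy.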
Matthew Golab said:
Excellent point on the logos, Craig. There is also the added pain that they aren’t text searchable, and if they make their way into a database for legal review, they are going to chew up valuable lawyer review time; they also won’t be handled by predictive/auto/computer coding algorithms.
I find it’s sometimes not unusual to have about 80-90 exact duplicates of the logos as well.
Christine Payne said:
Another security concern is what the other side might do with data post-production. Even if you have your security all set, is there (or can there be) a way to ensure that opposing counsel has adequate protections in place? Is an agreed protective order all that can be done, and how far can you make that go? Also, logos are the worst and should be banned.
As a lawyer, I have never understood why literally thousands of logos make it into review populations *as separate documents* that present as attachments to emails. Why can’t reviewers look at the logos as they appeared in the original email? I have been told by people who “process” the data (i.e., get it from collected or produced form into the review platform) that separating embedded images is necessary. I am new to learning about that level of eDiscovery and am embarrassed to say that I don’t understand enough about “processing” to evaluate the claim. Unfortunately, it’s not even as simple as embedded images. Embedded logos often turn into attachments “in the wild,” as anyone who has tried to view a logo’ed email on a smartphone probably knows. So if that’s how the custodian saw the email (with logo attachments), that’s how it should end up in a review population, right? But there has to be a smarter way to handle them than just having attorneys review them like any other attachment. For example, they are often very tiny files with certain file formats and even file names. What would people think of targeting such attributes to find potential logos and mass-tagging them in advance? Then a reviewer could at least move on without extra clicks, though I still shudder at the thousands of extra document loads, views, etc.