I’m updating my E-Discovery Workbook to begin a new semester at the University of Texas School of Law next week, and I can’t help working in historical tidbits celebrating the antecedents of modern information technology. The following is new material for my discussion of digital encoding: a topic I regard as essential to a good grasp of digital forensics and electronic evidence. When you understand encoding, you understand why varying sources of electronically stored information are more alike than different and why forms of production matter.
We record information every day using 26 symbols called “the alphabet,” abetted by helpful signals called “punctuation.” So, you could say that we write in hexavigesimal (Base26) encoding.
“Binary” or Base2 encoding is notating information using nothing but two symbols: conventionally, the numbers one and zero. It’s often said that “computer data is stored as ones and zeroes;” but that’s a fiction. In fact, binary data is stored physically, electronically, magnetically or optically using mechanisms that permit the detection of two clearly distinguishable “states,” whether manifested as faint voltage potentials (e.g., thumb drives), polar magnetic reversals (e.g., spinning hard drives) or pits on a reflective disc deflecting a laser beam (e.g., DVDs). Ones and zeroes are simply a useful way to notate those states. You could use any two symbols as binary characters, or even two discrete characteristics of the “same” symbol. For now, just ponder how you might record or communicate two “different” characteristics, as by two different shapes, colors, sizes, orientations, markings, etc.
I free you from the trope of ones and zeroes to plumb the evolution of binary communication and explore an obscure coding cul-de-sac called Steganography, from the Greek, meaning “concealed writing.” But first, we need an aside of Bacon.
I mean, of course, lawyer and statesman Sir Francis Bacon (1561-1626). Among his many accomplishments, Bacon conceived a bilateral cipher (a “code” in modern parlance) enabling the hiding of messages omnia per omnia, or “anything by anything.”
Bacon’s cipher used the letters “A” and “B” to denote binary values; but if we use ones and zeros instead, we see the straight line from Bacon’s clever cipher to modern ASCII and Unicode encoding.
As with modern computer encoding, we need multiple binary digits (“bits”) to encode or “stand in for” the letters of the alphabet. Bacon chose the five-bit sets at right:
If we substitute ones and zeroes (right), Bacon’s cipher starts to look uncannily like contemporary binary encodings.
Why five bits and not three or four? The answer lies in binary math (“Oh no! Not MATH!!”). Wait, wait; it won’t hurt. I promise!
If you have one binary digit (21), you have only two unique states (one or zero), so you can only encode two letters, say A and B. If you have two binary digits (22 or 2×2), you can encode four letters, say A, B, C and D. With three binary digits (23 or 2x2x2), you can encode eight letters. Finally, with four binary digits (24 or 2x2x2x2), you can encode just sixteen letters. So, do you see the problem in trying to encode the letters of a 26-letter alphabet? You must use at least five binary digits (25 or 32) unless you are content to forgo ten letters.
Sir Francis Bacon wasn’t especially interested in encoding text as bits. His goal was to hide messages in any medium, permitting a clued-in reader to distinguish between differences lurking in plain sight. Those differences—whatever they might be—serve to denote the “A” or “B” in Bacon’s steganographic technique. For example:
That last one is quite subtle, right? Here’s how it’s done:
To conceal my name in each of the respective examples, every unbolded/unitalicized/serif character signifies an “A” in Bacon’s cipher and every boldface/italicized/sans serif character signifies a “B” (ignore the spaces and punctuation). The bold and italic approaches look wonky and could arouse suspicion, but if the fonts are chosen carefully, the absence of serifs should go unnoticed. Take a closer look to see how it works:
In my examples, I’ve used Bacon’s cipher to hide text within text, but it can as easily hide messages in almost anything. My favorite example is the class photo of World War I cryptographers trained in Aurora, Illinois by famed cryptographers, William and Elizabeth Friedman. Before they headed for France, the newly minted codebreakers lined up for the cameraman; but there’s more going on here than meets the eye.
Taking to heart omnia per omnia, the Friedmans ingenuously encoded Sir Francis Bacon’s maxim “knowledge is power” within the photograph using Bacon’s cipher. The 71 soldiers and their instructors convey the cipher text by facing or looking away from the camera. Those facing denote an “A.” Those looking away denote a “B.” There weren’t quite enough present to encode the entire maxim, so the decoded message actually reads, “KNOWLEDGE IS POWE.” Here’s the decoding:
A closer look:
Isn’t that mind blowing?!?!
Steganography is something most computer forensic examiners study but rarely use in practice. Still, it’s a fascinating discipline with a history reaching back to ancient Greece, where masters tattooed secret messages on servants’ shaved scalps and hit “Send” once the hair grew back. Digital technology brought new and difficult-to-decipher steganographic techniques enabling images, sound and messages to hitch a hidden ride on a wide range of electronic media.