The Latin maxim Docendo Discimus means “by teaching, we learn.” So true, because absent my need to stay up-to-date to teach, it’s easy to fall behind. I teach in various places, but longest at the University of Texas School of Law, my alma mater. My subject is E-Discovery and Digital Evidence, a three-credit, 14-week course. In my course, information technology enjoys equal status with case law and procedure. Half the semester is dedicated to mastering the “e” in e-discovery: the foundations of modern information storage and retrieval. That balance is unique among law school courses. I don’t elevate information technology because I happen to know how to teach it; I do it because I think it’s what the students need most and don’t get. It’s certainly what lawyers need most and don’t get.
Surprisingly, that’s a contentious position. The arguments against teaching the technology side of e-discovery and digital evidence range from “it’s not law” to “lawyers hire people for the tech stuff, so why bother?”
I think the explanation for the marginalization of information technology in e-discovery classes is simpler: lawyers teaching law school classes have a limited ability to teach technology. My guess is that if the teachers knew the technology as well as they know the law, there would be more balance in the curriculum.
The limits of instructors hobble the e-discovery curriculum, which should spring from the needs of the students. We should gear our syllabi to what must be learned rather than what can be taught. First, let’s teach the teachers.
That won’t be easy. The level of interest is low, and who wants to draw the circle of competence to leave themselves outside the circle? Too, there are virtually no instructional channels or materials. No formal incentives. No funding. Many invested in the status quo ante. And all that aside, there’s a dearth of experienced instructors. We are fuc… challenged.
I do what I can, a few students at a time, but that’s not much, and it’s certainly not enough to make the difference we need. We can’t even forge a consensus about what’s needed.
So, it’s disheartening to perpetually advocate for technology’s place at the table when so many want you to stay in the server room and shut up. Yet, there are rewards. It’s fun when students ask thoughtful questions that demonstrate their interest in mastering the material (or at least passing the midterm).
I got some good questions from a student today and thought I’d share them, and my responses, to give you a sense of what we study in the first half of a semester. Some of my replies reference our Spring 2018 Workbook.
My student wrote:
- I generally understand RAID arrays, but I am having trouble distinguishing between the different types of RAID arrays.
- What is EXIF data?
- Can you explain the difference between application and system metadata, applying the conceptual definition to actual data?
1. I address the operation of RAID arrays at pp. 78-79 of the Workbook. The key takeaway should be that RAID configurations enable multiple hard drives to work together as an integrated storage system.
Mirroring data across more than one hard drive affords greater redundancy (because you have more than one copy of the data, each copy stored on a different physical device). Greater redundancy means greater protection against data loss stemming from drive failure. If one drive crashes, another has mirrored (completely copied) its contents. But, mirroring requires a lot of extra storage, thus greater cost without greater speed or capacity. A RAID 1 signifies “pure” drive mirroring = greatest safety, but least efficiency.
Striping spreads the work of storage across multiple drives called striped volumes, allowing data to be read and written more quickly by distributing slices (“stripes”) of data across multiple physical devices. It’s faster because the mechanical processes required to read and write data to a drive (rotating the platters and moving the read-write head) are the slowest part of the process. If you can do the slowest parts in tandem across multiple devices, you can move more data more quickly. But, increased efficiency comes at a cost of higher risk of failure. When you spread parts of data across multiple drives, any one drive failing can take down all of the data (since a data slice may be lost for many files). A RAID 0 signifies “pure” striping without redundancy = greatest efficiency with greatest risk of data loss from drive failure.
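To make the trade-off concrete, here is a toy sketch in Python that treats each "drive" as a byte string. The function names (`mirror`, `stripe`, `unstripe`) and the four-byte stripe size are my own illustrative assumptions, not how any real RAID controller is written; the point is only to show where the data ends up.

```python
def mirror(data: bytes, n_drives: int) -> list[bytes]:
    """Mirroring: every drive holds a complete copy of the data."""
    return [data for _ in range(n_drives)]


def stripe(data: bytes, n_drives: int, stripe_size: int = 4) -> list[bytes]:
    """Striping: deal fixed-size slices of the data round-robin across drives."""
    drives = [bytearray() for _ in range(n_drives)]
    for start in range(0, len(data), stripe_size):
        target = (start // stripe_size) % n_drives
        drives[target] += data[start:start + stripe_size]
    return [bytes(d) for d in drives]


def unstripe(drives: list[bytes], total_len: int, stripe_size: int = 4) -> bytes:
    """Reassemble striped data; this only works if every drive is present."""
    out = bytearray()
    offsets = [0] * len(drives)
    turn = 0
    while len(out) < total_len:
        d = turn % len(drives)
        out += drives[d][offsets[d]:offsets[d] + stripe_size]
        offsets[d] += stripe_size
        turn += 1
    return bytes(out)


data = b"The quick brown fox jumps"

mirrored = mirror(data, 2)
striped = stripe(data, 2)

# Lose drive 0 of the mirror: drive 1 still holds everything.
survivor_of_mirror = mirrored[1]

# Lose drive 0 of the stripe set: drive 1 holds only every other slice.
survivor_of_stripe = striped[1]
```

In this sketch, `survivor_of_mirror` is the whole sentence, while `survivor_of_stripe` is just the fragments `b"quicown jump"`, which is why a single failed drive can ruin every file in a purely striped volume.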
The different types of RAID arrays (denoted by different numbered RAID configurations) balance mirroring and striping to horse-trade efficiency for redundancy. Some do so by employing a process called parity, which dedicates some drive space to calculated values that allow data to be reconstructed if a drive fails. The result is a manageable level of risk: a single drive failure need not mean immediate data loss. I discuss and illustrate RAID 5 parity in the Workbook at p. 78.
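Parity is easier to grasp once you see the arithmetic. A minimal sketch, assuming three data drives and one parity block computed with XOR (the helper name `xor_blocks` and the sample blocks are mine, for illustration; a real RAID 5 array also rotates the parity blocks among the drives, which this sketch leaves out):

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe's worth of data, one block per data drive.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all the data blocks.
parity = xor_blocks(data_blocks)

# Simulate losing the second drive: XOR the survivors with the parity block
# and the missing block falls out.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Here `rebuilt` equals the lost block `b"BBBB"`. The cost is one block of parity for every three blocks of data, far less than the full second copy that mirroring demands, but the array can only survive the loss of one drive at a time.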
2. EXIF data (an acronym for EXchangeable Image File format) are metainformation, i.e., tags, embedded in rich media files such as image and sound files. They carry information about myriad things, but you don’t see EXIF data in the image or hear it in the sounds. EXIF data resides in the file alongside the image or sound data. For example, digital photographs can carry dozens of embedded fields of EXIF data detailing information about the date and time the photo was taken, the camera, settings, exposure, lighting, even precise geolocation data. Photos taken with cell phones having GPS capabilities can contain detailed information about where the photo was taken, to a precision of about ten meters. We did a Workbook exercise extracting this data (exercise 13, p. 252).
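If you are curious where EXIF data physically lives, the sketch below walks the segments at the front of a JPEG file looking for the APP1 segment that begins with the signature `Exif` followed by two zero bytes. It is a simplified illustration that uses only the Python standard library; the function name `find_exif_segment` and the tiny synthetic "JPEG" are my own assumptions for demonstration, and decoding the individual tags (camera, date, GPS) is best left to a purpose-built tool like the one we used in the exercise.

```python
import struct
from typing import Optional


def find_exif_segment(jpeg: bytes) -> Optional[bytes]:
    """Return the raw EXIF payload (TIFF-structured tag data), or None."""
    if jpeg[:2] != b"\xff\xd8":              # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:                # segments begin with 0xFF
            break
        marker = jpeg[pos + 1]
        if marker == 0xDA:                   # SOS: compressed image data follows
            break
        # Two-byte big-endian length; it counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]               # strip the "Exif\0\0" signature
        pos += 2 + length
    return None


# A tiny synthetic file: SOI, one APP1/EXIF segment, then SOS.
tiff_stub = b"II*\x00\x08\x00\x00\x00"       # little-endian TIFF header
app1_payload = b"Exif\x00\x00" + tiff_stub
demo_jpeg = (
    b"\xff\xd8"
    + b"\xff\xe1" + struct.pack(">H", len(app1_payload) + 2) + app1_payload
    + b"\xff\xda"
)

exif = find_exif_segment(demo_jpeg)
```

The takeaway is that the tags sit inside the file, ahead of the picture itself, which is why they travel wherever the file goes.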
3. Both application and system metadata are data describing other data (in the way that, e.g., the name assigned a file describes the file but is typically not stored inside the file). System metadata is information about files that is stored and tracked by the file system of the computer or device. System metadata is kept in tables (in Windows, in the Master File Table or MFT), separate from the files in much the same way as information about library books was once stored in paper card catalogues. It’s data about files stored apart from the files themselves, so it’s context, not content. By contrast, application metadata is data about the file stored inside the file (by the application, i.e., the software program that creates and uses the file). Copy a file and its application metadata moves with it. Hash a file and you hash its application metadata (NOT its system metadata). Application metadata is integral to the file and is content, not context.
So, which would EXIF data be? It’s application metadata, because it’s embedded within the files themselves and moves with the files when copied. I talk about the difference at pp. 210-211 of the Workbook and in many other places in your reading.
One way to know if metadata is application metadata is to e-mail the file to a different computer and see if the metadata goes along. The one exception we studied in our exercises is the file’s name. File names are system metadata, but they are the one item of system metadata that tends to be routinely handed off with the file, although not embedded within the file.
Application metadata examples: date printed and time edited metadata in MS Office documents. These move with the file because they’re inside the file.
System metadata examples: file name (you can change a file’s name without changing the file’s hash value–we did an exercise on this); hence, the name isn’t inside the file. Modified, accessed and created dates (MAC dates) are all system metadata values (with one rare exception applicable to MS Word files). You can prove this by mailing a file as an attachment to another computer. In Windows, the MAC dates will all change on the destination machine.
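You can replay the hashing exercise in a few lines of Python. This is a minimal sketch using only the standard library; the file names, the sample text and the helper `sha256_of` are made up for illustration. It renames a file and resets its accessed and modified dates (both system metadata), then changes a single byte of content, hashing after each step.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Hash the file's contents; the file system's records play no part."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "memo.txt")
    with open(original, "wb") as f:
        f.write(b"Privileged and confidential")
    before = sha256_of(original)

    # Change system metadata only: the name and the accessed/modified dates.
    renamed = os.path.join(folder, "renamed_memo.txt")
    os.rename(original, renamed)
    os.utime(renamed, (946684800, 946684800))   # 1 January 2000, UTC
    after_system_changes = sha256_of(renamed)

    # Change the content by a single byte.
    with open(renamed, "ab") as f:
        f.write(b"!")
    after_content_change = sha256_of(renamed)

print("Hash survives rename and date changes:", before == after_system_changes)
print("Hash survives a one-byte content change:", before == after_content_change)
```

The first comparison comes out true and the second false: the hash "sees" only what is inside the file, so system metadata can change freely without disturbing it, while anything stored within the file, application metadata included, cannot.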