The Latin maxim Docendo Discimus means “by teaching, we learn.” So true, because absent my need to stay up-to-date to teach, it’s easy to fall behind. I teach various places, but of longest standing at the University of Texas School of Law, my alma mater. My subject is E-Discovery and Digital Evidence, a three-credit, 14-week course. In my course, information technology enjoys equal status with case law and procedure. Half the semester is dedicated to mastering the “e” in e-discovery: the foundations of modern information storage and retrieval. That balance is unique among law school courses. I don’t elevate information technology because I happen to know how to teach it; I do it because I think it’s what the students need most and don’t get. It’s certainly what lawyers need most and don’t get.
Why?
Surprisingly, that’s a contentious question. The arguments against teaching the technology side of e-discovery and digital evidence range from “it’s not law” to “lawyers hire people for the tech stuff, so why bother?”
I think the explanation for the marginalization of information technology in e-discovery classes is simpler: lawyers teaching law school classes have a limited ability to teach technology. My guess is that if the teachers knew the technology as well as they know the law, there would be more balance in the curriculum.
The limits of instructors hobble the e-discovery curriculum, which should spring from the needs of the students. We should gear our syllabi to what must be learned rather than what can be taught. First, let’s teach the teachers.
That won’t be easy. The level of interest is low, and who wants to draw the circle of competence to leave themselves outside the circle? Too, there are virtually no instructional channels or materials. No formal incentives. No funding. Many invested in the status quo ante. And all that aside, there’s a dearth of experienced instructors. We are fuc… challenged.
I do what I can, a few students at a time; but, that’s not much, and it’s certainly not enough to make the difference we need. We can’t even forge a consensus about what’s needed.
So, it’s disheartening to perpetually advocate for technology’s place at the table when so many want you to stay in the server room and shut up. Yet, there are rewards. It’s fun when students ask thoughtful questions that demonstrate their interest in mastering the material (or at least passing the midterm).
I got some good questions from a student today and thought I’d share them, and my responses, to give you a sense of what we study in the first half of a semester. Some of my replies reference our Spring 2018 Workbook.
My student wrote:
- I generally understand RAID arrays, but I am having trouble distinguishing between the different types of RAID arrays.
- What is EXIF data?
- Can you explain the difference between application and system metadata, applying the conceptual definition to actual data?
My replies:
1. I address the operation of RAID arrays at pp. 78-79 of the Workbook. The key takeaway should be that RAID configurations enable multiple hard drives to work together as an integrated storage system.
Mirroring data across more than one hard drive affords greater redundancy (because you have more than one copy of the data, each copy stored on a different physical device). Greater redundancy means greater protection against data loss stemming from drive failure. If one drive crashes, another has mirrored (completely copied) its contents. But mirroring requires a lot of extra storage, thus greater cost without greater capacity. RAID 1 signifies “pure” drive mirroring: greatest safety, but least efficiency.
Striping spreads the work of storage across multiple drives called striped volumes, allowing data to be read and written more quickly by distributing slices (“stripes”) of data across multiple physical devices. It’s faster because the mechanical processes required to read and write data to a drive (rotating the platters and moving the read-write head) are the slowest part of the process. If you can do the slowest parts in tandem across multiple devices, you can move more data more quickly. But increased efficiency comes at a cost of higher risk of failure. When you spread parts of data across multiple drives, any one drive failing can take down all of the data (since a data slice may be lost for many files). RAID 0 signifies “pure” striping without redundancy: greatest efficiency with greatest risk of data loss from drive failure.
The different types of RAID arrays (denoted by different numbered RAID configurations) balance mirroring and striping to horse-trade efficiency for redundancy. Some do so by employing a process called parity, which dedicates some drive space to calculations that allow data to be reconstructed, so that a drive failure carries a manageable level of risk rather than immediate data loss. I discuss and illustrate RAID 5 parity in the Workbook at p. 78.
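For readers who like to see the arithmetic, here is a minimal sketch of the parity idea in Python. It is not taken from the Workbook, and the drive names and sample data are made up for illustration. It assumes the simplest case: one stripe with two data blocks and one parity block computed with XOR, which is the operation parity schemes like RAID 5 rely on. A real RAID 5 array also rotates the parity blocks across all the drives, which this sketch ignores.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: two data blocks on two drives, parity on a third.
drive_a = b"DISCOVER"
drive_b = b"EVIDENCE"
parity = xor_blocks(drive_a, drive_b)

# Simulate losing drive_b: rebuild it from the surviving block plus parity.
rebuilt_b = xor_blocks(drive_a, parity)
print(rebuilt_b == drive_b)  # True
```

The same trick works no matter which single drive fails, because XOR-ing any two of the three blocks yields the third. Lose two drives at once and the data is gone.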
2. EXIF data (an acronym for EXchangeable Image File format) is metainformation, i.e., tags, embedded in rich media files such as image and sound files. The tags carry information about myriad things, but you don’t see EXIF data in the image or hear it in the sounds. EXIF data resides in the file alongside the image or sound data. For example, digital photographs can carry dozens of embedded fields of EXIF data detailing information about the date and time the photo was taken, the camera, settings, exposure, lighting, even precise geolocation data. Photos taken with cell phones having GPS capabilities can contain detailed information about where the photo was taken to a precision of about ten meters. We did a Workbook exercise extracting this data (exercise 13, p. 252).
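EXIF stores GPS latitude and longitude as degrees, minutes and seconds, each paired with a reference letter (N, S, E or W), rather than as the decimal coordinates mapping sites expect. The sketch below shows that conversion in Python. It is only an illustration: the function name is hypothetical, the coordinates are made-up sample values, and reading the tags out of an actual photo would take an EXIF tool or library that is not shown here.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees."""
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Made-up sample values, written the way EXIF stores them (seconds as a ratio).
latitude = dms_to_decimal(30, 17, Fraction(1728, 100), "N")
longitude = dms_to_decimal(97, 43, Fraction(5100, 100), "W")
print(round(latitude, 6), round(longitude, 6))
```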
3. Both application and system metadata are data describing other data (in the way that, e.g., the name assigned a file describes the file but is typically not stored inside the file). System metadata is information about files that is stored and tracked by the file system of the computer or device. System metadata is kept in tables (in Windows, in the Master File Table or MFT), separate from the files in much the same way as information about library books was once stored in paper card catalogues. It’s data about files stored apart from the files themselves, so it’s context, not content. By contrast, application metadata is data about the file stored inside the file (by the application, i.e., the software program that creates and uses the file). Copy a file and its application metadata moves with it. Hash a file and you hash its application metadata (NOT its system metadata). Application metadata is integral to the file and is content, not context.
So, which would EXIF data be? It’s application metadata, because it’s embedded within the files themselves and moves with the files when copied. I talk about the difference at pp. 210-211 of the Workbook and in many other places in your reading.
One way to know if metadata is application metadata is to e-mail the file to a different computer and see if the metadata goes along. The one exception we studied in our exercises is the file’s name. File names are system metadata, but the name is the one item of system metadata that tends to be routinely handed off with the file, although not embedded within the file.
Application metadata examples: date printed and time edited metadata in MS Office documents. These move with the file because they’re inside the file.
System metadata examples: file name (you can change a file’s name without changing the file’s hash value–we did an exercise on this); hence, the name isn’t inside the file. Modified, accessed and created dates (MAC dates) are all system metadata values (with one rare exception applicable to MS Word files). You can prove this by mailing a file as an attachment to another computer. In Windows, the MAC dates will all change on the destination machine.
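The point that hashing covers a file’s contents but not its system metadata can be checked in a few lines of Python. This is a minimal sketch rather than the Workbook exercise itself: the file names, sample text and timestamp are all made up, and SHA-256 is simply one common choice of hash.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the bytes stored inside the file, and nothing else."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "memo.txt"
    original.write_text("Privileged and confidential")
    hash_before = sha256_of(original)

    # Change system metadata only: the file's name and its
    # accessed/modified dates (an arbitrary timestamp in 2000).
    renamed = original.rename(Path(folder) / "renamed.txt")
    os.utime(renamed, (946684800, 946684800))
    hash_after_rename = sha256_of(renamed)

    # Change the content by a single character.
    renamed.write_text("Privileged and confidential!")
    hash_after_edit = sha256_of(renamed)

print(hash_before == hash_after_rename)  # True: name and dates are not hashed
print(hash_before == hash_after_edit)    # False: the content changed
```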
William Hamilton said:
Craig and others will be meeting at the University of Florida Levin College of Law on March 30th to address the pedagogical issues Craig highlights in this insightful and incredibly important post: what materials should be part of the law school core e-discovery curriculum and how can we effectively educate the educators.
Kimberly Christie said:
Great article, as always. Not being familiar with EXIF data, the Google machine brought me to this site: https://readexifdata.com/ where I browsed my own JPEGs and was amazed at the information provided. There is a GPS feature that if turned on would provide coordinates of the location where the photo was taken!
Michael Heenes said:
I wholeheartedly support your post Craig. And then to think that what is true for the USA, is even more true for the Netherlands; e-discovery and digital evidence curricula are just non-existent here. A shame for all lawyers and disgrace for educational institutes.
Let’s hope things look a bit different in a few years from now. Few people can make the difference.
Patrick Cronin said:
While I feel it is a little like preaching to the choir, I felt compelled to add my Amen. I cannot understand how one could charge $350 or more an hour and not understand the basic mechanism for binary classifications. If you mention linear regression, gradient descent, or convolutional neural networks you would be asked to leave the room. I was retained to perform a Cell Tower Analysis a few weeks ago and I was trying to explain the theoretical scope of coverage using a simple formula (1/3 *Pi * r ^2) and I was berated by the attorney that normal people just don’t understand that kind of talk. The problem was that the attorney’s client was facing 20 years in prison.
craigball said:
I can appreciate that hardly any juror has computed the area of a circle since ninth grade geometry, and some didn’t make it to ninth grade or pass geometry. But, we’re not talking about normal people. We’re talking about lawyers. People privileged to hold a license to represent others in court and who should be willing and able to learn new things as befits their need to find and present the evidence. Anyone should be able to grasp dividing the area of a circle into three equal slices of pie to reflect the signal emanating from the three faces of a cell tower. Okay, not Trump; but, most anyone outside the Oval. Certainly anyone with a Juris Doctor should be ASHAMED to admit that’s too much for their widdle noodle.
jimsmithenvlit said:
All advocates must have minimal competence regarding electronic documents. Craig’s service to the legal profession makes me proud to call myself a lawyer, and I am honored to call him my friend.
Jason Fulton said:
Has anyone proposed creating an applicable specialization by the Texas Board of Legal Specialization? While this would not solve the minimum competence problem, it would create an incentive for attorneys to invest in a higher level of competence.