DonBoy

Let me say publicly that DonBoy’s answer exudes a combination of intuitive genius and confidence that make me think DonBoy is going to do big things in his life. -- Steven D. Levitt (Freakonomics blog)

Thursday, December 04, 2003

Dead Media?

Volokh-ite Tyler Cowen points to this article by Simson Garfinkel, from which he quotes:

It is simply inconceivable that documents created today in Adobe’s Portable Document Format (PDF), or images stored in the Joint Photographic Expert Group (JPEG) format, won’t be decipherable on computers in the year 2030. That’s because both the PDF and the JPEG formats are well-defined and widely understood. Adobe has lost control of PDF: there are more than a dozen programs that can create PDFs and display them on a wide range of computers. In other words, PDF is no longer a proprietary format. The same goes for JPEG. Yes, Adobe may fail and new 3D cameras may make two-dimensional photography obsolete. But we will always be able to read files in these formats, because the detailed technical knowledge of how to do so is widely distributed throughout society.

What about the physical media itself? Although there are many examples of tapes and floppy disks being unreadable five or 10 years after they are created, there are many counterexamples as well. Generally speaking, people who make an effort to preserve digital documents have no problem doing so.

This confuses (at least) two levels of description: the semantics of the data, and its physical storage. Sure, the JPEG standard and the PDF standard will not be lost. The problem is getting those data sets off the physical medium; as the linked article itself says,

Take, for example, the electrical standard (sometimes called IDE, now called ATA) that’s used by the disk drives in most PCs. Developed in the 1980s, the ATA interface has been significantly enhanced over the past 20 years. Yet with rare exceptions, you can take a hard disk drive from the late 1980s or early 1990s, plug it into a modern desktop computer, and read the files that the disk contains. That’s because the power cables, physical mounting brackets, data connectors, and even the electrical signals used by today’s computers are compatible with the old drives. What’s more, today’s PCs, Macs, and Linux boxes all can read DOS file systems created in the 1980s. If the disk spins, you can frequently get back the data.

Fine. You can read DOS files from IDE drives, because nothing drastically better than IDE drives has come along; because most of the computing world runs Windows, which has taken great care to be backwards-compatible; and because the rest of the world cares enough to write code to get files from those disks -- remember, you need the physical level and the file system level, at least. Can we read Amiga files? Apple II files? TRS-80 files? From floppies, tapes, or whatever?

And how long until something vastly better than IDE-compatible drives comes along -- super-fast, super-high-density rewritable optical storage or something -- and 30 years later we're in the same pickle trying to get back the data on IDE drives? Garfinkel posits that any optical medium will have to be infinitely backwards-compatible; but if a new medium is dramatically better, it doesn't have to be. Who would have guessed what CDs would do to LPs?

Imagine that optical storage is indeed backwards-compatible, but IDE is eventually obsolete. Sure, you're fine if you've migrated all your IDE data to CD-ROM; but if you haven't, you're out of luck. As Garfinkel says, "Generally speaking, people who make an effort to preserve digital documents have no problem doing so." The problem is people who haven't made that effort, possibly because the data has no clear owner who knows what's required.

The example of the Domesday Book isn't a very impressive counter-argument. Certainly the knowledge of how one might possibly read the data from those videodiscs was available; but, as the author admits, it was a hugely expensive project, undertaken only because the data was considered very important. The concern here is for the data that's not very important.

Another point:

The Internet “Request For Comment” (RFC) series, started back in the 1970s, is readable on practically every computer on the planet today because the RFCs were stored in plain ASCII text. Similarly, you can download images sent back from the Voyager space probes 30 years ago and view them on your PC because NASA stored those pictures as bitmaps—pixel-by-pixel copies of the images without any compression whatsoever.

Can I get RFCs off of 7-bit ASCII paper tape? Can I get Voyager images from a Perkin-Elmer Interdata 8/32 (a machine I worked on in the early 1980s)?
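To make the two levels concrete, here is a minimal Python sketch of the easy half of the paper-tape problem. It assumes someone has already got the holes off the tape and handed us one integer per row, and it assumes a layout of seven ASCII data bits plus an even-parity bit in the top position; that layout, and the function names, are my own illustration, not a claim about how any particular tape was punched.

```python
def encode_tape_rows(text):
    """Turn 7-bit ASCII text into tape rows (hypothetical layout).

    Assumption: bits 0-6 carry the ASCII code and bit 7 is an
    even-parity bit, so every row has an even number of holes.
    """
    rows = []
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"not 7-bit ASCII: {ch!r}")
        parity = bin(code).count("1") % 2  # 1 if the data bits have odd weight
        rows.append(code | (parity << 7))
    return rows


def decode_tape_rows(rows):
    """Turn tape rows back into text, checking even parity on each row."""
    chars = []
    for row in rows:
        if bin(row & 0xFF).count("1") % 2 != 0:
            raise ValueError(f"parity error in row {row:#04x}")
        chars.append(chr(row & 0x7F))  # drop the parity bit
    return "".join(chars)


if __name__ == "__main__":
    rows = encode_tape_rows("RFC")
    print(decode_tape_rows(rows))
```

The decoding is a dozen lines and will still be a dozen lines in 2030. What the sketch cannot do is produce `rows` in the first place: that takes a working tape reader, which is the part that goes missing.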

- posted by Don Porges @ 12/04/2003 09:35:00 PM
