Long term archiving

Professional archivists agonise about how digital archives should be stored, but it’s important for those of us further down the food chain to consider it too. Many people are simply burning their most prized data onto CD or DVD, and shoving the discs into the bookshelf. But given known doubts about the lifespan of burnt discs, how will they feel if they reach for them in 5 or 10 years and find them unreadable? (Just like I recently found many of my old BBC Micro disks unreadable.)

Pressed discs seem to be no problem. I’ve got CDs that are close to 20 years old that are still going strong. But recent warnings have highlighted that burnt CDs might only last a few years (even taking great care in handling and storage).

It’s been suggested that magnetic tape is the way to go in the longer term, with a view to periodically migrating to newer technologies as they come along. I’m still not sure I want to invest in a tape drive…
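Whatever the media, periodic migration is only safe if you can tell whether the files survived the trip. One way to check is to record a checksum for every file when the archive is made, then compare against it later. Here is a minimal sketch in Python, assuming SHA-256 is an acceptable checksum; the function names (`file_sha256`, `build_manifest`, `compare_manifests`) are my own inventions for illustration, not part of any archiving tool.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old, new):
    """Return (missing, changed) lists of relative paths.

    'missing' are files in the old manifest that are absent from the new one;
    'changed' are files present in both whose digests differ.
    """
    missing = sorted(name for name in old if name not in new)
    changed = sorted(
        name for name in old if name in new and old[name] != new[name]
    )
    return missing, changed
```

The idea would be to call `build_manifest` on the folder before burning or copying it, keep the result somewhere safe, and later run it again on the disc or the new drive and pass both to `compare_manifests`. A file that can't be read at all will raise an error while hashing, which is itself a sign the media is on the way out.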

The other issue is formats. What format should be used to ensure that when you or your descendants poke around in your files, they’ll be readable? It’s not just a matter of choosing formats that are ubiquitous now, but also those that will be common into the future.

Think back 20 years. What formats were popular in 1986 that are still around now?

I think, for example, that of all the formats, JPEG and PNG (for pictures), MPEG-1 or 2 (movies), and MP3 (sounds) are perhaps the formats that have such open, widespread support that they’re likely to still be readable in 20 or 30 years’ time.

For text documents? What’s practical probably depends on your source files. Obviously TXT is totally human-readable, but lacks formatting. HTML (with support from JPEG and PNG) is probably the most obvious choice for many documents, as long as you don’t try to do anything too clever with it. RTF also has widespread support, including in open-source products such as OpenOffice and in Mac OS X’s TextEdit. While it’s owned by Microsoft, it is arguably as human-readable as HTML, and arguably an easier conversion for many existing documents such as those in Word format (though I’m not sure it supports all of Word’s latest features).
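Before deciding what needs converting, it helps to know what is actually sitting in the archive folder. The sketch below, in Python, counts files by extension and separates those on a shortlist from everything else. The shortlist is purely my assumption, drawn from the formats mentioned above, and `audit_formats` is a made-up name. An extension is only a hint, of course, and doesn't prove what is really inside the file.

```python
from collections import Counter
from pathlib import Path

# Assumption: a personal shortlist based on the formats discussed in this
# post. Adjust it to suit your own judgement.
ARCHIVE_FRIENDLY = {
    ".jpg", ".jpeg", ".png",   # pictures
    ".mpg", ".mpeg",           # MPEG-1 or MPEG-2 movies
    ".mp3",                    # sounds
    ".txt", ".htm", ".html", ".rtf",  # text documents
}


def audit_formats(root, friendly=ARCHIVE_FRIENDLY):
    """Count files under root by extension.

    Returns two Counters: extensions on the shortlist, and all others.
    Extensions are compared in lower case; files with no extension are
    counted under "(none)".
    """
    friendly_counts, other_counts = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "(none)"
        target = friendly_counts if ext in friendly else other_counts
        target[ext] += 1
    return friendly_counts, other_counts
```

Anything that turns up in the second counter is a candidate for conversion, or at least for a note explaining which program opens it.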

For other more specialised file formats, I suppose it depends what is the easiest format to keep them in… Definitely more thought required.

(Of course if there’s any doubt, printing on paper is the ultimate in future-proof technology!)

2 thoughts on “Long term archiving”

  1. josh

    Anything that can act as an FTP server is a good bet. I reckon that for digital media, there is no long term archiving, only short term (around five years), with migration to the next medium.

    I’ve been using hard drives for archiving for a while now. It’s relatively cheap and easy – each cycle I can upgrade to a drive big enough to hold all previous drives plus whatever I’ve generated in the meantime – 360MB -> 1GB -> 4GB -> 10GB -> 40GB -> 200GB has been the cycle thus far.

  2. Martin

    (Not sure if anyone will read this after 1 week, but it took me that long to find some links to “distributed preservation networks” and LOCKSS in particular, which I think is interesting; see last paragraph below.)

    For the physical side of things, use your hard drive as the primary storage and back up everything to CD/DVD every 6 months. Perhaps check the previous DVDs to see if any have deteriorated within 6 months. Or just back up to a removable hard drive. If your data is small then DVDs would be cheap and quick. If your data is large, then DVDs would take a long time and not be much cheaper than a removable hard drive. If you have more data than can fit on your hard drive then, erm, I, erm, don’t know.
    Tapes aren’t crash hot either. Drives tend to last 3-5 years (perhaps more under home use). And you can’t always read a tape from a different drive.
    For suitable formats I would think JPEG, straight (7-bit) ASCII, HTML and XHTML are safe. Probably PDF and PostScript too. Though PDF and PostScript have so many options and extensions that you almost need to ensure you are using only the basic options, which is probably hard to specify in your applications. I think TIFF images have this problem too.
    For video, MPEG-2 or MPEG-4 (or H.263 or H.264 – I am not sure how different these are to MPEG-4). Probably safest to use whatever format hard drive or flash memory video cameras use.
    In general, make sure there is open source software available to read any formats you want to use. I think XML formats would be good too. If you save the DTDs, which form more or less a syntax for the document format, then the documents would be reasonably human readable.

    What would be nice would be what some libraries are looking at for the ever increasing numbers of electronic documents they need to preserve. They are looking at saving copies at a large number of other libraries, possibly encrypted. See http://www.lockss.org/lockss/Home as an example or google for “distributed preservation networks”. Something like this for home users would be nice. You would probably need 2.5 times the disk space, to return the service to others, and it may be slow even over broadband.
