Category Archives: Archiving

Will your data survive in readable usable form?

Census night is coming

The census delivery chick turned up and offered us the option of paper or electronic form.

Two programmers looked at each other, thought about how they value their time and the response was a no-brainer:

“We’re programmers,” I explained, “we’ll take the paper form.”

“There’s a phone number you can call if you have any trouble filling out the electronic form,” reassures the collector.

Cathy thinks: “Sure, that line won’t have any trouble when twenty million Australians simultaneously log into the web site to fill in the forms via a broken SSL link, using IE specific controls (that only work under some versions of Windows assuming they’re correctly patched and have the right libraries loaded), demanding full round-trips to the underspec’d Windows servers to populate unnecessarily complex custom controls, some of which will no doubt demand Flash or COM. Come to think of it, it probably won’t even be web based, and we’ve only got two Windows boxes, one of which is tucked under a table (Yay! Census night on the floor swearing at the ABS’s programmers!) and the other has a screen resolution that went out with buggy whips (I’ve had programs barf and refuse to run because the resolution was unacceptable).”

We chose paper. For another view of the world, I’m looking forward to hearing how census night worked for Daniel…

Facebook: Download your information

I had a quick look at Facebook’s Download Your Information feature, evidently added a few months ago in response to criticism about the accessibility of people’s data once it’s dumped into the Facebook bottomless pit.

You can find it via the My Account screen, by clicking Download Your Information.

It asks for some time to compile all the information (in my case this took about half an hour), then emails you to say it’s ready, provides a download link and re-checks your password.

It comes as a single zip file, with HTML and pictures inside it.

Opening the index.html file, you’ll find a version of your Profile page, with links to all the other information in the archive, including Wall, Photos, Friends, Events, Messages.

The Wall in my case was 1.5 MB of HTML going back to 2007, and I suspect it contains every Wall post I’ve ever made (and replies from friends). Friends is just an unlinked list of all your friends (names only). Messages has all your message threads and replies.

You can browse the photos via the directory of the same name; subdirectories reflect the folders. It looks like all the photo files are at the size that Facebook shrunk them down to when they were uploaded.

To actually get this information into another service, you’d need to do some trickery munging the HTML. The code they’ve used seems relatively clean and easy to parse.
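If you do want to pull the text out of those pages, Python’s standard-library `html.parser` is enough for a rough first pass. The sketch below assumes nothing about the archive’s actual markup (I haven’t mapped its tags), so it simply collects every visible text fragment and skips `<script>` and `<style>` contents; picking out individual posts would need rules tailored to whatever the real HTML looks like. The names `TextExtractor` and `extract_text` are just illustrative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text fragments, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> list[str]:
    """Return the non-empty text fragments of an HTML document, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

To use it on the download, read the Wall page from wherever it sits in the unzipped archive and pass its contents to `extract_text`; each fragment comes back as a separate string, ready to be reassembled for whatever service you’re moving to.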

So all in all it’s quite a handy feature, and it goes a long way towards dispelling fears that information pumped into Facebook was lost forever behind a zillion clicks on “Older Posts”.

(It doesn’t appear that Twitter has a comparable feature.)

Keeping old content

Unlike many organisations, the BBC has a very enlightened policy on leaving old content up on their web site.

Among other things, it says:

Our view is that these pages often contain a lot of information about the programme or event which may be of interest in the future. We don’t want to delete pages which users may have bookmarked or linked to in other ways.

In general our policy is only to remove pages where the information provided has become so outdated that it may lead to actual harm or damage.

If only more web sites took this view.

Disney: evil, but defeatably evil

Disney DVD’s slogan is Movies, Magic and More. They got the More part right for sure.

There I am, trying to back up my copy of Wall_e_lic2_d1 so that once the kids have scratched the living bejesus out of the playing version, a new one can be generated from the master. It’s also so as to avoid the annoying ads, language selection and other remote-control-based activity at the start: just shove it in the DVD player and walk away. Thankfully Australian copyright law lets me do this.

The studio have been dicking around with the disk’s table of contents, giving it over seventy files it claims are five gig in size – which, given the DVD specification, is not possible. What you need to do in circumstances like this is play it in a player that will tell you which magic track actually contains the movie, not some hacked version of it. Then back that one up.

In the case of this particular disk it’s track 53, 1:33:26 long and weighing in at 5425.95MB.

Thing is, DVDShrink barfs on it, as it does on Cars, but for different reasons. Thankfully I’ve recently discovered that Linux has an equivalent to DVDShrink that, unlike DVDShrink, is still being maintained. K9copy is its name; Cars was processed with no problems, and it was only the tomfoolery on Wall-E that caused a pause in activity.

So there’s one less application that I need a copy of Windows to run.

Risking your irreplaceable images

Oh no, George is at it again:

Q. I want to archive family photos and slides from our hard drive onto a DVD. However, I have read that home-burnt DVDs and CDs can have a short shelf life of about five years. What is the best technology to store 1-5 GB of irreplaceable images?
B. McGregor

A. Manufacturers claim life spans of 30 to 100 years for DVD-R and DVD+R discs and up to 30 years for DVD-RW, DVD+RW. Your advice about a five-year life may apply to a CD that has not been burnt, as in that state the storage life is much shorter. For archiving you should use a premium-quality product, which in my opinion is Verbatim as they come out on top in almost all independent reviews that I have read.

No no no no no. You don’t tell someone who wants to store irreplaceable images that it’s fine to chuck it on a DVD, and blindly believe the manufacturer’s claim of the 30 years plus lifespan. The technology is not yet nearly that old, so while theoretical lab tests might claim that, in my book it’s not conclusively proven, and plenty of people have had problems.

If the files involved are genuinely irreplaceable, the message here is to make sure you don’t rely on one copy, or even on one medium. You make multiple copies, in a format that is futureproof (JPEG probably being the best for photos), distribute them widely (for instance with different family members) and check and copy them regularly onto new media.
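The “check them regularly” part is easy to automate. One common approach (my suggestion, not George’s) is to keep a manifest of SHA-256 checksums for the master copy, then rebuild the manifest for each copy and compare: any file that has gone missing or no longer matches needs replacing from a good copy. A minimal Python sketch; the function names are just illustrative:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so large videos don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(master: dict[str, str], copy: dict[str, str]) -> dict[str, list[str]]:
    """List files missing from a copy, or whose contents differ from the master."""
    missing = sorted(k for k in master if k not in copy)
    changed = sorted(k for k in master if k in copy and copy[k] != master[k])
    return {"missing": missing, "changed": changed}
```

Save the master manifest alongside the photos (a plain text file will do), and each time you check a DVD or a relative’s copy, anything listed under `missing` or `changed` is a file to recopy before the last good version goes too.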

You sure as hell don’t burn a single copy and chuck the DVD in the cupboard and hope nothing renders it unreadable.

Home Improvements – Here endeth the lesson

For the story so far see Part 1 and Part 2. If you’re totally bored, then please don’t read on… this is the longest post yet!

So I got my Linksys NSLU2 home. I thought I’d fire it up and make sure it worked. There’d be nothing more frustrating than flashing it with the Linux OS, finding it doesn’t work, and then wondering whether the issue is with the new firmware or the actual hardware.

Plugged it in, fired it up, plugged in and formatted a blank external drive I dug out of the cupboard. All good so far! I can’t plug in a disk with anything on it because the Linksys requires disks to be formatted with EXT3.

Hmmm… what’s this… a firmware upgrade to the NSLU2 that allows it to read NTFS! That’d make the device usable until I get my head around the Linux options!

Loaded up the upgrade; all went smoothly. Plugged in my external hard drive to see if it works. Got a “Drive not formatted” message in the NSLU2 admin screen, so it must not support NTFS after all. Oh well. Plugged the external drive back into my desktop PC.

“This disk is not formatted. Do you want to format it now? Yes/No”

My

heart

stopped.

An entire disk’s worth of data… gone. Video from when the kids were little, lots of photos… gone. I know what you’re all thinking… why wasn’t this data backed up? I have two responses to this. 1) It’s not that easy to back up a 14GB video file. 2) Part of the reason I was setting up this solution is to make automated backups more accessible!

Some have said that I shouldn’t have trusted the device with my data, but in my defence, it’s a shrink wrapped consumer device that’s designed to have drives plugged in to it. If I can’t trust this device with my data, I don’t have much use for it!

I kicked off a File Recovery scan and went to bed very sad.

In the morning, the file recovery had found a bunch of deleted files, but none of the files that were not deleted at the time of the corruption! I tried loading the drive up in a couple of EXT3 file viewers, but they couldn’t read the drive either.

I’d pretty much given up hope of getting my data back.

Then my neighbour nonchalantly suggests I try a partition table repair tool. I load one up and run it. It tells me “The partition table on the disk is incorrect. Would you like to fix it?” I click “Yes”. Bang. All my data is back!!!

Yay! Waves of relief! Not to mention proof that the Linksys had screwed up the disk: the partition table was written for an EXT3 disk, even though the drive was still formatted as NTFS.

Yesterday I took the Linksys back to Harris Technology and threw it at them as hard as I could. Actually I didn’t and they were incredibly helpful, giving me a full refund without any hassle.

So back to the drawing board. Now that I realise how precious that data is to me, I’m going to have to get a proper, RAID-based network drive solution. More money :( I’ll probably go for a Thecus N2100.

Lesson the First
Imagine losing all your data that is not backed up. How do you feel about that?

Lesson the Second
No, really. Losing it. Right now. Seriously, how do you feel about that?

Weigh your reaction to the above questions against the cost of getting dedicated backup.

Here endeth the lesson.

Update: I was talking to Josh last night and he said it wasn’t clear that I hadn’t yet installed the funky open source firmware on the Linksys box; it was running the latest official firmware release. I probably also didn’t emphasize enough that I wouldn’t recommend anyone buy one of these pieces of junk.

Emulation saves the day

This is cool: Emulating a BBC Micro, Amstrad, Spectrum or Dick Smith VZ300 in a Java applet. Maybe I’ve been wrong in dissing Java.

Speaking of Beebs, apparently a version of the cross-platform emulator BeebEm has been used to try to resurrect the 1980s BBC Domesday project. It makes interesting reading, particularly with regard to the problems of digital preservation… not to mention the value of the resource as a record of life in Britain at the time it was compiled.

Long term archiving

Professional archivists agonise about how digital archives should be stored, but it’s important for those of us further down the food chain to consider it too. Many people are simply burning their most prized data onto CD or DVD and shoving the discs onto the bookshelf. But given known doubts about the lifespan of burnt discs, how will you feel if you reach for them in 5 or 10 years and find them unreadable? (Just like I recently found many of my old BBC Micro disks unreadable.)

Pressed discs seem to be no problem. I’ve got CDs that are close to 20 years old that are still going strong. But recent warnings have highlighted that burnt CDs might only last a few years (even taking great care in handling and storage).

It’s been suggested that magnetic tape is the way to go in the longer term, with a view to periodically migrating to newer technologies as they come along. I’m still not sure I want to invest in a tape drive…

The other issue is formats. What format should be used to ensure that when you or your descendants poke around in your files, they’ll be readable? It’s not just a matter of choosing formats that are ubiquitous now, but also those that will be common into the future.

Think back 20 years. What formats were popular in 1986 that are still around now?

I think, for example, that of all the formats, JPEG and PNG (for pictures), MPEG-1 or 2 (movies), and MP3 (sounds) are perhaps the formats that have such open, widespread support that they’re likely to still be readable in 20 or 30 years’ time.
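When auditing an existing pile of files, it’s worth checking what format each one really is rather than trusting its extension. Most of the formats above start with a well-known signature (“magic bytes”), and the sketch below checks for a handful of them. It’s a heuristic, not a validator: the MPEG audio frame-sync test in particular can match other data, and the list is deliberately short. The names `SIGNATURES`, `sniff_format` and `sniff_file` are illustrative.

```python
# Leading-byte signatures for a few long-lived formats (checked in order).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"ID3", "MP3 (ID3v2 tag)"),
    (b"\x00\x00\x01\xba", "MPEG program stream"),   # MPEG-1/2 pack header
    (b"\x00\x00\x01\xb3", "MPEG video sequence"),   # elementary video stream
]


def sniff_format(header: bytes) -> str:
    """Guess a file's format from its first few bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # An MP3 without an ID3 tag begins with an MPEG audio frame: 11 sync bits set.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MPEG audio frame"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just the start of a file and guess its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

Anything that comes back `unknown` is a candidate for conversion to one of the better-supported formats while the software that reads it is still around.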

For text documents? What’s practical probably depends on your source files. Obviously TXT is totally human-readable, but lacks formatting. HTML (with support from JPEG and PNG) is probably the most obvious choice for many documents, as long as you don’t try to do anything too clever with it. RTF also has widespread support, both in open-source products such as OpenOffice and in others such as Mac OS X’s TextEdit. And while it’s owned by Microsoft, it is arguably as human-readable as HTML, and arguably an easier conversion for many existing documents such as those in Word format (though I’m not sure it supports all of Word’s latest features).

For other, more specialised file formats, I suppose it depends on what the easiest format is to keep them in… Definitely more thought required.

(Of course if there’s any doubt, printing on paper is the ultimate in future-proof technology!)

Back It On Up

Cameron’s recent data loss is an example of why online backup will become an integral part of home computing in the years to come. As our memories are increasingly stored in digital format (I know that in the 9 months of my son’s life not one film-based photograph of him has been taken), people will be looking for a secure, off-site means of ensuring no harm comes to their pictures or files. Often, though, like Cam, they won’t think about it until after a disaster has hit.

I’m currently using Mozy, a free, automatic, secure online backup system from Berkeley Data Systems. It’s simplicity itself: you download the application, tick the boxes on the predefined backup sets (such as ‘Word Documents’, ‘Music’, ‘Movies’, ‘Photographs’, ‘Financial Records’ etc.), and Mozy tracks down all your files and away it goes. You can define your own backup sets if you wish and even drill down to the file level to add or remove files for a particular set.

Mozy claims to use differential backup, so it should only back up the bits of your Outlook file that have changed, but I haven’t found that to be the case in my instance. The Mozy icon lives in your system tray and behaves itself very well by only backing up when your system is idle. My only problem with this is that it backs up while I have an unattended torrent going, so it can eat into your bandwidth.
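For the curious, here’s roughly what “only back up the bits that have changed” means. This is not Mozy’s actual algorithm (I don’t know what that is); it’s a minimal sketch assuming the file is split into fixed 4 KB blocks, each block is hashed, and only blocks whose hash differs from last time need uploading. The function names and block size are assumptions for illustration.

```python
import hashlib


def block_digests(data: bytes, block_size: int = 4096) -> list[str]:
    """SHA-256 digest of each fixed-size block (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Indexes of blocks in the new version that differ from, or extend past, the old one."""
    return [i for i, d in enumerate(new) if i >= len(old) or old[i] != d]


# A mail file where only the second 4 KB block was edited and a little was appended:
old_file = b"A" * 8192
new_file = b"A" * 4096 + b"B" * 4096 + b"C" * 10
to_upload = changed_blocks(block_digests(old_file), block_digests(new_file))
```

Here only blocks 1 and 2 would need sending, not the whole file. One catch with fixed-size blocks: inserting data near the start of a file shifts every later block, so they all look changed. Tools like rsync avoid that with a rolling checksum that can find matching blocks at any offset.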

You get 2GB of backup for free, which covers most of my documents save for my music and video collections. There is a premium service, currently offering up to 20GB for US$39.95. If you want to try it and use my referral link https://mozy.com/ref/UTVC5L, we both get an extra 256MB of backup space.