Category Archives: Data

Long term archiving

Professional archivists agonise about how digital archives should be stored, but it’s important for those of us further down the food chain to consider it too. Many people are simply burning their most prized data onto CD or DVD, and shoving the discs into the bookshelf. But given known doubts about the lifespan of burnt discs, how will they feel if they reach for them in 5 or 10 years and find them unreadable? (Just like I recently found many of my old BBC Micro disks unreadable.)

Pressed discs seem to be no problem. I’ve got CDs that are close to 20 years old that are still going strong. But recent warnings have highlighted that burnt CDs might only last a few years (even taking great care in handling and storage).

It’s been suggested that magnetic tape is the way to go in the longer term, with a view to periodically migrating to newer technologies as they come along. I’m still not sure I want to invest in a tape drive…

The other issue is formats. What format should be used to ensure that when you or your descendants poke around in your files, they’ll be readable? It’s not just a matter of choosing formats that are ubiquitous now, but also those that will be common into the future.

Think back 20 years. What formats were popular in 1986 that are still around now?

I think, for example, that of all the formats, JPEG and PNG (for pictures), MPEG-1 or 2 (movies), and MP3 (sounds) are perhaps the formats that have such open, widespread support that they’re likely to still be readable in 20 or 30 years’ time.

For text documents? What’s practical probably depends on your source files. Obviously TXT is totally human-readable, but lacks formatting. HTML (with support from JPEG and PNG) is probably the most obvious choice for many documents, as long as you don’t try to do anything too clever with it. RTF also has widespread support via products such as the open-source OpenOffice and Mac OS X’s TextEdit. While it’s owned by Microsoft, it is arguably as human-readable as HTML, and arguably an easier conversion for many existing documents such as those in Word format (though I’m not sure it supports all of Word’s latest features).

For other more specialised file formats, I suppose it depends what is the easiest format to keep them in… Definitely more thought required.

(Of course if there’s any doubt, printing on paper is the ultimate in future-proof technology!)

Back It On Up

Cameron’s recent data loss is an example of why online backup will become an integral part of home computing in the years to come. As our memories are increasingly stored in digital format (I know in the 9 months of my son’s life there has not been one film-based photograph of him taken), people will be looking for a secure off-site means of ensuring no harm comes to their pictures or files. Often though, like Cam, it won’t be until after a disaster has hit

I’m currently using Mozy, a free, automatic, secure online backup system from Berkeley Data Systems. It’s simplicity itself – you download the application, tick the boxes on the predefined backup sets (such as ‘Word Documents’, ‘Music’, ‘Movies’, ‘Photographs’, ‘Financial Records’ etc), Mozy tracks down all your files and away it goes. You can define your own backup sets if you wish and even drill down to the file level to add or remove files for a particular set.

Mozy claims to use differential backup, so it should only back up the bits of your Outlook file that have changed, but I haven’t found that to be the case for me. The Mozy icon lives in your system tray and behaves itself very well by only backing up when your system is idle. My only problem with this has been that it backs up when I have an unattended torrent going, so it can impact on your bandwidth.
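
To make the idea of only backing up “the bits that have changed” concrete, here is a minimal sketch in Python. It is not how Mozy does it (I have no idea what they do internally); it just assumes a file is cut into fixed-size blocks and each block is hashed, so that only blocks whose hash differs from the last run need uploading. The names `BLOCK_SIZE`, `block_hashes` and `changed_blocks` are made up for illustration.

```python
import hashlib

# Hypothetical block size; a real backup tool would pick its own.
BLOCK_SIZE = 4096


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return the SHA-256 hex digest of each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_hashes: list, data: bytes,
                   block_size: int = BLOCK_SIZE) -> list:
    """Return indexes of blocks that are new or differ from the last backup."""
    new_hashes = block_hashes(data, block_size)
    return [
        index
        for index, digest in enumerate(new_hashes)
        if index >= len(old_hashes) or old_hashes[index] != digest
    ]


if __name__ == "__main__":
    previous = b"a" * 10_000                           # file as last backed up
    current = previous[:5000] + b"X" + previous[5001:]  # one byte modified
    print(changed_blocks(block_hashes(previous), current))
```

With a one-byte change, only the block containing that byte is flagged. The catch with fixed blocks is that inserting data near the start of a file shifts everything after it, so every later block looks changed, which might be one reason a big mail file ends up being re-sent in full.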

You get 2GB of backup for free, which covers most of my documents save for my music and video collections. There is a premium service, currently offering up to 20GB for USD$39.95. If you want to try it and use my referral link, https://mozy.com/ref/UTVC5L, we both get an extra 256MB of backup space.

Tell me how to fix the problem!

It timed out, or I closed something, or something. Then I tried typing, and Writely figured out I couldn’t prove I was me anymore:
Not logged in - fine. What do I do about that?
Not logged in – fine. But you’re meant to tell me how to fix the problem, or better yet, fix it for me.

Eventually I brought up the login screen in another tab, logged in, and all became good again.

Wow, how did I miss the Mechanical Turk?

Amazon Mechanical Turk is an astonishing idea – an Artificial AI marketplace. Basically, there’s an API you can call to get humans to do tasks (oddly enough, they want to be paid). Currently, a big favourite for the tasks is transcribing podcasts. I can see that it would be a cheap way to ground-truth a set of training data for AI systems, like number plate detection / recognition.

An artist has used the Mechanical Turk to acquire 10,000 hand-drawn left-facing sheep and put them on a site for your viewing pleasure – plus, there was an exhibition of the collectable stamp sheets etc (you can buy them as stamp sheets for only $20 a sheet). Given the images cost less than a cent each to acquire, he may be a bullshit artist.

The Turk is an example of what Wired calls the “Rise of Crowdsourcing”: “Remember outsourcing? Sending jobs to India and China is so 2003. The new pool of cheap labor: everyday people using their spare cycles to create content, solve problems, even do corporate R & D.” It’s about the markets, people. These are markets for micro-transactions – micro in their repeatability, or micro in their value.

Sucky factorial calculators

Look for “factorial calculator” on Google and you’ll take a long time to find a factorial calculator that thinks that 100! doesn’t have an ‘e’ in it. If you’re going to write a dinky little app like that, be aware that there are limitations to it and tell people. I’m not going to link to any of them, they’re all naughty applications that shouldn’t be allowed out in the real world. But Dima Stopel’s large number factorial calculator isn’t afraid to give you all the digits.
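
For what it’s worth, getting every digit isn’t hard in a language with arbitrary-precision integers. Here’s a small sketch in Python (my choice of language, nothing to do with how any of those calculators are written); `factorial_digits` is just a name I made up. The ‘e’ only creeps in when the result gets squeezed into a floating-point number.

```python
import math


def factorial_digits(n: int) -> str:
    """Return n! as an exact decimal string, with no scientific notation."""
    # Python integers are arbitrary precision, so nothing is rounded here.
    return str(math.factorial(n))


if __name__ == "__main__":
    exact = factorial_digits(100)
    print(exact)
    print(len(exact), "digits")

    # Converting to a float is what introduces the 'e': a double only
    # holds about 15-17 significant digits, so the rest are lost.
    print(float(math.factorial(100)))
```

A calculator that can only do the float version should at least say that it’s showing an approximation.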

OmniNerd – Articles: Beating Traffic

Brandon U. Hansen tries to figure out if he can get to work faster – beating traffic.

The world is full of traffic and people who hate it. This article analyzes a year of data to determine if minor tweaks to departure times can significantly impact commute length – or if it is all out of the driver’s control.

Well, duh. Of course it does. But who’s going to go to work at 11:00 and return home at 19:00?

Anyways, looking at his opening figures is weird, because he says that driving to work soaks up 100 hours a year, and involves 15,000 miles – which implies an average speed of 150mph. Even if he forgot the trip home, you’re looking at 75mph (about 120km/h), which is unlikely unless your cousin is a cop or you live right on top of an Autobahn. Perhaps they’re driving to work in a rocket tractor – which I always thought was one of these, with one of these attached – but it seems most of the net thinks they’re one of these, and that would never work!
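
The sums, spelled out as a quick Python sketch. The two input figures are the ones quoted from the article; the “forgot the trip home” case is my assumption that the 100 hours covers only one direction while the 15,000 miles covers both, so the real time on the road would be 200 hours.

```python
MILES_PER_YEAR = 15_000   # figure quoted from the article
HOURS_PER_YEAR = 100      # figure quoted from the article
KM_PER_MILE = 1.609344    # exact definition of the international mile


def average_speed_mph(miles: float, hours: float) -> float:
    """Average speed is simply distance divided by time."""
    return miles / hours


# Taking the figures at face value.
face_value = average_speed_mph(MILES_PER_YEAR, HOURS_PER_YEAR)

# Assuming the hours only count the trip to work, so double them.
trip_home_forgotten = average_speed_mph(MILES_PER_YEAR, 2 * HOURS_PER_YEAR)

if __name__ == "__main__":
    print(face_value, "mph")
    print(trip_home_forgotten, "mph")
    print(round(trip_home_forgotten * KM_PER_MILE, 1), "km/h")
```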

MySQL woes

We’ve got MySQL problems here at Geekrant central.

MySQL said:
#1016 – Can’t open file: ‘wp_comments.MYI’ (errno: 145)

Doesn’t sound good, does it? The ISP is looking into it.

Nothing else seems to be AWOL, but I’ve taken a backup of everything just in case. Wouldn’t you know it, the backup I have of wp_comments isn’t particularly recent. Hopefully the ISP has a newer one, but if not, I’ve grabbed a bunch of comments via Newsgator’s cache. Gawd knows how I’d restore them though.

Update: Fixed. May I just say, the support guys at AussieHQ hosting are deadset legends.