Category Archives: Data

Keeping old content

Unlike many organisations, the BBC has a very enlightened policy on leaving old content up on their web site.

Among other things, it says:

Our view is that these pages often contain a lot of information about the programme or event which may be of interest in the future. We don’t want to delete pages which users may have bookmarked or linked to in other ways.

In general our policy is only to remove pages where the information provided has become so outdated that it may lead to actual harm or damage.

If only more web sites took this view.

PowerPoint file sizes

I was dealing with a big PowerPoint presentation (PPT) file.

In the older PPT format, 6063 KB.

When zipped, 4826 KB. Not a bad saving given the number of pictures in it.

Here’s the interesting thing: in PPTX format, 3293 KB.

Bear in mind that PPTX and the other Office Open XML formats (DOCX, XLSX, etc.) are ZIP containers, which compress each component part individually rather than the file as a whole, so this is an interesting result.

Perhaps the old binary format is inherently less efficient/compressible than the new XML format.

Mind you, another big PPT I tried didn’t compress down as much: the PPTX was about the same size as the zipped PPT, so it obviously depends on the exact content.
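To see where the space goes in a particular file, you can compare the per-part sizes inside a PPTX with the size of an old-style file deflated as one stream. This is a minimal Python sketch using only the standard library's zipfile and zlib modules; the function names and the example file name "deck.pptx" are hypothetical, and zlib's figure only approximates a real .zip, which adds some header bytes.

```python
import io
import zipfile
import zlib


def member_sizes(zip_source):
    """List (part name, uncompressed bytes, compressed bytes) for a ZIP-based file.

    zip_source may be a path such as "deck.pptx" (a hypothetical file name),
    a seekable binary file object, or the raw bytes of the file.
    """
    if isinstance(zip_source, (bytes, bytearray)):
        zip_source = io.BytesIO(zip_source)
    with zipfile.ZipFile(zip_source) as zf:
        # Each ZipInfo describes one part, compressed separately from the others.
        return [(info.filename, info.file_size, info.compress_size)
                for info in zf.infolist()]


def whole_stream_deflated_size(data: bytes) -> int:
    """Deflate the whole blob as a single stream.

    This approximates zipping one old-style .ppt file; a real .zip archive
    adds a few dozen bytes of headers on top of this figure.
    """
    return len(zlib.compress(data, 9))
```

Running member_sizes on a real PPTX typically shows the XML slide parts shrinking a lot while the media parts (pictures) barely change, since those are usually stored in already-compressed formats.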

Disney: evil, but defeatably evil

Disney DVD’s slogan is “Movies, Magic and More”. They certainly got the “more” part right.

There I am, trying to back up my copy of Wall_e_lic2_d1 so that once the kids have scratched the living bejesus out of the playing copy, a new one can be generated from the master. I also want to avoid the annoying ads, language selection and other remote-control-based activity at the start: just shove it in the DVD player and walk away. Thankfully, Australian copyright law lets me do this.

The studio have been dicking around with the disc’s table of contents, giving it over seventy files that it claims are five gig each, which, given the DVD specification, is not possible. What you need to do in circumstances like this is play it in a player that will tell you which is the magic track that actually contains the movie, rather than some hacked version of it. Then back that one up.

In the case of this particular disc it’s track 53, 1:33:26 long and weighing in at 5425.95 MB.
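Some quick arithmetic on those figures shows why a shrinking tool comes into it at all: the main title alone is larger than a single-layer blank. A Python sketch, assuming a nominal single-layer (DVD-5) capacity of 4.7e9 bytes and assuming the reported “MB” means mebibytes (2**20 bytes); both assumptions are parameters you can change.

```python
# Assumption: a nominal single-layer (DVD-5) blank holds 4.7e9 bytes.
DVD5_BYTES = 4.7e9


def duration_seconds(hms: str) -> int:
    """Turn an 'H:MM:SS' running time into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def title_stats(hms: str, size_mb: float, bytes_per_mb: int = 2**20,
                target_bytes: float = DVD5_BYTES):
    """Return (seconds, average Mbit/s, fraction of the data that fits the target).

    bytes_per_mb defaults to 2**20 on the assumption that the tool reporting
    the size meant mebibytes; pass 10**6 for decimal megabytes.
    """
    size_bytes = size_mb * bytes_per_mb
    seconds = duration_seconds(hms)
    avg_mbit_per_s = size_bytes * 8 / seconds / 1e6
    # Capped at 1.0: a title that already fits needs no shrinking.
    fit_fraction = min(1.0, target_bytes / size_bytes)
    return seconds, avg_mbit_per_s, fit_fraction
```

For track 53, title_stats("1:33:26", 5425.95) gives 5606 seconds, an average of about 8.1 Mbit/s, and a fit fraction of about 0.83, so the title would need squeezing to roughly 83% of its size. With decimal megabytes the figures are about 7.7 Mbit/s and 0.87. Menus and extras are ignored here.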

Thing is, DVDShrink barfs on it, as it does on Cars, though for different reasons. Thankfully, I’ve recently discovered that Linux has an equivalent to DVDShrink, and unlike DVDShrink this one is still being maintained. K9copy is its name; it processed Cars with no problems, and only the tomfoolery on Wall-E caused a pause in activity.

So there’s one less application that I need a copy of Windows to run.

Real Estate Websites Suck: Part 4

I’ve decided that I’m only going to look for properties with 4 (or more) bedrooms. I enter this as a search criterion, and the website says quite clearly “Results for properties for rent with 4+ bedrooms in {suburb}”.

So why do I get presented with 3 bedroom properties?

Facepalm. Five years, and these web sites still suck balls. Not only do searches not work, it appears that the site pegs my CPU at 100% when the rendered page is just sitting there. Some of their lovely JavaScript goodness, I suppose.

If you ask nicely I might dig up and dust off my rant from five years ago…
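For contrast, the filter that the results heading promises is a one-liner. A minimal Python sketch, with a hypothetical Listing type standing in for whatever the site actually stores:

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical, minimal shape of a rental listing."""
    address: str
    bedrooms: int


def with_min_bedrooms(listings, min_bedrooms):
    """'4+ bedrooms' means bedrooms >= 4, so a 3-bedroom place must not appear."""
    return [listing for listing in listings if listing.bedrooms >= min_bedrooms]
```

Given listings with 3, 4 and 5 bedrooms and a minimum of 4, only the 4- and 5-bedroom places come back.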

Google Chrome on Linux: slow, memory hog

I’ve run the Google Chrome on Linux beta since it first became available, and my impression is: slow. I might be unusual in that I typically have dozens and dozens of tabs open, which may break Chrome’s model of shoving each page into its own process, and this PC has “only” a gig of RAM, but it’s slower than Firefox for the same task. Things were a lot worse before I loaded AdBlock and FlashBlock for Chrome. Now my CPU isn’t pegged at 100%.

Embedded JavaScript also takes this performance hit, so particular tools that help me do my stuff, well, don’t anymore.

Most annoyingly, it seems (although I haven’t confirmed it) that the back button causes a page reload rather than pulling the page out of the cache. Or the slowness could just make it look that way. But how long can it possibly take to render a page anyway?

On the upside, it hasn’t crashed, whereas I would have expected Firefox to mysteriously die without any explanation by now. (A sign that Firefox is about to die is that tab switches and page loads become very slow, which points to a similar root cause; my guess is memory exhaustion.) Firefox has always done the mysterious death thing, and I was hoping that upgrading to 3.5 would fix things, but no dice.

I’m trying to decide whether I’d prefer a browser that is snappy but occasionally falls in a big pile and gets back up again, or a laggard that rolls with the punches. Perhaps I’ll use both simultaneously: vital stuff on Chrome and throw-away stuff on FF, but that’s going to be a bit tough on my brain.

[UPDATE]
Well, it turns out that Chrome is a memory hog. I bought another gig of RAM, and wouldn’t you know it, the PC is flying. My suspicions were raised when all of the RAM and most of the paging file were in use, and the little orange disk activity light was slowly burning a hole in the wall on the other side of the room.
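One way to test the memory-hog theory on Linux, before buying RAM, is to add up the resident memory of every Chrome process. A rough Python sketch that reads /proc directly, assuming the browser’s processes show up under the name chrome (check with ps on your own machine); the function names are hypothetical.

```python
import os


def parse_vmrss_kb(status_text: str) -> int:
    """Pull the VmRSS value (in kB) out of the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def total_rss_kb(process_name: str = "chrome") -> int:
    """Sum resident memory over every process with this name (Linux only).

    Assumption: Chrome's processes report "chrome" in /proc/<pid>/comm.
    """
    if not os.path.isdir("/proc"):
        return 0
    total = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() != process_name:
                    continue
            with open(f"/proc/{pid}/status") as f:
                total += parse_vmrss_kb(f.read())
        except OSError:
            continue  # the process exited or is not readable
    return total
```

Summing RSS overstates the real total, because pages shared between Chrome’s processes are counted once per process; the Pss field in /proc/<pid>/smaps_rollup (Linux 4.14 and later) splits shared pages proportionally instead.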

Summer 2009/2010 starts

I have an algorithm for detecting summer: seven consecutive days with a maximum temperature at or above 20 degrees Celsius. I give you Summer, from the Bureau’s seven day forecast for Melbourne:
Forecast for Monday Max 20
Forecast for Tuesday Min 8 Max 24
Forecast for Wednesday Min 10 Max 25
Forecast for Thursday Min 12 Max 28
Forecast for Friday Min 16 Max 29
Forecast for Saturday Min 18 Max 28
Forecast for Sunday Min 16 Max 26
I should also point out that I consider there to be two seasons in Melbourne: Nice-but-hot (summer-ish) and a-bit-iffy (winter-ish).
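The summer rule is simple enough to write down as code: count a run of consecutive days whose maximum reaches the threshold, and reset the count on any cooler day. A Python sketch, assuming the rule applies to the daily maxima; the function name is hypothetical.

```python
def is_summer(daily_maxima, threshold=20, run_length=7):
    """True once run_length consecutive daily maxima are at or above threshold."""
    streak = 0
    for temperature in daily_maxima:
        # Extend the run on a warm day; any cooler day starts it again from zero.
        streak = streak + 1 if temperature >= threshold else 0
        if streak >= run_length:
            return True
    return False
```

Fed the maxima from the forecast above (20, 24, 25, 28, 29, 28, 26), it returns True; a single 19-degree day anywhere in the week resets the count.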

Summer 2008/2009 starts

I have an algorithm for detecting summer: seven consecutive days with a maximum temperature at or above 20 degrees Celsius. I give you Summer, from the Bureau's seven day forecast for Melbourne:

Thursday      Fine.                                  Min  6    Max 21
Friday        Mainly fine.                           Min 12    Max 25
Saturday      Fine.                                  Min 12    Max 30
Sunday        Shower or two.                         Min 15    Max 22
Monday        Fine.                                  Min 10    Max 23
Tuesday       Fine.                                  Min 12    Max 28

I swear, this gets earlier and earlier each year.

Weird bug

Let’s say, for example, that a system supplies you the time of some event in UTC, and you convert it to local time and shove the date/time up on the display. Say, for argument’s sake, you also include the Day Of Week, ending up with a format of DD/MM, DOW HH:MM. Everything looks fine until someone notices that the Day Of Week is wrong: the 28th of May is a Wednesday, not a Thursday.

What happened?

The date conversion routine that generates the DOW string does a bunch of odd stuff, but seems to work correctly; it certainly works in other parts of the code, and generates the right string there.

WTF?

The UTC time seemed to be converted to local time twice, but that wasn’t the culprit; surprisingly, no-one is killed in an explosion of silicon splinters when that code is double-executed. Whatever.

Could it be that the system supplying you the time of that event in UTC is off by a year? One year into the future. That would give you that behaviour.

Check it.
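The failure mode is easy to reproduce. Because DD/MM hides the year, a timestamp that is a year out still prints the right day and month, and only the day of week gives it away. A Python sketch with a hypothetical display_string helper, assuming local time is Melbourne standard time (UTC+10, no daylight saving, which holds in late May) and that the incoming timestamps are timezone-aware UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Melbourne standard time, UTC+10, with no daylight saving in May.
LOCAL = timezone(timedelta(hours=10))


def display_string(event_utc: datetime) -> str:
    """Convert an aware UTC timestamp to local time as 'DD/MM, DOW HH:MM'."""
    local = event_utc.astimezone(LOCAL)
    # %a gives the abbreviated weekday name ("Wed" in the default C locale).
    return local.strftime("%d/%m, %a %H:%M")
```

An event at 15:30 UTC on 27 May 2008 displays as "28/05, Wed 01:30". Feed in the same moment stamped 2009 and you get "28/05, Thu 01:30": identical apart from the weekday.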