Office’s garbled HTML

Brian Jones on why Microsoft Office 2000 (and later) produces such godawful HTML:

Our scenario was that people would start saving “docs” as HTML on their intranet sites and browse them with the browser. We viewed the browser as “electronic paper” that we had to “print” to (i.e. perfect fidelity). We had already got a lot of feedback from our Word97 Internet Assistant add-in that any loss of fidelity when saving as a web page was unacceptable and a “bug”. As it turned out, this usage scenario did not become as common as we thought it would and a zillion conspiracy theories formed about why we “really” did it. Many people assumed that a better approach would have been to save as “clean” HTML even if the result did not look exactly like what the user saw on the screen. We felt that the core office applications (other than FrontPage) were not really meant to be web page authoring tools, so we focused on converting docs to exact replicas in HTML. We didn’t want people losing any functionality when saving to HTML so we had to figure out a way to store everything that could have existed in a binary document as HTML. We thought we were clever creating a bunch of “mso-” css properties that allowed us to roundtrip everything. HTML didn’t take off in the same way we had expected, and today, the main use for Office HTML is for interoperability on the clipboard, though of course the biggest use is within e-mail (WordMail).

None of this explains why Office 2003’s “Filtered HTML” is so riddled with proprietary tags, though. Admittedly, a filtered HTML file is smaller than a roundtrip HTML file out of Word, but it’s still hugely bigger than the type of HTML you’d write from scratch (or in a web page editor such as Dreamweaver or FrontPage), and the source code is unreadable.

To my mind, Filtered HTML should be just that: HTML, filtered in such a way that the basic structure of the document is preserved, but none of the junk that Word (or whatever) stores along with it. Leave that for the roundtrip HTML — though I can’t see the appeal in that either, since if you want to store documents in a viewable form on the great InterWeb, PDF is the way to go. Or just store it in the native Office format for internal use, when you know every user will have the application or a viewer.

(By the way, when I was trying out the roundtrip HTML the other day, while reloading, Word presented me with a strange warning that it was going to query from some nonsense “Z” table to put data in the document. Bizarro. The test document did quote some SQL, but this would seem to suggest the roundtrip HTML isn’t all it’s cracked up to be.)

Anyway, Brian’s full article is about the progression of the Office formats from binary in the 90s into the XML to be used in the next version. Well worth a read if you want some background on the history, and where they’re going now.

Byebye to an Adobe cash cow

Microsoft has announced that the next version of Office will support PDF creation natively.

Obviously, Adobe has faced competition before from various PDF creation applications, including the DIY method using… what was it, some kind of printer driver to get PostScript, then ps2pdf to get it into PDF? And it’s not as if Adobe has been resting on its laurels; it has kept enhancing Acrobat with extra functionality.

But this is different: the prime reason people buy Acrobat is to create PDFs from Office documents. And so far the cheapie clone Acrobat products/methods haven’t won much market share, because people trust the name brand PDF creator. But this is Microsoft, and if there’s one thing Microsoft does well, it’s blowing away other companies’ sales.

The Adobe guys must have seen this coming when they opened up the format. Maybe that was a factor in diversifying by buying Macromedia. It’ll be interesting to see their response.

Disabling the Insert key

I can’t tell you how much I hate Windows’ overtype mode. Accidentally tap the Insert key, and you suddenly find your typing overwriting old text. Who would use such a pointless thing?

And it’s doubly worse in products such as Word, where the only clue that you’re in this stupid mode is the almost-invisible ungreying of the letters “OVR” on the status bar.

Even worse in other apps: Excel has it, invisibly, only when you’re editing cells. PowerPoint doesn’t have it. Thankfully UltraEdit noticeably changes the cursor when it’s invoked.

It’s there, but invisible, in Outlook. If you set Outlook to use Word for editing messages, it does it invisibly because the Word email window has no status bar, but if you have a Word window sitting in the background, you can see the OVR status light up on that!

At least it can be disabled in Word:

  • Tools -> Customize -> click Keyboard
  • In the Categories, choose All Commands
  • In the Commands list, scroll down and find Overtype
  • In the Current Keys box, the word “Insert” should appear. Click on this, then click the Remove button. Then close the dialog boxes, and you’re done.

Wouldn’t you know it, this setting isn’t global throughout Office. So the Insert key will still do stupid things in Excel and Outlook. (Using Word for writing Outlook messages will get around it, but that might be too big a price to pay.)

See also: MS KB 198148

MS Office 2003 SP2

Microsoft have released Office 2003 Service Pack 2, which focusses on security updates.

The journal formerly known as Woody’s Office Watch are reporting problems already, and recommending people steer clear of it for a little while.

(The Office Watch web site could do with some work. They’ve written their archives page in such a way that sometimes the lines don’t wrap, but you have to scroll across to read them; and try clicking the “Contact the webmaster” link to point it out, and you get your email program popping up composing a mail to “watchit”… very bloody useful.)

All about @

The @ symbol has been around for ages in commerce, but has gained a new lease of life since email became popular. In English it means “at”, but in other languages it doesn’t, and is called a variety of things, such as in Danish: snabel, meaning elephant’s trunk. Find out more here. (Thanks Justine)

Google stuff

Google’s guidelines on web sites.

A Googler’s guidelines on how to get back in if Google kicks you out for something naughty.

It is a little worrying though that the process seems a tad secretive. While Google does an excellent job of keeping the spammers out of their index, and I suppose they don’t want to give the spammers too much information on how it’s done, legitimate sites do get caught up in it from time to time (sometimes through ignorance), and there seems to be little in the way of feedback from Google about what a site might have done to get themselves banned.

Briefs

One more reason Lego rocks: they don’t mind if people hack their stuff.

Need to wipe, kersplat, zap, nuke, delete, a hard disk, but don’t want to have to physically pull it out of the machine and jump on it, drown it, then take a hammer to it? Like, if you want someone else to be able to use it? Try Darik’s Boot and Nuke. (via Colin)

With hot rumours of the Australian iTunes shop being about to launch, this guide to DRM covers how various online stores restrict what you can do with the music you buy.

Expensive Australian Apples

So, Apple still can’t get my iPod to synch my contacts and calendar, even after an upgrade to iTunes, and now they seem to be charging me a premium for living in Australia.

If I wish to purchase QuickTime Pro in the US it will cost me USD $29.99. That translates to AUD $39.37 at today’s exchange rate. So why, given it’s a download and there’s no shipping or media involved, is the Australian Apple site charging me AUD $44.95 for the same product?
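For anyone who wants to check the sums, here is a quick Python sketch using only the three prices quoted above. The exchange rate isn’t stated anywhere, so the script just backs it out of the USD and AUD figures; the function names are mine, purely for illustration.

```python
# Prices as quoted in the post.
US_PRICE_USD = 29.99    # QuickTime Pro on the US store
CONVERTED_AUD = 39.37   # the same price converted at that day's rate
AU_PRICE_AUD = 44.95    # QuickTime Pro on the Australian store


def implied_rate(usd: float, aud: float) -> float:
    """Exchange rate (AUD per USD) implied by a USD price and its AUD equivalent."""
    return aud / usd


def premium(converted: float, local: float) -> tuple[float, float]:
    """Return the extra amount charged locally, in AUD and as a percentage."""
    extra = local - converted
    return extra, extra / converted * 100


rate = implied_rate(US_PRICE_USD, CONVERTED_AUD)
extra_aud, extra_pct = premium(CONVERTED_AUD, AU_PRICE_AUD)

print(f"Implied rate: {rate:.4f} AUD per USD")
print(f"Australian premium: AUD ${extra_aud:.2f} ({extra_pct:.1f}%)")
```

On those numbers the premium comes out at AUD $5.58, or a bit over 14 per cent, for the same download.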

Before the Appleites (you like, it’s yours to use too) get up in arms, yes, Microsoft do the same thing, as do Macromedia, Adobe and probably most major software houses but it happens that I want an Apple product now and they want to charge me more for it because I’m not in the United States and right now that’s really annoying me.

Dodging Usenet morons

There are some real morons on Usenet. Most newsreaders have a Block Sender option so you don’t have to look at their stupid posts, but the worst ones (hello Matthew Goodyear) change their (alleged) email addresses regularly. And some poor fools keep responding to these trolls, so you still end up seeing a lot of garbage posts.

What newsreaders need is

  • An option to hide messages written by a particular person, and any responses to those messages
  • An option to hide messages written by a particular person, nominated by their alias (name), including wildcards, and not their email address
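The two options above could be sketched roughly like this in Python. It is only an illustration of the idea, not how any actual newsreader works: the Post class and its fields are made up for the example (a real reader would work from the From, Message-ID and References headers), and it assumes posts arrive in thread order, with each parent before its replies. It also goes one step further than the wish list and hides replies to replies, since once a troll's post is hidden the rest of that sub-thread is rarely worth reading.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Post:
    """Hypothetical, stripped-down view of a Usenet post."""
    msg_id: str
    author_name: str                  # display name (alias), not the email address
    parent_id: Optional[str] = None   # msg_id of the post this one replies to


def visible_posts(posts: list[Post], blocked_patterns: list[str]) -> list[Post]:
    """Drop posts whose alias matches a blocked wildcard pattern,
    plus anything posted in reply to a dropped post."""
    hidden: set[str] = set()
    visible: list[Post] = []
    for post in posts:  # assumes parents come before their replies
        name = post.author_name.lower()
        blocked = any(fnmatchcase(name, pat.lower()) for pat in blocked_patterns)
        if blocked or post.parent_id in hidden:
            hidden.add(post.msg_id)
        else:
            visible.append(post)
    return visible


posts = [
    Post("1", "Alice"),
    Post("2", "Troll McTrollface", parent_id="1"),
    Post("3", "Bob", parent_id="2"),    # reply to the troll, so hidden too
    Post("4", "Carol", parent_id="1"),
]
shown = visible_posts(posts, ["troll*"])
print([p.msg_id for p in shown])  # ['1', '4']
```

Matching on the alias with wildcards is the point: the email address can change daily, but the name usually stays recognisable.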

I wonder if any Windows newsreaders do this already?

Hurricane Rita

I’ve been notified by my web ISP that Hurricane Rita is approaching Houston. Why does this matter? Because geekrant.org (and a number of other sites I run) are sitting on a server in a data centre in Houston. I’ve been encouraged to take backups of important content, which I’ll be doing. It’s a reminder that regular backups are an essential precaution.

If the site goes down in the next day or two, you’ll know why. Best wishes to those in the affected areas.