Category Archives: XML

PowerPoint file sizes

Was dealing with a big PowerPoint presentation (PPT) file.

In the older PPT format: 6063 KB.

When zipped: 4826 KB. Not a bad saving given the number of pictures in it.

Here’s the interesting thing: in PPTX format, 3293 KB.

Remember that PPTX and the other Office Open XML formats (DOCX, XLSX etc) are themselves ZIP archives, with each XML part and picture compressed individually inside the package, so this is an interesting result.

Perhaps the old binary format is inherently less efficient/compressible than the new XML format.

Mind you, another big PPT I tried this with didn’t shrink as much; the PPTX was about the same size as the zipped PPT, so it obviously depends on the exact content.
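Because a PPTX is an ordinary ZIP package, you can see where the space goes by listing its parts. Here’s a minimal sketch using Python’s standard zipfile module; the function name is my own. Already-compressed pictures (JPEG, PNG) typically barely shrink, while the XML parts usually shrink a lot.

```python
import zipfile
from typing import BinaryIO, List, Tuple, Union


def part_sizes(package: Union[str, BinaryIO]) -> List[Tuple[str, int, int]]:
    """Return (part name, uncompressed bytes, compressed bytes) for each part
    of an Office Open XML package, largest compressed part first."""
    with zipfile.ZipFile(package) as zf:
        rows = [(info.filename, info.file_size, info.compress_size)
                for info in zf.infolist() if not info.is_dir()]
    # Sort so the parts costing the most space in the package come first.
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

For example, `part_sizes("presentation.pptx")[:10]` (the file name is a placeholder) lists the ten parts taking the most room.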

Cool XML stuff

A bunch of XML tools from the good people at GotDotNet. The particular one I needed was XSD Inference, which creates an XSD from an XML document. I needed it to go with some VB6 code that validates XML against XSDs. It seems XSDs created from XML by some tools (I’m looking at you, XMLSpy, though maybe it’s fixed in later versions) don’t work properly with VB6 and MSXML 4 (version 4 of Microsoft’s XML parser), which is what I’m using, at least for some of my stuff.
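For a feel of what XSD inference does, here’s a deliberately naive sketch using only Python’s ElementTree. It is not how the GotDotNet tool works (its rules are far more thorough). Assumptions: the sample document uses no namespaces, every text value is typed `xs:string`, only the first occurrence of each child tag is inspected, and a child that appears more than once gets `maxOccurs="unbounded"`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialise schema elements with the xs: prefix


def infer_xsd(xml_text: str) -> str:
    """Naively infer an XSD from one sample document (no namespaces assumed)."""
    root = ET.fromstring(xml_text)
    schema = ET.Element(f"{{{XS}}}schema")
    schema.append(_element_decl(root, repeated=False))
    return ET.tostring(schema, encoding="unicode")


def _element_decl(elem: ET.Element, repeated: bool) -> ET.Element:
    decl = ET.Element(f"{{{XS}}}element", name=elem.tag)
    if repeated:
        decl.set("maxOccurs", "unbounded")
    children = list(elem)
    if not children and not elem.attrib:
        decl.set("type", "xs:string")  # plain text leaf
        return decl
    ctype = ET.SubElement(decl, f"{{{XS}}}complexType")
    if children:
        # Group children by tag, in order of first appearance.
        groups: Dict[str, List[ET.Element]] = {}
        for child in children:
            groups.setdefault(child.tag, []).append(child)
        seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
        for group in groups.values():
            seq.append(_element_decl(group[0], repeated=len(group) > 1))
        attr_parent = ctype
    else:
        # Text leaf with attributes: string content extended with attributes.
        simple = ET.SubElement(ctype, f"{{{XS}}}simpleContent")
        attr_parent = ET.SubElement(simple, f"{{{XS}}}extension", base="xs:string")
    for name in elem.attrib:
        ET.SubElement(attr_parent, f"{{{XS}}}attribute", name=name, type="xs:string")
    return decl
```

A real inference tool would also merge evidence from every occurrence of an element and guess narrower types (integers, dates); this sketch only captures the structure.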

Office XML compatibility

Office 2007 has also been completed, and the compatibility pack is now available to allow earlier versions (back to 2000) to read the new file formats.

Though as Office Watch points out, while the Word one works well in both directions, it appears that the Excel and PowerPoint ones don’t do full conversion; they’re just viewers.

What a cop-out. If they can do it for Word, they can do it for the others. Mind you, given the plethora of information about the new formats, I wouldn’t be surprised if a third party writes a proper conversion tool to fill the gap.

Misc stuff

Cool links I’ve found recently:

Super (MOV to AVI conversion).

VB to Java converter. That is, it compiles VB6 code into a Java class. Latest update here. Q+A. (No, you can’t download it yet, they’re still working on it.)

Oh, guess who’s on about giving away digital set-top boxes again? Yup. I do like this argument, actually: “It is not the Government’s job to champion new technology. It is the Government’s job to provide universal infrastructure and manage the task in a financially responsible way.”

XML Notepad, which was unavailable for a looooong time, is back, and upgraded (requires the .NET Framework 2.0).

Headlines via PHP/RSS

This utterly rocks, and I can’t believe I didn’t go looking for something like it before: MagpieRSS lets you show RSS headlines on a PHP page. I’m using it on my old toxiccustard.com page to show the latest headlines from my diary and the site’s News and Guide to Australia pages (which all run WordPress). It includes caching so you won’t burn up your (or anybody else’s) bandwidth by grabbing the feed continually.
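MagpieRSS itself is PHP, and its API isn’t reproduced here; this is just the same idea (parse the feed, pull out the headlines, cache the download) sketched in Python. The `fetch` callable and the one-hour default cache age are my assumptions, and it only handles RSS 2.0, not Atom.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple


def rss_headlines(rss_xml: str, limit: int = 5) -> List[Tuple[str, str]]:
    """Return up to `limit` (title, link) pairs from an RSS 2.0 document."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return []
    return [(item.findtext("title", default=""), item.findtext("link", default=""))
            for item in channel.findall("item")[:limit]]


class CachedFeed:
    """Re-fetch the feed at most once every `max_age` seconds."""

    def __init__(self, fetch: Callable[[], str], max_age: float = 3600.0):
        self._fetch = fetch  # hypothetical: any function returning the feed XML
        self._max_age = max_age
        self._body: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = time.monotonic()
        if self._body is None or now - self._fetched_at > self._max_age:
            self._body = self._fetch()  # only hit the network when stale
            self._fetched_at = now
        return self._body
```

In practice `fetch` might wrap `urllib.request.urlopen`; the cache is what stops every page view from re-downloading the feed.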

A buncha quick stuff

EFF highlights an Australian House Standing Committee report on the US DMCA, and whether or not it should be adopted wholesale by Australia under the Free Trade Agreement.

Meanwhile there’s an open letter to the OFLC about the banning in Australia of the graffiti video game Getting Up: Contents Under Pressure. (Mind you, Metacritic only gives it 73/100 on Xbox and 70 on PS2.)

OPML 2.0 is out. Let’s hope it doesn’t break OPML 1 like RSS 2 broke RSS 0.9.

The Age on the retro games boom.

Pah, this sucks. After 64 years in Swanston Street, the Technical Bookshop in Melbourne has moved out to the boondocks of La Trobe Street near Queen Street.

A foray into corporate blogging

I’ve convinced the company I work for to start a corporate blog. So far it’s early days, with myself (blogger extraordinaire) being the only one brave enough to post anything (apart from an introductory “this is what we sell” post), but I’m hoping the others will also contribute, as the company markets a mucho good product, and there’s a lot of good knowledge of B2B, XML, and development in general, locked up in the various brains around the place.

eVision (RSS feed)

RSS isn’t mainstream yet

Scoble argues that RSS’s importance isn’t in how many people are using it, but who those people are.

He’s right, but the other point to make is that RSS isn’t mainstream yet. Email and the web are mainstream, but took years to catch on with the general public, even after being widely available. RSS is widely available, but only used by a minority of the general online population.

That will change, as the tools used by the great unwashed pick up and highlight RSS functionality. That’s not Newsgator or Firefox, but IE and Windows.

It’ll change as the influential early-adopters persuade others.

And it’ll change as the standard is sorted out: not just the XML, but how it’s advertised. That orange button needs to be ubiquitous, just like “www” and “.com” in URLs are now.

So if your site doesn’t support RSS now, it’s important to get it doing so very very soon.
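If your site has no feed yet, the format itself is small. Here’s a minimal sketch of generating an RSS 2.0 feed with Python’s ElementTree; the `Post` class and the example values are hypothetical, and a real feed would normally also carry a description for each item.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from email.utils import format_datetime
from typing import Iterable


@dataclass
class Post:
    title: str
    link: str
    published: datetime  # should be timezone-aware


def build_rss(title: str, link: str, description: str, posts: Iterable[Post]) -> str:
    """Build a minimal RSS 2.0 document; title, link and description are the
    channel elements RSS 2.0 requires."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post.title
        ET.SubElement(item, "link").text = post.link
        ET.SubElement(item, "guid").text = post.link  # the permalink doubles as the ID
        # RSS 2.0 dates use the RFC 822 style, e.g. "Mon, 02 Jan 2006 03:04:05 +0000".
        ET.SubElement(item, "pubDate").text = format_datetime(post.published)
    return ET.tostring(rss, encoding="unicode")
```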

How to get feeds of Blogspot blogs which supposedly don’t have feeds

Got a favourite blog on Blogspot written by someone who hasn’t figured out how to enable XML feeds? No matter, just add atom.xml to their URL (eg http://reallyquiteunlikely.blogspot.com/atom.xml) and put that in your aggregator. Easy.
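The trick above, as a tiny Python helper (the function name is my own): it takes any page on the blog, or a bare host name, and points it at `/atom.xml` at the root of the blog’s host, which is the rule described here.

```python
from urllib.parse import urlsplit, urlunsplit


def blogspot_atom_url(blog_url: str) -> str:
    """Turn a Blogspot blog or post URL into the blog's atom.xml feed URL."""
    if "://" not in blog_url:
        blog_url = "http://" + blog_url  # allow bare host names
    parts = urlsplit(blog_url)
    # The feed sits at the root of the blog's host, whatever page we started from.
    return urlunsplit((parts.scheme, parts.netloc, "/atom.xml", "", ""))
```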

Office goes XML

Microsoft has announced the next version of Office will use XML by default: that is, Word, Excel and PowerPoint will use XML documents embedded in Zip files. They will also issue updates to those Office products back to the 2000 versions so they can use the new formats.

The XML will be documented and open, to the extent that you will need to acquire a free licence from Microsoft to use it, presumably on their terms.

The terms of the licence will be interesting. You could contrast this with the MDB (Jet) format, which, while it isn’t XML and isn’t an open format, is quite well documented through the various Microsoft libraries you can use to get at it (ADO/MDAC, DAO, etc). It’s interesting to note that Jet is royalty free, so if you have a Microsoft developer tool you can give Jet databases to anybody, though the one thing you can’t do is build a solution that does much the same thing as Microsoft Access. (It’s a similar story for their other development tools.)

So the question is: will the MS licence preclude people from building, say, an alternative word processor or spreadsheet that can read and write the format? Will OpenOffice be able to use the format for interoperability?

They imply no such restrictions will exist, with this to say on whether competing products will make use of the format: “Customers also know that the true value of a desktop application is not the format in which data is stored but the full breadth of capabilities offered by that application, along with the quality and security of the user experience that it provides.” (Steven Sinofsky, Senior VP, Office)

Obviously switching to XML opens up a number of possibilities, making it much easier for third party applications to delve into documents to read/write data, without mucking about in the Office object models (which in turn ties you to COM and Windows). You could use XSLT to convert documents into other formats, or to display on new devices or applications.
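As an illustration of reading a document without the Office object model, here’s a minimal Python sketch that pulls the paragraph text out of a DOCX using only zipfile and ElementTree. It assumes the main document part is at `word/document.xml` (the usual location; strictly, the package relationships say where it is) and uses the WordprocessingML namespace of the final DOCX format, which was published after this post. It ignores tabs, line breaks and text boxes.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import BinaryIO, List, Union

# WordprocessingML namespace used by the final (ECMA-376) DOCX format.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx: Union[str, BinaryIO]) -> List[str]:
    """Return the plain text of each paragraph in a DOCX's main document part."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W}p"):
        # A paragraph's text is split across runs; each w:t holds one fragment.
        paragraphs.append("".join(t.text or "" for t in para.iter(f"{W}t")))
    return paragraphs
```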

It should lead to interesting developments, and let’s hope the other Office applications follow suit.

RSS adverts go mainstream

Google has moved RSS adverts into a wider beta, and Robert Scoble has been considering the benefits or otherwise of them. And he ranks types of feeds from worst (Headline only, with ads) to best (Full text with no ads).

Deciding whether or not to put adverts in your RSS (and indeed if your feed has all your text or just the partial text) is, I think, a matter of what you’re trying to do with your content. To bring it to total black and white, are you trying to make money, or get your ideas out?

Reality, of course, is shades of grey. For one thing, if you go the total black option (headlines only, ads in the feed, and presumably more ads on the site — since that’s the only reason you’d want to provide only headlines in the feed) then unless your content is pretty damn compelling, you’ll get no readers (at least not from feeds, and this is increasingly the way people consume their web sites), and thus no money, and your content goes nowhere.

At the other end of the scale (full text in feeds, no ads anywhere) is okay, as long as you don’t get snowed under by readers and end up paying so much in bandwidth that you can’t afford it anymore. Not likely these days, but theoretically possible, especially if your content is multimedia.

For most of us, I suspect, the balance is somewhere closer to white than black.

Introducing the message exchange

The company I work for is called eVision, and their main product is called MessageXchange. (See what happens to your spelling when you’re looking to find a good .com address?)

It’s basically a B2B message broker… messages go in, messages go out, and in the middle they get conditionally routed and transformed. The upshot is you can hook up a bunch of systems that use completely different types of message… one system’s PurchaseOrderCreate in a fixed-length FTP’d batch file can happily go along as another system’s PurchOrdReq XML HTTP message.
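To make the transform-and-route idea concrete, here’s a toy Python sketch. It is not how MessageXchange works internally: the fixed-length record layout, the PurchOrdReq element names and the routing table are all hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical fixed-length layout for a PurchaseOrderCreate batch record:
# field name -> (0-based start column, width).
LAYOUT = {"order_no": (0, 10), "supplier": (10, 8), "sku": (18, 12), "qty": (30, 6)}

# Hypothetical routing table: supplier code -> destination channel.
ROUTES = {"ACME": "http:acme-orders", "GLOBEX": "ftp:globex-inbox"}


def parse_record(record: str) -> Dict[str, str]:
    """Slice one fixed-length record into trimmed fields."""
    return {name: record[start:start + width].strip()
            for name, (start, width) in LAYOUT.items()}


def to_purch_ord_req(fields: Dict[str, str]) -> str:
    """Transform the parsed fields into a (hypothetical) PurchOrdReq XML message."""
    root = ET.Element("PurchOrdReq", number=fields["order_no"])
    ET.SubElement(root, "Supplier").text = fields["supplier"]
    line = ET.SubElement(root, "Line", sku=fields["sku"])
    line.text = str(int(fields["qty"]))  # strip zero padding, e.g. "000042" -> "42"
    return ET.tostring(root, encoding="unicode")


def route(fields: Dict[str, str]) -> str:
    """Conditional routing: pick a destination, falling back to manual review."""
    return ROUTES.get(fields["supplier"], "queue:manual-review")
```

In a real broker the layouts, mappings and routes would come from configuration rather than being hard-coded.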

The clever bit is in the monitoring, letting you see what’s pumping through the system at any time. And the fact that the whole shebang is configurable through a web interface.

It started out as a software package… well, not exactly a package, not in the MS Office sense, but a system you’d plonk on a Windows server or two and away you go. I haven’t been directly involved, but over the past year they’ve rejigged it as an ASP… a paid hosted service, that is, so that if you don’t want to run it on your own boxes, you pay to access it on fully maintained servers instead.

At the same time they’ve expanded its repertoire to cover a lot of the new and emerging XML standards such as ebXML and RosettaNet, as well as some of the more ancient, creaking standards like EDI.

Handy stuff. The whole B2B area must surely grow, it’s a no-brainer for reducing the cost of commerce. Will be an interesting area to watch. In fact I might get one of the guys to expand a bit on some of these topics…