Category Archives: Excel

Getting Excel data into a plain HTML table

I’d been looking for something like this for ages, but must have been Googling the wrong terms.

How to get the contents of an Excel spreadsheet into a simple, no-format, plain HTML table:

No-Cruft Excel to HTML Table Converter


Importing into SQL Server

Alas SQL Server Management Studio isn’t as friendly as it could be for pasting in data. You’d think Microsoft would have this humming, but when I tried to paste from Excel, it attempted to paste the entire first row from my spreadsheet into the first column (in one row) of the database.

Using MS Access to open up the database probably would have worked, but I didn’t have it on that machine.

Trying to import using the SQL Server Import And Export Data wizard from a CSV text file worked for a small amount of data, but the 80,000 rows I was trying to import from the world ports code list didn’t. Time and time again it would report an error (unspecified) and give me the option of Abort, Retry, Ignore. No matter option I chose, it crashed.

While the 64-bit version of the wizard on my 64-bit Win7 machine didn’t allow you to import from Excel/Access, the 32-bit version did (presumably because MS Office, at least the version I have installed, is 32-bit).

The next problem was that it only supported Excel 2003 format, which can’t handle more than 64K rows. I ended up having to split the data into two and import the two spreadsheets separately. Then it worked.

Shame the wizard is so flaky, and of course it’s a big shame that Management Studio doesn’t do copy/paste like one would expect. (Maybe that too was a 32-bit/64-bit issue.)

Quick look at Google Spreadsheets

Here’s what it looks like. Click it to see it bigger.

Google spreadsheet

Not bad, but looks very simple compared to Excel. This could be a good or a bad thing. Does a lot of Excel-like things. This kind of comparison is inevitable, as most users are familiar with it.

Doesn’t feel as refined as Writely. Right-click not harnessed.

Linked to your Gmail/Google logon, of course.

Exports a reasonably clean HTML table. (Certainly better than any released versions of Excel/Office.)

Watch out if you lose your connection:

Google spreadsheet lost connection

My immediate reaction is this could be useful for some things — like Writely, if you want to be able to edit from anywhere connected, and/or share the document realtime with others.

But like Scoble, I can’t see it replacing “offline” apps for individual users just yet. Hosted apps within a corporation however, that could (eventually) be another matter, especially as the technology matures.

See also: Josh and I look at Writely.

Google spreadsheet soon

It’s widely reported that Google will be releasing an online spreadsheet on the 6/6/2006 (presumably US time).

Probably limited signup initially, but keep an eye on, which is currently showing a 404 (rather than a “Host not found”).

Update 20:15. Sneak peak and sign-up (they’ll email you when it’s available).

Update 22:00. Now at I get “Server Error. The server encountered a temporary error and could not complete your request. Please try again in 30 seconds.”

Hiding Excel warnings during automation

Well, I sorted out my problem of confusing warnings appearing whilst controlling Excel with VBA. Turns out there is an Application.DisplayAlerts property which, when set to false, hides warnings such as the one I was getting. It took a little Googling to find the solution, which wasn’t readily apparently in any of the MS help for the methods I’d got the warnings from.

The other Office applications also have a DisplayAlerts property.

Needless Excel automation warnings

Okay, this is annoying. I’m working on a VB program that uses the Excel object library to automate a fairly complex update into Excel. The general idea when you’re automating Excel is to smoothly do your operation behind-the-scenes, to hide the complexity from the user.

Excel warning

So the last thing you need is complicated dialog boxes popping up to ask the user questions. I’m the programmer: I’m meant to make the decisions. Tell me, the programmer, that if I save this Shared Workbook with a password that certain parts of the file won’t be encrypted. Don’t tell my user, and ask them to decide if it should happen or not.

Creating unique reference IDs in Excel

I often need to create unique IDs in an Excel spreadsheet for importing into our database system. Doing this by hand is time-consuming and error-prone, so here is a method I’ve found which saves a lot of time and effort.

Let’s say I want to create an ID in the format “IMPORTA-1234″, where 1234 is a unique number from 1 to the total number of records in the spreadsheet. However, I would also like them neatly formatted (purist that I am), so that record 5 is shown as IMPORTA-0005 not IMPORTA-5, record 124 is shown as IMPORTA-0124 not IMPORTA-124, and so on.

1. Create a blank column in your spreadsheet, if one doesn’t already exist, for the ID. Call it something like ImportID or UniqueID.

2. Enter the following formula into the first data row (probably row 2 of the spreadsheet):
(where A2 is the reference of the cell the formula is in)

3. Now copy and paste this into each row – being a relative formula ensures that each ROW references the correct cell.

What does this do? Let’s look at each part of the formula in turn:

CONCATENATE – This creates one string comprising the three distinct elements needed to form a unique ID.

  • Element 1 is the static “IMPORTA-” string.
  • Element 2 is the appropriate number of ’0′ characters to pad out shorter numbers.
  • Element 3 is (row number minus one) which gives a unique number based on the record’s position in the spreadsheet (and accounting for a header row)

To ensure that the correct number of zeroes are used as padding, we use the LEFT function to grab a portion of the string “0000″. We use the ROW(cell reference)-1 function to get the unique number, and we use 4-LEN(ROW(cell reference)-1) to work out how many characters long that number is.

Let’s walk through a couple of examples to show how this works:

For row number 25, which is record 24:

  • First part of string is “IMPORTA-”
  • We now use LEFT(“0000″,4-LEN(ROW(A25)-1)).
    The length of ROW(A25)-1 is 2 (24 is two characters long)
    The (4-2) left most characters of “0000″ are “00″
  • Final part of the string is the (row number – 1) = 24

Output ID = IMPORTA-0024

For row number 164, which is record 163:

  • First part of string is “IMPORTA-”
  • We now use LEFT(“0000″,4-LEN(ROW(A164)-1)).
    The length of ROW(A164)-1 is 3 (163 is three characters long)
    The (4-3) left most character of “0000″ is “0″
  • Final part of the string is the (row number – 1) = 163

Output ID = IMPORTA-0163

and so on.

Once the spreadsheet is complete (no more records are to be inserted or deleted) it may be a good idea to ‘lock down’ the IDs by highlighing all the ID cells, copying them into the clipboard, and then doing a ‘Paste Special’ specifying to paste the actual values. This removes the formulae and makes the IDs static, but still unique of course.

To adapt for longer or shorter numbers, change the string “0000″ and the number that the row length is subtracted from (in this case 4). For records from 001 to 999, use "000" and 3-LEN(ROW(reference); for records from 00001 to 99999, use "00000" and 5-LEN(ROW(reference), and so on.

Inaccuracies reading Excel via ADO

Yesterday’s discovery with reading Excel via ADO and the ODBC Text Driver: numeric values may be wrong if it takes a guess that the column contains date fields.

The geeky detail (found in VB6SP5, Excel 2003, ADO 2.7, Jet Provider 4.0):

The text driver looks at the first few lines (configurable via the MaxScanRows setting or the Rows To Scan field in the DSN configuration box) to figure out what sort of data it is dealing with in each column. If there are numbers, it assumes the columns are numbers.

But if it sees dates, and sees no other data in the next few rows, then it assumes all the data in those columns are dates. If it gets further down to rows containing actual numbers, it still thinks it’s reading dates, and when it tries to convert the number to a date and back again, it causes a rounding error and ends up adding 1 to the final number.

In my case, the solution was to get the spreadsheets to contain zeroes (quite valid for the data being loaded) in the first few rows, instead of blanks. A little tricky, but it remains a good method for getting data out of Excel.

Office Object Library problems

One of the machines a program of mine runs on is still NT4 and Office 97. It seemed to keep working okay when I was on Win2K/Office 2000, but now I’m on WinXP/Office 2003 it’s crashing when calling the Excel object library to open an XLS.

MS have documented that this can happen after re-compiling with the Office 2000 libraries or later (though I’m sure it didn’t happen with Excel 2000).

“This behavior is by design.” Application errors by design. Yeah. SURE.

Excel application error

Apparently the solution is to use that normally-considered-evil late-binding; DIMming as an Object and using CreateObject.

But it didn’t seem to work for me. So I dug out an old copy of Office 97, installed it into a separate directory to Office 2003. Removed the reference to Excel 11, added Excel 8 instead (EXCEL8.OLB), recompiled and all is well.

Excel to VB bug?

From my limited testing (I had to find a workaround pronto) seems to be some kind of bug when using VB6, ADO 2.7, ODBC text driver to read from an Excel spreadsheet. Recordsets with date fields come in with the dates encoded as integers. This is normally no problem, as you can use IsDate to check if it’s valid, then CDate to convert it to a date.

But what I’ve found is that IsDate has stopped working… that is, it’s stopped returning True for field values that can be converted to dates, eg CDate works. Whether this is something to do with the new year, or a new version of something on my machine, I haven’t yet figured out, but I ended up writing a wrapper IsDate function that just tries the CDate regardless, and returns a False (eg not a date) if it errors.

An initial dig in the MSKB found nothing about it, but I’ll do some more exhaustive digging and try to get a definite answer later. The answer being, I suspect, “Switch to VB.Net, you luddite!”

Two Office snippets

I reckon it’s about time Word (and other Office products) figured out that those last few paragraph breaks at the ends of documents don’t need to be printed, especially if they cause an extra blank page to be wasted.

Yesterday I was trying to work out how to tell if an Excel cell has a value or a formula in it, so some conditional formatting could indicate if a value was calculated or entered by a human. Seems to be no built-in function to do it. But this seems to work, though I’m still trying to figure out how, precisely. The explanation at the bottom more-or-less explains it, but damn Excel 2003′s help, which has no index, doesn’t integrate its VBA help into VB6, seems to download everything (maybe it’s just the way I have it configured) and makes it near impossible to jump straight to the definition of GET and CELL.

Excel to HTML

I can’t believe how stupid Excel (2002/XP) was with the table of browsers the other day.

The plan was to get the numbers into Excel, copy/paste into a Frontpage table to strip back the formatting, then paste into WordPress.

Nup, bloody monstrous Excel tags right the way through it, which Frontpage couldn’t override, and evidently no easy way to strip. No combination of Paste Special would work. So for example, instead of <td></td> we got:

<td align="right" x:num="1.15E-2" style="color: windowtext; font-size: 10.0pt; font-weight: 400; font-style: normal; text-decoration: none; font-family: Arial; text-align: general; vertical-align: bottom; white-space: nowrap; border: medium none; padding-left: 1px; padding-right: 1px; padding-top: 1px"></td>

I kid you not. Now, I know about round-trip HTML, though I have my doubts that anybody uses it — firstly because it looks like crap in a web browser, and secondly because if you’ll want to edit it later, you’ll just keep an XLS copy. Besides, it’s badly implemented. The cell above was using the “Normal” style. It shouldn’t have had all the formatting crap embedded in it.

Word XP actually has a Save As Filtered HTML option to strip out all this crap. Excel XP doesn’t. (I haven’t checked Excel 2003 yet).

Plan 2 was to save it as HTML, load it into FrontPage and crop the HTML to paste into WordPress. Nup, trying to re-open it in FrontPage just threw it back to Excel. WTF?! Opening in UltraEdit (my preferred text editor) just revealed the same tags as above.

How can two Microsoft products that are part of the same suite, same version, operate so disastrously badly with one another, for something as simple as copying a table?

Plan 3? Oh bugger it, it’s only a few lines, just write it by hand.

If it were more I’d go install and run that clear The Useless Crap Out Of The HTML filter thing (oh look, they could do with clearing the crap out of their URLs too), but it refuses to install unless you have Office 2000. Wonderful.

Next time (after swearing a bit) I’ll probably save to CSV and then do a global replace from commas to table tags.

Surely there must be an easier way?