Category Archives: CMS

Hand-written comment spam

Amongst all the easy-to-spot robot comment spam, I’m getting a bunch that (at first glance) looks like it’s written by humans. Gone are the stupid out-of-context broken-English comments and links to drug sales. These comments look like they’ve had a few milliseconds’ thought put into them, and they have a lot in common: they’re all on new posts, and they all leave a rediffmail (Indian GMail-type operation) address, a 209.97. IP address, and a link to a web site featuring lots of links and no content.

So far I’ve been spiteful and kept the comments but wiped the URL link.

I wonder if they’re particularly targeting WordPress sites that haven’t yet been upgraded to use NoFollow links.

MySQL woes

We’ve got MySQL problems here at Geekrant central.

MySQL said: Documentation
#1016 – Can’t open file: ‘wp_comments.MYI’ (errno: 145)

Doesn’t sound good, does it? The ISP is looking into it.

Nothing else seems to be AWOL, but I’ve taken a backup of everything just in case. Wouldn’t you know it, the backup I have of wp_comments isn’t particularly recent. Hopefully the ISP has a newer one, but if not, I’ve grabbed a bunch of comments via Newsgator’s cache. Gawd knows how I’d restore them though.

Update: Fixed. May I just say, the support guys at AussieHQ hosting are deadset legends.

WordPress Most comments

Here’s the SQL to find which of your WordPress posts have the most comments:

SELECT wp_comments.comment_post_id, COUNT(*) AS commentcount, wp_posts.post_title, wp_posts.post_date
FROM wp_comments, wp_posts
WHERE wp_comments.comment_post_id = wp_posts.ID
GROUP BY wp_comments.comment_post_id, wp_posts.post_title, wp_posts.post_date
ORDER BY commentcount DESC
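
If you’d like to see what the query returns before pointing it at a live blog, here’s a minimal sketch that runs it against an in-memory SQLite database from Python. The cut-down table definitions and the sample rows are made up purely for illustration; a real WordPress install uses MySQL and has many more columns.

```python
import sqlite3

# In-memory stand-in for a WordPress database (hypothetical, cut-down schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_date TEXT);
    CREATE TABLE wp_comments (comment_ID INTEGER PRIMARY KEY, comment_post_id INTEGER);
    -- Dummy rows for illustration only.
    INSERT INTO wp_posts VALUES (1, 'First post', '2006-01-01');
    INSERT INTO wp_posts VALUES (2, 'Second post', '2006-01-02');
    INSERT INTO wp_posts VALUES (3, 'Post nobody commented on', '2006-01-03');
    INSERT INTO wp_comments (comment_post_id) VALUES (1), (2), (2), (2), (1);
""")

MOST_COMMENTED = """
    SELECT wp_comments.comment_post_id, COUNT(*) AS commentcount,
           wp_posts.post_title, wp_posts.post_date
    FROM wp_comments, wp_posts
    WHERE wp_comments.comment_post_id = wp_posts.ID
    GROUP BY wp_comments.comment_post_id, wp_posts.post_title, wp_posts.post_date
    ORDER BY commentcount DESC
"""

rows = conn.execute(MOST_COMMENTED).fetchall()
for post_id, commentcount, title, post_date in rows:
    print(post_id, commentcount, title, post_date)
```

Note that posts with no comments at all never appear in the result, because the join only keeps posts that have at least one matching comment row.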

29th of February exists in WordPress (almost)

I discovered the other week that if you put an invalid post date into WordPress, such as 29-Feb-2006, the post displays on the page under the next day, 1-Mar-2006. But it doesn’t allow commenting or going to the permalink, because in the database it’s still there as 29-Feb, so it doesn’t show up if you try to click through to it.

I suspect it’s just PHP’s date handlers being helpful, so it may show up in other PHP-based software.
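
To illustrate the sort of “helpfulness” I mean: PHP’s mktime() accepts a day number past the end of the month and rolls it over into the following month. The sketch below mimics that arithmetic in Python. The roll_over function is a made-up name for illustration, and I’m only assuming that this is the mechanism behind what WordPress displays.

```python
from datetime import date, timedelta

def roll_over(year: int, month: int, day: int) -> date:
    """Start at the 1st of the month and count forward (day - 1) days,
    so days past the end of the month spill into the next month."""
    return date(year, month, 1) + timedelta(days=day - 1)

displayed = roll_over(2006, 2, 29)   # 2006 is not a leap year
print(displayed)                     # 2006-03-01

leap = roll_over(2008, 2, 29)        # 2008 is a leap year, so the date is valid
print(leap)                          # 2008-02-29
```

The database, meanwhile, would still hold the date exactly as typed, which would explain the mismatch between the page and the permalink.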

Bloglines no like

The otherwise very fine Bloglines RSS aggregator isn’t liking this site very much, reporting errors when trying to add the Geekrant RSS feed which works so well for most other people.

[Screenshot: Bloglines error]

Uh, yes the feed does exist. Those who like XML can look at it raw.

It’s doing the same for some other blogs, including my personal one, having rejected it since late December. Very odd. And I’m not the only one.

I’ve contacted Bloglines support, so hopefully they’ll be looking into it.

WordPress 2.0 is coming soon

WordPress junkies may be interested to hear that the WordPress 2.0 Release Candidate is out, with the real release expected to be only days away. From the sounds of it there’s a heap of cool new features in it, though much of it is under-the-hood changes that will affect developers more than anybody else.

One of my summer holiday projects is to upgrade all my WordPress installations. I’ll take a look at 2.0, but of course I’m always wary of jumping straight into major new releases, especially since 1.5.2 is incredibly stable.

Seeing a new server before re-delegation

One of the weaknesses of WordPress and most other web-configured applications is that unless you want to go SQL or config-file-wrangling, it’s pretty much only configurable via the web, at least for tweaking, importing posts and setting up most of the options. This is a problem when, for instance, you’re migrating an existing site onto WP on a new server: until the domain is re-delegated, the hostname still points at the old server, so you can’t get to the new wp-admin screens.

The way to do it is to hack your hosts file. Once the new server is running and WP is set up on it, find your hosts file and add an entry for the new server. On Windows, this is the c:\windows\system32\drivers\etc\hosts file.

Chuck in a line that contains your new server’s IP address and the hostname. Something like:

192.168.0.1 www.evision.com.au

(Whoopsie, real-world example with a fake IP. The new evision site is going live Real Soon Now.)

Save, then away you go. You can see the new site and tweak to your heart’s content, but nobody else will be able to see any of it until you re-delegate.

The catch? It probably won’t work from behind corporate networks, where your computer uses a proxy.
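
If you want to double-check what you’ve put in the file, here’s a rough Python sketch that parses hosts-file text and reports which IP address a hostname is mapped to. The function name lookup_in_hosts and the sample text are made up for illustration, and it only reads the file’s contents; it doesn’t tell you whether your browser (or a proxy) is actually honouring the entry.

```python
def lookup_in_hosts(hosts_text: str, hostname: str):
    """Return the first IP address mapped to hostname, or None if absent."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop anything after a '#' comment marker, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return fields[0]
    return None

# Sample contents, using the same fake IP as the example above.
sample = """
# sample hosts file
127.0.0.1   localhost
192.168.0.1 www.evision.com.au   # new server, before re-delegation
"""

print(lookup_in_hosts(sample, "www.evision.com.au"))
print(lookup_in_hosts(sample, "not-in-the-file.example"))
```

To check the real thing, read the file first, for example with open(r"c:\windows\system32\drivers\etc\hosts").read(), and pass the text in.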

WordPress’s best defence against the dark arts of spam

Scoble writes that WordPress.com has strong comment spam protection, but that it sometimes gets false positives.

I’ve found nothing better for spam protection than WP-Hashcash, which uses JavaScript to make sure it’s a human entering the comment, not a robot, but without captchas or other stuff the user has to do. Works like a dream.

The only down side is it doesn’t work with some older WP templates. So while this site is fully spam equipped, my personal blog won’t run it until I upgrade the template (probably a project for Christmas time).

But apart from that, for WPers out there, I can’t recommend it highly enough.

Combined with settings that ensure first-time posters go straight to moderation (subsequent postings are approved automatically) it ensures that those damn spammers never get their comments published on my site.

I might add that the company I work for (which develops B2B messaging systems) is working on a new site. To encourage them to update it regularly (some might call it blogging, but I’m emphasising “regular updates to existing and potential customers”) I’m building it on WordPress. Given WP’s ability to do a site of static pages and dated entries, it should work very well.

The Age’s new layout problems

The Age and SMH recently launched a new layout, which includes splitting articles across pages. They must have heard the criticism over this, because articles now include a link to view all of the text on a single page.

But there are still problems with it. Examples:

This article ended up with no text at all on page 3; just an advert. Evidently a few carriage returns got tacked onto the end of it.

This article ended up with no visible text at all, and the adverts hiding underneath other story links (at least in Firefox). (via Tom N)

And this story, about Australian Nguyen Tuong Van’s impending execution in Singapore, has as its advert a Qantas promotion including cheap seats to Singapore. The same ad runs with a similar story on the SMH. (via Tony)

Not good.

Update 10am: This article also features the ad for Qantas cheap fares to Singapore.

Patches for Win2K and WP

If you’ve been holding off patching your Windows 2000 boxes with the latest security updates, do it now, because the Zotob worm is spreading fast. Thankfully it only affects Win2K, and anybody who’s already patched with MS05-039 is already protected.

Also new this week is WordPress 1.5.2. I’ve used WP for a while now, but am now dabbling with it for a company site… it’s increasingly impressive, especially for CMS/Non-dated pages work.

PS: According to a report, car-maker General Motors Holden has lost A$6 million in car production due to the Zotob worm. Other major companies have also been hit.

Stopping WordPress spammers

The blog comment/trackback anti-spam refinement continues.

I’m testing the WP-Hashcash plugin, which inserts JavaScript code to calculate an authorisation code into the comment. Since comment spammers don’t actually use the comment forms (at least I hope not; not until they start using people to enter the comments), this means only real comments get through. Well, real comments from people with JavaScript running. If they don’t have JavaScript running, they may be out of luck. Hopefully that applies to nobody these days, and I think this solution is less painful than a captcha-based one.
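
I haven’t dug into exactly what the plugin computes, so treat the following as a generic hashcash-style sketch rather than WP-Hashcash’s real algorithm. The idea it shows: the server hands out a challenge, the client has to burn a little CPU finding an answer, and the server can check that answer with a single hash. The challenge string, the function names and the difficulty setting are all made up for illustration, and it’s in Python rather than the JavaScript a browser would run.

```python
import hashlib
from itertools import count

def digest_for(challenge: str, counter: int) -> str:
    """Hex SHA-256 of the challenge combined with a candidate counter."""
    return hashlib.sha256(f"{challenge}:{counter}".encode()).hexdigest()

def solve(challenge: str, zero_hex_digits: int = 3) -> int:
    """Client side: try counters until the hash starts with enough zeros."""
    prefix = "0" * zero_hex_digits
    for counter in count():
        if digest_for(challenge, counter).startswith(prefix):
            return counter

def verify(challenge: str, counter: int, zero_hex_digits: int = 3) -> bool:
    """Server side: one hash is enough to check the submitted answer."""
    return digest_for(challenge, counter).startswith("0" * zero_hex_digits)

challenge = "comment-form-nonce-abc123"  # hypothetical per-form value
answer = solve(challenge)
print(verify(challenge, answer))  # True
```

A robot that posts straight to the comment script without running the form’s code never produces a valid answer, which is the whole point.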

But trackback spam is still a problem. One available option is to block direct access to the WordPress trackback PHP, but this isn’t very effective, since most current trackback spammers are clever enough to call the “real” URL.

A version of Auto shutoff comments modified to close trackbacks on posts older than 28 days, however, seems more effective. I don’t particularly want to shut comments off (especially since the above plugin effectively stops comment spam), but trackbacks are less compelling to keep open.
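
The rule itself is simple enough to sketch in a few lines. This is Python for illustration only, not the plugin’s actual PHP, and the function name and example dates are made up.

```python
from datetime import date, timedelta

TRACKBACK_WINDOW = timedelta(days=28)  # the cut-off mentioned above

def trackbacks_open(post_date: date, today: date) -> bool:
    """Trackbacks stay open until the post is more than 28 days old."""
    return today - post_date <= TRACKBACK_WINDOW

print(trackbacks_open(date(2005, 7, 1), date(2005, 7, 20)))  # True (19 days old)
print(trackbacks_open(date(2005, 7, 1), date(2005, 8, 15)))  # False (45 days old)
```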

Together with previously discussed .htaccess entries to block big bandwidth thieves, this appears to be a fairly effective set of anti-blog spam measures. For now.

Pirates! Spammers! Gyroscopes! Bandwidth thieves!

This is officially getting ridiculous. Not only are my blogs getting a lot of comment spam, but my personal blog site is burning huge amounts of bandwidth, as particular (I assume zombie) hosts hit the site.

Below are the top ten bandwidth users of danielbowen.com for June:

Top 10 of 15312 Total Sites By KBytes

 #   Hits   %      Files  %      KBytes  %      Visits  %      Hostname
 1   14380  4.10%  3801   1.77%  111235  2.22%  159     0.24%  host-148-244-150-58.block.alestra.net.mx
 2   17558  5.01%  3191   1.48%  99441   1.98%  157     0.24%  host-207-248-240-119.block.alestra.net.mx
 3   3927   1.12%  3640   1.69%  75989   1.51%  3       0.00%  csr010.goo.ne.jp
 4   3062   0.87%  2797   1.30%  74881   1.49%  171     0.26%  rrcs-24-97-174-130.nys.biz.rr.com
 5   3057   0.87%  2200   1.02%  62547   1.25%  392     0.60%  msnbot.msn.com
 6   2691   0.77%  2248   1.04%  60684   1.21%  153     0.23%  64.124.85.78.become.com
 7   2256   0.64%  2082   0.97%  56383   1.12%  124     0.19%  98-101-196-200.linkexpress.com.br
 8   2146   0.61%  2033   0.94%  51665   1.03%  279     0.43%  dsl-250-198.monet.no
 9   2001   0.57%  1755   0.82%  47605   0.95%  23      0.04%  host133.sprintnetops.net
10   1686   0.48%  1571   0.73%  35979   0.72%  325     0.50%  corporativos

It’s not like this site is hosting pr0n or something; there’s just no reason why any single host would need to grab 110MB of traffic in a single month. In total, traffic topped 4GB for the month, which is ludicrous for a diary site with a few photos on it. 4GB is actually my monthly limit. Thankfully my web ISP isn’t too strict about charging extra for hitting that, but if this keeps up there’s always the risk that it’ll cost me real money.

As a result I’ve started a list of bandwidth hogs’ IP addresses, which I’m putting in the .htaccess file. Anything with lots of hits and grabbing above about 5MB per month is going onto the list, and the list is being duplicated (manually unfortunately) across to the other WordPress sites that I run.
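
Building that list by hand gets old quickly, so here’s a rough Python sketch of how it could be generated from monthly stats. The function name deny_lines is made up, the IP addresses are reserved documentation addresses rather than real offenders, and the min_hits threshold is just my guess at what “lots of hits” might mean. The hit and KByte figures are borrowed from the table above. “Deny from” is the standard Apache access-control directive.

```python
def deny_lines(usage, min_hits=1000, min_kbytes=5 * 1024):
    """usage: iterable of (ip_address, hits, kbytes) tuples for one month.
    Returns a 'Deny from' line for each host over both thresholds."""
    hogs = sorted(ip for ip, hits, kbytes in usage
                  if hits >= min_hits and kbytes > min_kbytes)
    return [f"Deny from {ip}" for ip in hogs]

monthly_usage = [
    ("192.0.2.15", 14380, 111235),    # placeholder IP, figures from row 1
    ("198.51.100.22", 3927, 75989),   # placeholder IP, figures from row 3
    ("203.0.113.9", 120, 900),        # an ordinary visitor, well under the limits
]

for line in deny_lines(monthly_usage):
    print(line)
```

The output can be pasted into each site’s .htaccess, which at least takes the typing out of keeping the copies in sync.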

Inspection of the access_log is particularly enlightening, with at present a staggering number of requests coming in with a referer at poker-related sites. Of the 6665 hits in the file for today (covering about 13 hours) there are 674 from texasholdemcenteral.com (note the wonky spelling) and 1212 from sportscribe.com. All of these too are now being blocked with a 403 (forbidden) via .htaccess.
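
For anyone wanting to do the same digging, here’s a rough Python sketch that tallies referer hosts from log lines and turns the worst of them into .htaccess rules. It assumes the log is in Apache’s combined format. The sample lines, the function names and the min_hits threshold are made up for illustration, and the generated rules (SetEnvIfNoCase plus Order/Allow/Deny) are one common way of returning a 403, not necessarily identical to what’s in my own file.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined Log Format ends with: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')

def referer_hosts(log_lines):
    """Count hits per referring hostname, ignoring a leading 'www.'."""
    counts = Counter()
    for line in log_lines:
        match = LOG_TAIL.search(line.strip())
        if not match:
            continue
        host = urlparse(match.group("referer")).hostname
        if host:
            counts[host[4:] if host.startswith("www.") else host] += 1
    return counts

def htaccess_rules(counts, min_hits=2):
    """Emit rules that refuse requests from frequently seen referers."""
    lines = [f'SetEnvIfNoCase Referer "{re.escape(host)}" bad_referer'
             for host, hits in counts.most_common() if hits >= min_hits]
    lines += ["Order Allow,Deny", "Allow from all", "Deny from env=bad_referer"]
    return "\n".join(lines)

# Hypothetical sample lines; the client IPs are reserved documentation addresses.
sample_log = [
    '192.0.2.10 - - [01/Jul/2005:10:00:00 +1000] "GET / HTTP/1.1" 200 5120 "http://www.texasholdemcenteral.com/" "Mozilla/4.0"',
    '192.0.2.11 - - [01/Jul/2005:10:00:05 +1000] "GET / HTTP/1.1" 200 5120 "http://sportscribe.com/page.html" "Mozilla/4.0"',
    '192.0.2.10 - - [01/Jul/2005:10:00:09 +1000] "GET /feed/ HTTP/1.1" 200 2048 "http://texasholdemcenteral.com/x" "Mozilla/4.0"',
    '198.51.100.7 - - [01/Jul/2005:10:01:00 +1000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
]

counts = referer_hosts(sample_log)
rules = htaccess_rules(counts)
print(rules)
```

Requests with no referer (logged as "-") are skipped, so ordinary direct visitors aren’t counted against anybody.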

Sigh. I suppose it’s just too much to expect people to play nice?

.htaccess extract – Feel free to copy for your own site to block miscreants.