Using Atomz free search with WordPress

I’ve set up the Atomz free search to index my old site toxiccustard.com and my personal blog at danielbowen.com together. Atomz allows you to specify multiple entry points for its crawler, putting all the specified sites into one index.

Given the free search only allows 750 documents in its index, the trick with WordPress is to stop it indexing individual blog entries, and have it index the monthly archive pages instead. This is done using the URL Masks feature. For instance, with my blog structure of danielbowen.com/year/month/day/entry-slug, I specify

exclude regexp http://www.danielbowen.com/..../../../*

The other URLs I’ve excluded are RSS feeds (which the crawler chokes on, wasting processing time), comments and category pages.

exclude http://www.danielbowen.com/category/*
exclude http://www.danielbowen.com/comments/*
exclude regexp http://www.danielbowen.com/*/feed/*
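
To check which URLs a set of masks like these would knock out, here’s a rough Python sketch. It is only an approximation under stated assumptions: I don’t know exactly how Atomz evaluates its masks, so the regular expressions and wildcard patterns below are my own stand-ins for the four rules above, and the sample URLs (such as the entry slug) are made up for illustration.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical stand-ins for the masks above. Atomz's real matching
# rules may differ; these only approximate the intent of each rule.
EXCLUDE_REGEXPS = [
    # year/month/day/entry-slug: 4 characters, 2, 2, then something after the day
    re.compile(r"^http://www\.danielbowen\.com/..../../../.+"),
    # any feed URL below the site root
    re.compile(r"^http://www\.danielbowen\.com/.*/feed/.*"),
]
EXCLUDE_WILDCARDS = [
    "http://www.danielbowen.com/category/*",
    "http://www.danielbowen.com/comments/*",
]


def is_excluded(url):
    """Return True if any exclude rule (regexp or wildcard) matches the URL."""
    if any(pattern.match(url) for pattern in EXCLUDE_REGEXPS):
        return True
    return any(fnmatchcase(url, mask) for mask in EXCLUDE_WILDCARDS)


if __name__ == "__main__":
    # Made-up sample URLs following the blog's year/month/day/slug structure.
    samples = [
        "http://www.danielbowen.com/2005/03/",
        "http://www.danielbowen.com/2005/03/14/some-entry/",
        "http://www.danielbowen.com/category/geek/",
        "http://www.danielbowen.com/2005/03/feed/",
    ]
    for url in samples:
        print("exclude" if is_excluded(url) else "index  ", url)
```

With these stand-in rules the monthly page stays in the index, while the individual entry, the category page and the feed are all excluded.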

This keeps my current total number of pages (both domains together) down to 519, which is pretty good, and well under the 750 limit for the freebie version.

It’s also handy in that the crawler logs broken links. I’ve got quite a few that have shown up as I move my old blog archives into WordPress, so I can just work through the list and fix them.