Last week I realized that almost all the inbound links and Google traffic to my website were ending up at a 404: Page not found. The reason was straightforward: all my old posts weren’t there!
So over the weekend I decided to migrate all my previous posts to WordPress. I was using my own CMS before and it didn’t provide a means to export posts easily, which meant I had to connect to the database directly to pull all my previous posts.
To make things worse, my previous website used the sub-domain www3.fuzedbulb.com for reasons I don’t currently recall. I am pretty sure it was some weird-ass DNS experiment I might have been doing at some point in time. Anyhow, on top of that, the old site relied heavily on Apache’s mod_rewrite rules to build permalinks. They were typically of this format:
or even preserving the page number you’re on:
Still, it was trivial: I wrote a small Java application that connects to the MySQL database via JDBC and uses the WordPress XML-RPC API to publish posts (the same publishing script is also used for TeaBreak syndication). Anyhow, after a couple of hours, all my previous posts had been imported and the old permalinks were functioning again.
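For the curious, here is a rough sketch of what such a JDBC-to-XML-RPC loop can look like. It is not my actual script: the table and column names (posts, title, body), the credentials, and the endpoint URL are all made up for illustration, and it assumes the MySQL JDBC driver is on the classpath and Java 11+ for the HTTP client. It publishes through metaWeblog.newPost, one of the XML-RPC methods WordPress supports, and builds the request XML by hand to avoid pulling in an XML-RPC library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

final class PostMigrationSketch {

    // Escape the characters that are special inside XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap a typed XML-RPC value in a <param> element.
    static String param(String typedValue) {
        return "<param><value>" + typedValue + "</value></param>";
    }

    // One <member> of the post struct, holding a string value.
    static String member(String name, String value) {
        return "<member><name>" + name + "</name><value><string>"
                + escapeXml(value) + "</string></value></member>";
    }

    // Build the body of a metaWeblog.newPost call:
    // (blog id, username, password, post struct, publish flag).
    static String buildNewPostRequest(String blogId, String user, String password,
                                      String title, String body, boolean publish) {
        return "<?xml version=\"1.0\"?><methodCall>"
                + "<methodName>metaWeblog.newPost</methodName><params>"
                + param("<string>" + escapeXml(blogId) + "</string>")
                + param("<string>" + escapeXml(user) + "</string>")
                + param("<string>" + escapeXml(password) + "</string>")
                + param("<struct>" + member("title", title)
                        + member("description", body) + "</struct>")
                + param("<boolean>" + (publish ? "1" : "0") + "</boolean>")
                + "</params></methodCall>";
    }

    // POST the XML-RPC request and return the raw XML response.
    static String send(String endpoint, String xml) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, table, and column names.
        String jdbcUrl = "jdbc:mysql://localhost/oldcms";
        String endpoint = "https://example.com/xmlrpc.php";
        try (Connection db = DriverManager.getConnection(jdbcUrl, "dbuser", "dbpass");
             Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("SELECT title, body FROM posts ORDER BY id")) {
            while (rs.next()) {
                String xml = buildNewPostRequest("1", "wpuser", "wppass",
                        rs.getString("title"), rs.getString("body"), true);
                System.out.println(send(endpoint, xml));
            }
        }
    }
}
```

Keeping the request builder separate from the database and network code makes it easy to check the generated XML without touching a live blog.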
Erm. Well, the old permalinks are sort of working. I could have done a better job by keeping a mapping from old permalinks to new ones, but hey, I was lazy and it was a weekend. Instead, the redirect extracts keywords from the old permalink and sends you to a WordPress search for those keywords. I know, I know, it’s a very lazy way to do it! The solution is not perfect, but at least it’s better than showing a 404 page. The idea is to narrow down the posts that are likely to be associated with the old permalink.
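To give an idea of the keyword trick, here is a minimal sketch of the logic in Java. It is an illustration only, not the code that actually runs on the site: the tokenizing rules, the list of noise words, and the example path in the usage note are my own assumptions. The one fixed point is that WordPress search lives under the ?s= query parameter.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class PermalinkToSearch {

    // Hypothetical noise words that carry no meaning in an old permalink.
    private static final Set<String> NOISE = Set.of("post", "page", "html", "php");

    // Turn an old permalink path into a WordPress search URL (?s=keywords).
    static String toSearchUrl(String siteRoot, String oldPath) {
        List<String> keywords = new ArrayList<>();
        // Split on anything that is not a lowercase letter or digit.
        for (String token : oldPath.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            // Drop empty or very short tokens, pure numbers (ids, page numbers), and noise.
            if (token.length() < 3 || token.matches("\\d+") || NOISE.contains(token)) {
                continue;
            }
            keywords.add(token);
        }
        // URLEncoder encodes the joining spaces as '+'.
        String query = URLEncoder.encode(String.join(" ", keywords), StandardCharsets.UTF_8);
        return siteRoot + "/?s=" + query;
    }
}
```

With a made-up path such as /post/42/migrating-to-wordpress.html/page/2, calling toSearchUrl("https://example.com", path) yields https://example.com/?s=migrating+wordpress, which at least lands the visitor near the right post.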
Now, I am just buckling up for the huge block of traffic that Google will resume sending my way!
Things I learnt:
- For each post comment that I migrated via the XML-RPC API, WordPress sent me an email notification. I had to stop the script and disable notifications as I was spamming myself.
- WordPress has a rate-limiting feature for comments (“speed of posting comments”) based on your IP address. I wasn’t sure if that was going to affect me, as I was going to post a lot of comments. I added a random sleep interval between posts just to be on the safe side (I think it’s also friendlier to the server to do it this way), but it turns out the rate limit doesn’t kick in when you’re using XML-RPC.
- Permalinks built with Apache mod_rewrite are good, but keeping them functioning after migrating to a system with a different permalink model is a bit of a pain.
- I had to be extra cautious when dealing with uploaded files and relative paths. My old system stored uploads in a different location (/userfiles) than WordPress does (/wp-content/uploads). I fixed it with some symlink magic, something like:
ln -s /sites/fuzedbulb/wp-content/uploads/userfiles/ userfiles
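As for the random pause between comments mentioned in the list above, a minimal sketch could look like this. The class name and the delay bounds are assumptions for illustration, not the values my script used.

```java
import java.util.concurrent.ThreadLocalRandom;

final class ThrottleSketch {

    // Pick a random delay in [minMillis, maxMillis], both ends inclusive.
    static long randomDelayMillis(long minMillis, long maxMillis) {
        // nextLong's upper bound is exclusive, hence the + 1.
        return ThreadLocalRandom.current().nextLong(minMillis, maxMillis + 1);
    }

    // Sleep for a random interval before posting the next comment.
    static void pause(long minMillis, long maxMillis) throws InterruptedException {
        Thread.sleep(randomDelayMillis(minMillis, maxMillis));
    }
}
```

Calling ThrottleSketch.pause(500, 1500) between requests spreads them out by roughly a second each, which is cheap insurance even if the rate limit never applies.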
Finally, FuzedBulb has all its old content now (going all the way back to 2003). Cool!