Recently (since the move to a new server a few months ago), Somethinkodd.com has gone down for a couple of minutes at a time, many, many times.
If you visit the site during the downtime, you get a message from my web-host ISP explaining that the site has exceeded its 20% CPU quota and has therefore been taken offline for a few minutes.
It only seems to happen when I am working on my site. (However, it occurred to me that the problem might be happening frequently when I am not working on my site, and I simply hadn’t noticed, because I wasn’t working on my site at the time…)
I originally guessed that the problem might be related to my use of Unison, and I was going to get around to working out how to “nice” it.
Today, the problem occurred when I hadn’t used Unison, and I realised my original assumption was wrong. Checking the logs, I found that just prior to the downtime, the Atom feed on one of my old test blogs was being hit moderately hard – 300 times in 40 seconds.
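For anyone who wants to do the same kind of log check, here is a minimal sketch of counting hits per client and URL inside a time window. It assumes the server writes Apache's "combined" log format, which may not match this host's layout. The function name `hits_in_window`, the IP address `192.0.2.10` and the path `/testblog/feed/atom/` are hypothetical and used for illustration only.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed Apache "combined" log layout:
# IP ident user [dd/Mon/yyyy:HH:MM:SS zone] "METHOD path PROTO" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def hits_in_window(lines, start, window):
    """Count requests per (ip, path) with a timestamp in [start, start + window)."""
    end = start + window
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        if start <= ts < end:
            counts[(m.group("ip"), m.group("path"))] += 1
    return counts


# Hypothetical usage: one request per second for 40 seconds, then one outside the window.
sample = [
    f'192.0.2.10 - - [14/Jul/2007:10:00:{s:02d} +0000] '
    f'"GET /testblog/feed/atom/ HTTP/1.0" 404 512 "-" "WordPress"'
    for s in range(40)
]
sample.append(
    '192.0.2.10 - - [14/Jul/2007:10:01:00 +0000] '
    '"GET /testblog/feed/atom/ HTTP/1.0" 404 512 "-" "WordPress"'
)
start = datetime.strptime("14/Jul/2007:10:00:00 +0000", TS_FORMAT)
print(hits_in_window(sample, start, timedelta(seconds=40)).most_common(1))
```

Sorting the counts with `most_common` makes a single client hammering a single URL stand out immediately.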
I looked up the IP address to find out who was peeking at my (unpublished) test blog. Uh oh, I recognise that domain name. That’s my own web server.
At this stage, it seems that there is a nasty denial-of-service bug in WordPress’s (pretend) cron jobs. I almost certainly have a misconfiguration somewhere. (Let’s be clear about that: I have treated my test blogs harshly, and migrated them around several servers and URLs. That’s why I have them.) A scheduled task is trying to fetch a non-existent page, which returns a 404 error. Either that is causing the task to simply try again, or (perhaps more likely) the task is being added to a list of incomplete jobs that get retried in the next cron run.
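To see why the second hypothesis would hurt, here is a toy model. It is not WordPress’s actual cron code. It simply assumes two things: every cron pass schedules one more fetch of the missing feed, and every failed fetch is put back on the queue. All of the names here (`Job`, `fetch`, `run_cron`, `total_fetches`, `max_attempts`) are hypothetical. Under those assumptions, failed jobs pile up and the number of fetches per pass keeps growing. A cap on attempts keeps the load flat.

```python
from dataclasses import dataclass


@dataclass
class Job:
    url: str
    attempts: int = 0


def fetch(url, missing):
    """Hypothetical stand-in for an HTTP GET: URLs in `missing` return 404."""
    return 404 if url in missing else 200


def run_cron(queue, missing, max_attempts=None):
    """One pseudo-cron pass over the queue. Returns (new_queue, fetches_made).

    max_attempts=None models the suspected behaviour: a failed job is put
    back on the queue and retried on every later pass, forever.
    """
    remaining, fetches = [], 0
    for job in queue:
        fetches += 1
        if fetch(job.url, missing) == 200:
            continue  # success: the job is done and dropped
        job.attempts += 1
        if max_attempts is not None and job.attempts >= max_attempts:
            continue  # guard: give up on a job that keeps failing
        remaining.append(job)  # failure: retry on the next pass
    return remaining, fetches


def total_fetches(runs, max_attempts=None):
    """Simulate `runs` passes, scheduling one new (failing) feed fetch per pass."""
    missing = {"/testblog/feed/atom/"}  # hypothetical non-existent URL
    queue, total = [], 0
    for _ in range(runs):
        queue.append(Job("/testblog/feed/atom/"))  # scheduler adds another copy
        queue, n = run_cron(queue, missing, max_attempts)
        total += n
    return total


print(total_fetches(10))                  # unbounded retries: 1 + 2 + ... + 10
print(total_fetches(24, max_attempts=3))  # capped: at most 3 fetches per pass
```

Without the cap, pass *k* makes *k* fetches, so the total grows quadratically. With `max_attempts=3`, each job is tried at most three times and then dropped.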
Hopefully, I will nail the problem soon. Let me know if you see any downtime.
Comment by Dave on July 14, 2007
Yay! A Fibonacci bug!!!