Having recently noticed some odd behavior with Apache 2.2.4 when examining my apache logs. I noticed an odd number of HTTP 400 errors. Surprisingly, these errors were being generated by my own server - i.e. the IP address of my apache server was issuing the request.
How odd.
The first thing which crossed my mind was someone was trying (or had) exploited my server. I was satisfied that everything was fine; I had recently updated my LAMP setup and checked against rootkits and known exploits. I was annoyed; it was time to try and trace this condition.
After checking against exploits, the first thing to do was check and recheck the apache config files. Using a large number of rewrite rules and custom log format, those were the obvious initial candidates.
My custom log format:
"%h %{%d/%m/%Y %H:%M:%S %z}t %D \"%r\" %>s %I %O \"%{Referer}i\" \"%{User-agent}i\" %{Location}o"
Normal requests show up in the log as follows:
74.8.28.132 21/08/2007 04:42:17 +0100 172 "GET /sawblog/ HTTP/1.0" 200 563 32500 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv;1.8.1.4) Geck/20070603 Fedora/2.0.0.4-2.fc7 Firefox.2.0.0.4" http://www.plope.com/
These Bad Requests look like follows:
64.34.177.88 21/08/2007 04:42:17 +0100 122 "GET /" 400 11 550 "-" "-" -
Since the first thought was the RewriteRules were causing this; I tried to provoke the bad request myself. Crawling various parts of my site, I’d hoped I would find a bad Rewrite rule. Life (and computers) are never so easy. I found I was unable to provoke the condition. I began to think I wasn’t even on the right track.
What else could it be? Monitoring (nagios)? I’d used nagios for years and I’d never seen this problem. It couldn’t see how that could be the cause. I could see the nagios request being logged with the expected 200 response in the logs. Not worth further consideration.
I had recently began playing withcacti and graphing again. This did not make much sense either; all graphing was being done by external scripts I wrote - and I wasn’t hitting the http server directly. Logic continued to point to a bad rewrite somehow casing a request to be proxied incorrectly to the local apache server instead of one of my back-end applications. This made sense to be the culprit. Go with it.
As a last resort I figured I would jump on #apache on irc and see if anyone there might have seen this or confirm it could be a Rewrite rule causing it. Unfortunately the Apache IRC channel didn’t offer much help. Outside the obvious, no one offered up any insight that I couldn’t deduce on my own. I had hoped someone had experienced this condition or knew if a fudged RewriteRule could be the cause. The general consensus on #apache was it would be awfully difficult to fudge a rewrite rule to cause what I was seeing. Unfortunately, #apache wasn’t much use.
The condition proved to be more of an annoyance then an actual problem (no URL I could find actually resulted in an error raised to users). I just noodled over the issue for awhile. Over this past weekend, I decided I would crawl the site and then ramp up some load against server. Just see if I could provoke the condition once again.
Pointing a load utility against the site and ramping up the load was unable to provoke it (at first). Running for a substantial period of time, there were no 400 errors being logged. Frustrated, I quit my load test. Low and behold, nearly immediately I had logged several bad requests. How could that be? At least, I had something to go on: Quiting my load test in mid-stream resulted in a number of bad requests.
Interesting.
With that in mind, I reran some load and quit the test, duplicating the bad requests being logged. Excellent, reproducible condition. Like anyone, I loathe the unreproducible error. Lord knows, I get enough of those from the users of my apps….
I have apache running the prefork MPM module. So, what appears to be happening is during load, apache increases the number of servers to handle requests. As soon as the load subsides, apache kills the spare servers to the MaxSpareServers threshold: 30 - as Apache kills of the unnecessary servers, the Bad Request log entry shows up. Bingo.
Currently I’m trying to find out if this is a bug in apache, but as of yet I’m not finding anything insightful. Hopefully someone who finds themselves confronted with this condition won’t spin as many wheels as I.