
On 27 November 2013 13:06, Tim Connors <tconnors@rather.puzzling.org> wrote:
> [snip]
>> Three to six thousand slots sounds like a lot for one machine, to me.* I wondered why so many? Are you not running a reverse-proxy accelerator in front of Apache? (eg. Varnish or some configurations of nginx)
> We're government. Let's throw resources at it (until we run out of money) rather than think about it carefully (actually, it's served us pretty well up til now. But someone made Wise Choices last year, and then <rant elided>).
>> If you were just serving static content directly, I'd go with something lighter-weight than Apache; and if you're serving dynamic content (ie. the php you mention) then I'd definitely not do so without a good reverse-proxy in front of it, and a much-reduced number of apache threads.
> The php wasn't a big thing until a few months ago. It's obvious that it's causing the problems, but chucking a cache in front of each node will be impossible now that we're not allowed to buy any equipment or replacements for so-old-they're-out-of-warranty machines (annoyingly, the offending .php page could just as easily be a static page, but we outsourced that to an "industry expert"). The httpd.conf configuration is complex enough that it'll never be replaced with another httpd server, particularly now that the only two people who knew enough about it in the web group have retired.
If you were just serving static content before, then Apache (w/sendfile) is fairly efficient and can handle a lot of simultaneous connections. I really wouldn't do it myself for a busy site, and it's susceptible to a few problems, but it's something you can mostly get away with.

As soon as you throw dynamically-generated stuff in (ie. CGI of any sort), all that changes. You see, Apache can use a kernel extension (sendfile) to attach an open filehandle (to static content) directly to a socket, and then the kernel just handles the rest, so it uses very little memory, CPU time or context switching. But dynamic content involves spawning an additional process, doing a whole lot of memory allocation, lots of file i/o and system calls, and worse: that heavy execution environment has to stick around for as long as it takes to send the results off to the client, gradually feeding in a few kilobytes at a time. If your client is on a slow connection (mobile, dial-up, busy adsl, DoS attack) then you're holding up a lot of resources for a long time.

Again, you are probably aware of this -- but I'm trying to illustrate just *how much* heavier the PHP processes are compared to serving static content. I'm not particularly surprised that servers which could handle things fine w/static content are falling over now.

Firstly, I do advise limiting the maximum number of threads per machine, to something lower than the point where you see the problems occurring. Set up a benchmarking rig against one machine (taken out of the load-balancing pool) and find out where the optimum is -- I'm pretty sure you'll find it's well below the many-thousand-process mark. It's better to allow connections to pile up in the "pending" queue, but be able to process them quickly, than to accept them all and then serve them all very slowly or not at all. (There's a rough config sketch at the end of this mail.)

Secondly, if you're really stuck with Apache, and can't put decent reverse-proxy accelerators in front of them, then try switching over to the event-based worker?
http://httpd.apache.org/docs/current/mod/event.html

Toby
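
P.S. Since I mentioned capping the workers: here's roughly the shape of it, as a sketch only -- the numbers are invented placeholders (find your own from the benchmarking run; ab or siege pointed at the node you've pulled out of the pool would do), the URL is hypothetical, and the directive names are the 2.4 ones (2.2 calls the cap MaxClients):

    # e.g. run something like this at a few concurrency levels and note
    # where latency falls off a cliff:
    #   ab -n 20000 -c 500 http://test-node/offending-page.php
    # then cap the workers somewhere below that point.
    <IfModule mpm_event_module>
        ServerLimit              16
        ThreadsPerChild          25
        MaxRequestWorkers       400   # hard cap (ServerLimit x ThreadsPerChild)
        MinSpareThreads          25
        MaxSpareThreads          75
        MaxConnectionsPerChild 10000  # recycle children periodically
    </IfModule>

    # let excess connections wait in the kernel's accept queue instead of
    # handing every one of them a worker straight away
    ListenBacklog 511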
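
And if you do go the event-MPM route: on a 2.4 build with loadable MPMs it's just a matter of swapping which MPM module gets loaded (on 2.2 the MPM is chosen at compile time, so that would mean a rebuild). Something like this, with the module path depending on your build:

    # load the event MPM instead of prefork
    #LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
    LoadModule mpm_event_module modules/mod_mpm_event.so

One caveat, from me rather than the docs: mod_php is normally run under prefork, so moving to a threaded MPM usually means pushing PHP out of the Apache processes (php-fpm over FastCGI, say) -- worth checking before flipping it over.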