
On Wed, 27 Nov 2013, Toby Corkindale wrote:
> I do advise limiting the maximum number of threads per machine to some amount lower than where you see the problems occurring. Set up a benchmarking rig against one machine (one that's taken out of the load-balancing pool) and find out where the optimum is. I'm pretty sure you'll find it's lower than the many-thousand-processes mark. It's better to let connections pile up in the "pending" queue but process them quickly than to accept them all and then serve them all very slowly, or not at all.
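A minimal sketch of the benchmarking idea above, assuming a client-side harness in Python. It sweeps a few concurrency levels against the single out-of-pool machine, records throughput and 95th-percentile latency for each, then picks the highest-throughput level that still meets a latency budget. `do_request`, the levels, and the budget are hypothetical placeholders, not anything from the thread.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


def _timed(do_request: Callable[[], None]) -> float:
    # Wall-clock duration of one request, in seconds.
    start = time.perf_counter()
    do_request()
    return time.perf_counter() - start


def sweep(do_request: Callable[[], None], levels: Iterable[int],
          requests_per_level: int = 200) -> Dict[int, Tuple[float, float]]:
    """Return {concurrency: (requests_per_second, p95_latency_seconds)}.

    `do_request` is a hypothetical zero-argument callable that performs one
    request against the test machine; requests_per_level must be >= 2.
    """
    results = {}
    for level in levels:
        start = time.perf_counter()
        # `level` client threads issue requests concurrently.
        with ThreadPoolExecutor(max_workers=level) as pool:
            latencies = list(pool.map(lambda _: _timed(do_request),
                                      range(requests_per_level)))
        elapsed = time.perf_counter() - start
        # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
        p95 = statistics.quantiles(latencies, n=20)[-1]
        results[level] = (requests_per_level / elapsed, p95)
    return results


def best_level(results: Dict[int, Tuple[float, float]],
               p95_budget_s: float) -> int:
    """Highest-throughput concurrency whose p95 latency fits the budget."""
    ok = {lvl: rps for lvl, (rps, p95) in results.items() if p95 <= p95_budget_s}
    if not ok:
        # Nothing meets the budget: fall back to the lowest level tried.
        return min(results)
    return max(ok, key=ok.get)
```

In practice `do_request` would wrap something like `urllib.request.urlopen` aimed at the test host, and the chosen level would then inform the MaxClients setting. The latency budget is what encodes the "process them quickly" preference over raw throughput.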
ListenBacklog: https://sites.google.com/site/beingroot/articles/apache/socket-backlog-tunin... It's 512 by default (according to that page; http://<server>/server-info doesn't list the current value), but the kernel in its current config only allows 128 per port. Aha, we'll change that next week. (I actually thought that once it reached MaxClients, i.e. once all the slots were filled, it didn't accept any new connections at all.) At least in practice, we find that once it hits MaxClients, the servers start dropping connections very soon afterwards, and this propagates through the load balancers. That would be explained by the kernel limit only allowing an extra 10% of pending connections on top of the MaxClients slots. Easy to fix.
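A small sketch of why the kernel wins here, assuming Linux. listen(2) silently truncates the requested backlog to the value in /proc/sys/net/core/somaxconn (the default was 128 before Linux 5.4), so Apache's ListenBacklog cannot exceed it. The function names are hypothetical, and the MaxClients figure used in the usage note is an illustrative assumption, not a number from this thread.

```python
from pathlib import Path
from typing import Optional

# Linux-specific: the kernel cap applied to every listen() backlog.
SOMAXCONN = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path: Path = SOMAXCONN) -> Optional[int]:
    """Current kernel cap on listen() backlogs, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # not Linux, or the file is missing/unreadable


def effective_backlog(requested: int, somaxconn: Optional[int]) -> int:
    """Backlog the kernel actually grants for listen(requested)."""
    return requested if somaxconn is None else min(requested, somaxconn)


def queue_headroom(max_clients: int, backlog: int) -> float:
    """Pending-queue size as a fraction of the MaxClients busy slots."""
    return backlog / max_clients
```

With the 512 default and a 128 kernel limit, `effective_backlog(512, 128)` gives 128. If MaxClients were a hypothetical 1280, `queue_headroom(1280, 128)` would be 0.1, i.e. the "extra 10%" above. Raising `net.core.somaxconn` is what lets a larger ListenBacklog take effect.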
> Secondly, if you're really stuck with Apache and can't put decent reverse-proxy accelerators in front of it, then why not try switching over to the event-based worker? http://httpd.apache.org/docs/current/mod/event.html
That would be good, but RHEL 5 and 6 are still on Apache 2.2, where event is marked as experimental :( http://httpd.apache.org/docs/2.2/mod/event.html At the current rate, we'll be stuck on RHEL 5 in production for years to come.

I wonder about worker vs. prefork? Linux processes are lightweight, so I don't imagine threading is going to be much better. We typically fork only one process per second, and I don't think there'll be much difference in context-switch overhead between the two. Worker apparently "sucks for PHP", but I don't know whether that applies to mod_php, CGI or something else.

I like the sound of mod_pagespeed: https://www.digitalocean.com/community/articles/how-to-get-started-with-mod_... but the risk of rewriting content on the fly won't be accepted for most of our website.

Hey, we just rediscovered a longstanding problem: the most common static image on the site (5 million hits in an hour, and the image hasn't changed in a year) was in a directory marked as non-cacheable! Whee!

--
Tim Connors
Midrange Systems | ITB | Bureau of Meteorology
Phone: (03) 9669 4208 | E-mail: T.Connors@bom.gov.au