
and I suspect that some connections are timing out (or being selected for purging from the connection tracking due to other more-active connections).
So test that hypothesis? Use conntrack(8) and/or the various status files in /proc and /sys.
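One way to test the "table is filling up" half of that hypothesis without conntrack(8) is to compare the number of tracked entries with the configured limit. The following is only a sketch: the function name `conntrack_usage` is made up for illustration, and the default path of the limit file is an assumption (it differs between kernel versions), so both paths can be passed as arguments.

```shell
# Print "entries max percent" for a conntrack table.
# $1: table file (defaults to /proc/net/ip_conntrack, as named in this thread)
# $2: file holding the limit (default path is an ASSUMPTION; check your kernel)
conntrack_usage() {
    table="${1:-/proc/net/ip_conntrack}"
    limit_file="${2:-/proc/sys/net/ipv4/ip_conntrack_max}"
    # One line per tracked connection, so counting non-empty lines counts entries.
    entries=$(grep -c . "$table")
    max=$(cat "$limit_file")
    # Integer percentage only; a plain POSIX shell has no floating point.
    echo "$entries $max $((entries * 100 / max))"
}
```

Running `conntrack_usage` every few seconds while the problem occurs should show whether the table sits close to its limit when connections vanish.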
I'm running on a Linksys WRT54GL with OpenWRT "WhiteRussian", so resources are minimal and conntrack(8) is a luxury I cannot afford (I haven't even checked whether it's available, but I won't have the space). Active connections are certainly disappearing from /proc/net/ip_conntrack, but then they reappear again. I did spot a default "-m state --state INVALID -j DROP" rule in OpenWRT, which would mean that if a connection did fall off the end of the conntrack list, any subsequent packets might be dropped. That rule has a hit count of 0, though, even in cases where I know there's a problem, so now I'm a bit confused. I've removed the -j DROP from the end anyway. Another curious thing: the connection that keeps disappearing from /proc/net/ip_conntrack is marked as ESTABLISHED but not as [ASSURED], which strikes me as strange.
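The ESTABLISHED-but-not-[ASSURED] observation can be checked across the whole table rather than for a single connection. This is a rough sketch with hypothetical function names. It assumes the usual /proc/net/ip_conntrack layout, in which TCP lines carry the state in the fourth field and an `[ASSURED]` tag when set, and it assumes `grep`, `awk` and `sort` are present in the router's busybox build, which I have not verified for WhiteRussian.

```shell
# List entries that are ESTABLISHED but lack the [ASSURED] flag.
# $1: table file (defaults to /proc/net/ip_conntrack)
unassured_established() {
    table="${1:-/proc/net/ip_conntrack}"
    grep 'ESTABLISHED' "$table" | grep -v '\[ASSURED\]'
}

# Count TCP entries per state; other protocols have no state field
# in this layout, so they are skipped.
tcp_state_counts() {
    table="${1:-/proc/net/ip_conntrack}"
    awk '$1 == "tcp" { n[$4]++ } END { for (s in n) print s, n[s] }' "$table" | sort
}
```

If the table can be copied off the router (for example to a file on another machine), passing that file as the first argument lets the same functions run there instead of on the WRT54GL.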
Any other suggestions appreciated too. The WRT54GL router will be replaced before too long with something with a lot more memory, which should resolve those problems, but I need an interim solution.
I don't see why "more RAM" would fix this unless you've already increased the conntrack table limit to the physical limits of your RAM. IME any site running off a WRT54GL will not even exceed the default conntrack table size unless you're doing something pathological.
ip_conntrack_max was defaulting to around 5500 (a default based on 16MB of memory, I guess). That might seem high, but this router runs at a library with 4 staff Windows PCs, 4 public-access Windows PCs, and sometimes a high number of public-access wireless devices. I've since dropped the limit to 1024 because the router is crashing regularly, which I think is due to running out of memory. It has obviously been identified as underpowered, but I need to coax it along for a little bit longer until the replacement has been proven to work (bugs in Linux/OpenWRT are giving me a headache at the moment). The one thing that is taxing it a bit is the per-IP rate limiting: public and wireless devices get fast access for the first 10MB of data and then get shaped heavily (token bucket filter with a big bucket). This discourages users from doing big downloads and destroying the experience for everyone else, while still allowing a good browsing experience in the 'open page, read page, open another page' use case. It's currently sitting between 300KB and 800KB of free memory.
Thanks,
James
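To make the "fast for the first 10MB, then shaped heavily" behaviour concrete, here is a small arithmetic sketch of a token bucket. It is not the configuration used on the router. The function name is hypothetical, the line rate and refill rate in the usage note are invented for illustration, and it simplifies by assuming the bucket starts full and by ignoring tokens that refill during the initial burst.

```shell
# Approximate seconds needed to transfer a download through a token bucket.
# $1: download size in bytes
# $2: bucket size in bytes (10MB in the setup described above)
# $3: unshaped line rate in bytes per second (hypothetical value)
# $4: refill, i.e. shaped, rate in bytes per second (hypothetical value)
tbf_transfer_seconds() {
    size=$1; bucket=$2; line=$3; refill=$4
    if [ "$size" -le "$bucket" ]; then
        # The whole download fits in the bucket and runs at line rate.
        echo $((size / line))
    else
        # The first bucketful runs at line rate, the rest at the refill rate.
        echo $((bucket / line + (size - bucket) / refill))
    fi
}
```

With a 10MB bucket, a 1MB/s line and a 16KB/s refill rate, `tbf_transfer_seconds 5242880 10485760 1048576 16384` gives 5 seconds for a 5MB page load, while `tbf_transfer_seconds 104857600 10485760 1048576 16384` gives 5770 seconds for a 100MB download, which is the intended discouragement.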