netfilter generated keepalive packets

Is it possible to have netfilter generate keepalive packets for connections? I have some proprietary windows services that communicate with each other through a few Linux routers, and I suspect that some connections are timing out (or being selected for purging from the connection tracking due to other more-active connections). If netfilter could generate a keepalive in each direction sometime before timing out the connection it would be great!

Any other suggestion appreciated too. The WRT54GL router will be replaced before too long with something with a lot more memory, which should resolve those problems, but I need an interim solution.

Thanks

James

On Fri, 11 May 2012, James Harper wrote:
Is it possible to have netfilter generate keepalive packets for connections? I have some proprietary windows services that communicate with each other through a few Linux routers, and I suspect that some connections are timing out (or being selected for purging from the connection tracking due to other more-active connections).
Sounds like http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_hijacking to me. But even if you get the sequence right - the real sender will send one with the same sequence. That doesn't look good..

Regards

Peter

On Fri, 11 May 2012, James Harper wrote:
Is it possible to have netfilter generate keepalive packets for connections? I have some proprietary windows services that communicate with each other through a few Linux routers, and I suspect that some connections are timing out (or being selected for purging from the connection tracking due to other more-active connections).
Sounds like
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_hijacking
to me.
But even if you get the sequence right - the real sender will send one with the same sequence. That doesn't look good..
AFAIK, the point of keepalive packets is that they don't send any actual data, so the sequence and ack numbers remain the same.

James

On Fri, 11 May 2012, James Harper wrote:
On Fri, 11 May 2012, James Harper wrote:
Is it possible to have netfilter generate keepalive packets for connections? I have some proprietary windows services that communicate with each other through a few Linux routers, and I suspect that some connections are timing out (or being selected for purging from the connection tracking due to other more-active connections).
Sounds like
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_hijacking
to me.
But even if you get the sequence right - the real sender will send one with the same sequence. That doesn't look good..
AFAIK, the point of keepalive packets is that they don't send any actual data, so the sequence and ack numbers remain the same.
You are right that they re-use sequence numbers: http://www.pcvr.nl/tcpip/tcp_keep.htm

Regards

Peter
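For reference: TCP keepalives are generated by the endpoints, not by routers in between, and only for sockets that set SO_KEEPALIVE. On a Linux endpoint the probe timing is controlled by three sysctls; a minimal sketch (the values here are illustrative, not recommendations):

    # Idle seconds before the first probe, seconds between probes, and
    # unanswered probes before the connection is declared dead.
    sysctl -w net.ipv4.tcp_keepalive_time=600
    sysctl -w net.ipv4.tcp_keepalive_intvl=60
    sysctl -w net.ipv4.tcp_keepalive_probes=5

These won't help directly here, since the endpoints are Windows services, but they show where the knobs live on the Linux side.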

James Harper wrote:
Is it possible to have netfilter generate keepalive packets for connections?
AFAIK no. Do it in the app.
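If the app can't be changed, one workaround is to interpose a relay that does set SO_KEEPALIVE on its sockets. An untested sketch using socat (the port and hostname are hypothetical; keepidle/keepintvl/keepcnt are Linux-specific socket options):

    # Listen locally, forward to the real server, and ask the kernel for
    # TCP keepalives on both legs of the relay.
    socat TCP-LISTEN:9000,fork,reuseaddr,keepalive,keepidle=300,keepintvl=60,keepcnt=5 \
          TCP:server.example.com:9000,keepalive,keepidle=300,keepintvl=60,keepcnt=5

Whether a WRT54GL has the spare memory for socat is another question, of course.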
and I suspect that some connections are timing out (or being selected for purging from the connection tracking due to other more-active connections).
So test that hypothesis? Use conntrack(8) and/or the various status files in /proc and /sys.
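For example (the exact paths depend on kernel generation):

    # With the userspace tool installed:
    conntrack -L    # list tracked connections
    conntrack -C    # count of tracked connections
    # On older kernels the same data is in /proc:
    cat /proc/net/ip_conntrack
    cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count
    cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max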
Any other suggestion appreciated too. The WRT54GL router will be replaced before too long with something with a lot more memory which should resolve those problems but I need an interim solution.
I don't see why "more RAM" would fix this unless you've already increased the conntrack table limit to the physical limits of your RAM. IME any site running off a WRT54GL will not even exceed the default conntrack table size unless you're doing something pathological.

and I suspect that some connections are timing out (or being selected for purging from the connection tracking due to other more-active connections).
So test that hypothesis? Use conntrack(8) and/or the various status files in /proc and /sys.
I'm running on a Linksys WRT54GL with OpenWRT "WhiteRussian", so resources are at a minimum and conntrack(8) is a luxury I cannot afford (I haven't even checked if it's available, but I won't have the space). But certainly active connections are disappearing from /proc/net/ip_conntrack (but then they reappear again...).

I did spot a "-m state --state INVALID -j DROP" default rule in OpenWRT, which would mean that if the connection did fall off the end of the conntrack list, any subsequent packets might be dropped... but that rule has a hit count of 0 even in cases where I know it's a problem, so now I'm a bit confused. I've removed the -j DROP from the end anyway.

Another curious thing: the connection that keeps disappearing from /proc/net/ip_conntrack is marked as ESTABLISHED but not marked as [ASSURED], which strikes me as strange.
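That last observation may be the key: when the conntrack table fills, the kernel's early-drop logic evicts unassured entries first, so an ESTABLISHED-but-not-[ASSURED] entry is exactly the kind that disappears under load (an entry picked up mid-stream, e.g. after the table was flushed, presumably stays unassured until conntrack has seen traffic in both directions). One knob to experiment with is the established-TCP timeout; a sketch for the old /proc paths of this kernel generation (432000 seconds, i.e. five days, is the usual default):

    # Show, then set, how long an idle established TCP conntrack entry
    # survives before being expired (in seconds).
    cat /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_established
    echo 432000 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_established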
Any other suggestion appreciated too. The WRT54GL router will be replaced before too long with something with a lot more memory which should resolve those problems but I need an interim solution.
I don't see why "more RAM" would fix this unless you've already increased the conntrack table limit to the physical limits of your RAM. IME any site running off a WRT54GL will not even exceed the default conntrack table size unless you're doing something pathological.
ip_conntrack_max was defaulting to around 5500 (a default based on 16MB of memory, I guess). That might seem high, but this router runs at a library with 4 staff windows PCs, 4 public access windows PCs, and sometimes a high number of public access wireless devices, and I've since dropped the limit to 1024 as it's crashing regularly, which I think is due to running out of memory. It has obviously been identified as underpowered, but I need to coax it along for a little bit longer until the replacement has been proven as working (bugs in Linux/OpenWRT are giving me a headache at the moment).

The one thing that is taxing it a bit is the per-IP rate limiting, so that public and wireless devices get fast access for the first 10MB of data and then get shaped heavily (token bucket filter with a big bucket), to discourage users doing big downloads and destroying the experience for everyone else, but still allowing for a good browsing experience in the 'open page, read page, open another page' use case. It's currently sitting between 300kB and 800kB of free memory.

Thanks

James
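For illustration, a single-interface sketch of that shaping idea (the interface, rate, and burst values here are made up; a real per-IP setup needs a class per host, e.g. under htb):

    # Token bucket filter with a large bucket: roughly the first 10MB
    # drains the bucket at line speed, after which traffic is held to
    # the configured rate.
    tc qdisc add dev eth0 root tbf rate 256kbit burst 10mb latency 50ms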

James Harper wrote:
and I suspect that some connections are timing out (or being selected for purging from the connection tracking due to other more-active connections).
So test that hypothesis? Use conntrack(8) and/or the various status files in /proc and /sys.
but certainly active connections are disappearing from /proc/net/ip_conntrack (but then they reappear again...)
That's a good enough test.
Any other suggestion appreciated too. The WRT54GL router will be replaced before too long with something with a lot more memory which should resolve those problems but I need an interim solution.
I don't see why "more RAM" would fix this unless you've already increase the conntrack table limit to the physical limits of your RAM. IME any site running off a WRT54GL will not even exceed the default conntrack table size unless you're doing something pathological.
ip_conntrack_max was defaulting to around 5500 (a default based on 16MB of memory, I guess).
Hm; it hadn't occurred to me that it might be based on RAM size. I had assumed that was simply a hard-coded default in the kernel. On an Ubuntu 10.04 x86_64 router with 3½GB of RAM it's 64k. On an OpenWRT backfire / TP-1043ND (32MB volatile RAM) it's 16k.
That might seem high but this router runs at a library with 4 staff windows PC's, 4 public access windows PC's, and sometimes a high number of public access wireless devices
Shrug. For comparison, I currently have 46 active hosts behind my router and conntrack -C reports 2149 connections.
and I've since dropped the limit to 1024 as it's crashing regularly which I think is due to running out of memory.
OOM is not an unreasonable assumption, but I'm not convinced a 5k conntrack table would be enough to tip it over the edge. I defer to your evidence if you have hard numbers to the contrary.
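Back-of-envelope, assuming a conntrack entry costs a few hundred bytes (call it ~300, a commonly quoted figure for kernels of that era): 5500 entries × ~300 bytes is about 1.6MB. That's small next to 16MB of total RAM, but not trivial on a box sitting at 300-800kB free.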

On Fri, 11 May 2012, "Trent W. Buck" <trentbuck@gmail.com> wrote:
ip_conntrack_max was defaulting to around 5500 (a default based on 16MB of memory, I guess).
Hm; it hadn't occurred to me that it might be based on RAM size. I had assumed that was simply a hard-coded default in the kernel.
On an Ubuntu 10.04 x86_64 router with 3½GB of RAM it's 64k. On an OpenWRT backfire / TP-1043ND (32MB volatile RAM) it's 16k.
On a Squeeze system with 350M of RAM it's 21660, and on a Squeeze system with 512M of RAM it's 32100. On CentOS 5.7 with 384M of RAM it's 24512. But it's writable, so you should be able to increase it significantly if you need to. Even in the unlikely event that each connection took 1K of RAM, a 512M system could handle a lot more than 32100 entries!

But I'm not sure that this would necessarily solve the problem. On my home network with a 512M system as the gateway, I had problems with connection loss until I configured ssh (the only thing that holds long idle connections) to use ssh protocol keepalives (I never tested with TCP keepalives as I don't think they are useful for ssh - or any other protocol that supports its own checks). Even if Lenny had significantly smaller defaults than Squeeze (which seems quite unlikely, as CentOS doesn't have small defaults), that wouldn't explain why a system that should be able to track thousands of connections starts failing when there are two client systems in use which aren't heavily loaded.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
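Raising it is a one-liner; a sketch (the value is illustrative, and the path depends on kernel generation):

    # Newer kernels:
    sysctl -w net.netfilter.nf_conntrack_max=65536
    # Older ip_conntrack-era kernels:
    echo 65536 > /proc/sys/net/ipv4/ip_conntrack_max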

Russell Coker wrote:
On my home network with a 512M system as the gateway I had problems with connection loss until I configured ssh (the only thing that holds long idle connections) to use ssh protocol keep alives (I never tested with TCP keep alives as I don't think they are useful for ssh - or any other protocol that supports checks).
FWIW, here are the relevant notes from my .ssh/config:

    [...]
    Host *
        [...]
        # Perform keepalive pings at the SSH layer, not the TCP layer.
        #
        # russm> twb: TCPKeepalive is spoofable at the TCP layer, where
        #        ServerAliveFoo sends ssh ping commands inside the
        #        encrypted connection
        # twb> why does it matter if someone spoofs a keepalive?
        # russm> they could attack your routing and hijack the connection
        #        without you noticing that the other end had actually gone
        #        away
        # russm> whether that's a problem depends on what you're doing with
        #        the connection, of course
        # russm> if you're expecting asynchronous data to come back, then
        #        you'd never notice the other end was gone
        # russm> if you're doing interactive, or request/response then
        #        obviously you'd notice at a higher layer (perhaps the
        #        human layer)
        ServerAliveInterval 30
        ServerAliveCountMax 10
        TCPKeepAlive no

# russm> twb: TCPKeepalive is spoofable at the TCP layer, where ServerAliveFoo sends ssh ping commands inside the encrypted connection
# twb> why does it matter if someone spoofs a keepalive?
# russm> they could attack your routing and hijack the connection without you noticing that the other end had actually gone away
# russm> whether that's a problem depends on what you're doing with the connection, of course
# russm> if you're expecting asynchronous data to come back, then you'd never notice the other end was gone
# russm> if you're doing interactive, or request/response then obviously you'd notice at a higher layer (perhaps the human layer)
I don't buy that. Keepalives don't necessarily make TCP hijacking any more or less possible, and would be useless for SSH unless the attacker also knows the current state of the encryption state machine, and if they know that then you are looking for the problem in the wrong place. I'm open to being enlightened though!

James

James Harper wrote:
# russm> twb: TCPKeepalive is spoofable at the TCP layer, where ServerAliveFoo sends ssh ping commands inside the encrypted connection
# twb> why does it matter if someone spoofs a keepalive?
# russm> they could attack your routing and hijack the connection without you noticing that the other end had actually gone away
# russm> whether that's a problem depends on what you're doing with the connection, of course
# russm> if you're expecting asynchronous data to come back, then you'd never notice the other end was gone
# russm> if you're doing interactive, or request/response then obviously you'd notice at a higher layer (perhaps the human layer)
I don't buy that. Keepalives don't necessarily make TCP hijacking any more or less possible, and would be useless for SSH unless the attacker also knows the current state of the encryption state machine, and if they know that then you are looking for the problem in the wrong place.
I'm open to being enlightened though!
I don't pretend to grok the commentary above :-( It was convincing enough for me to cargo-cult into my ssh_config.
Participants (4):
- James Harper
- Peter.Ross@bogen.in-berlin.de
- Russell Coker
- Trent W. Buck