IP gurus - TCP route cache clobbers IPsec xfrm

I'm load testing a kernel mode IPsec setup. When IKE SAs renegotiate, one or two IP packets get sent on the default route, but after the tunnel is back up the TCP connection remains stuck on the default route, bypassing xfrm policy, until the TCP connection closes from retransmission failures.

Ping is not affected the same way. It uses the tunnel as soon as a new child SA is up (even while TCP is broken). The next TCP connection uses the tunnel immediately.

What can I do to fix this?

Things I've tried that did not work:
- conntrack -D
- iptables -t raw ... -j NOTRACK   # have verified the connections never appear in conntrack state
- ip route cache flush   # while a connection is stuck on the wrong path

Kernel is 2.6.32. The TCP connections terminate on a loopback of the IPsec gateway.

Ideas?

Richard Andrews wrote:
> I'm load testing a kernel mode IPsec setup. When IKE SAs renegotiate one or two IP packets get sent on the default route but after the tunnel is back up the TCP connection remains stuck on the default route by-passing xfrm policy until the TCP connection closes from retransmission failures.
> Ping is not affected the same way. It uses the tunnel as soon as a new child SA is up (even while TCP is broken). The next TCP connection uses the tunnel immediately.

Random not-very-good thoughts:
- presumably ICMP works because it's connectionless. Confirm by testing if UDP also does the Right Thing.
- RPF? During transition, perhaps RPF is dropping a small segment of the TCP conversation. Test by turning off RPF.
- likewise test w/ minimal or no firewall, policy routing, &c

tcpdump on all hosts you have access to
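For the UDP check suggested above, a small sender along these lines might do. It is only a sketch: the function name send_udp_probes, the payload format, and the command-line arguments are assumptions made for illustration, not anything from the thread. It sends numbered datagrams at a fixed interval so that tcpdump on each interface shows which path every datagram took across a renegotiation. The connected flag is there because, as far as I know, Linux does a fresh route lookup for each sendto() on an unconnected UDP socket, whereas a connect()ed socket can hold on to a cached route, which is closer to how a TCP socket behaves.

```python
import socket
import sys
import time


def send_udp_probes(host, port, count=5, interval=1.0, connected=False):
    """Send `count` numbered UDP datagrams to (host, port).

    Returns the list of payloads sent. With connected=True the socket is
    connect()ed first, so the kernel may reuse one cached route for every
    datagram (assumption: this mirrors the per-socket caching TCP does).
    """
    sent = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if connected:
            sock.connect((host, port))
        for seq in range(count):
            payload = f"probe {seq}".encode("ascii")
            if connected:
                sock.send(payload)
            else:
                sock.sendto(payload, (host, port))
            sent.append(payload)
            # Sleep between probes, but not after the last one.
            if seq < count - 1:
                time.sleep(interval)
    return sent


# Hypothetical usage: python3 udp_probe.py <host> <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    send_udp_probes(sys.argv[1], int(sys.argv[2]), count=60)
```

Running it once with connected=False and once with connected=True while the SAs renegotiate should show whether only sockets holding a cached route get stuck.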

On Mon, May 21, 2012 at 12:17 PM, Trent W. Buck <trentbuck@gmail.com> wrote:
> Richard Andrews wrote:
>> I'm load testing a kernel mode IPsec setup. When IKE SAs renegotiate one or two IP packets get sent on the default route but after the tunnel is back up the TCP connection remains stuck on the default route by-passing xfrm policy until the TCP connection closes from retransmission failures.
>> Ping is not affected the same way. It uses the tunnel as soon as a new child SA is up (even while TCP is broken). The next TCP connection uses the tunnel immediately.
>
> Random not-very-good thoughts:
> - presumably ICMP works because it's connectionless. Confirm by testing if UDP also does the Right Thing.

TCP also does the right thing provided it does not terminate on the gateway. IP forwarding works as expected, which is good as this is the main requirement. I think it must be the Linux TCP stack binding the connection to a path based on where the first packet goes.

> - RPF? During transition, perhaps RPF is dropping a small segment of the TCP conversation. Test by turning off RPF.

Interesting idea. I can try, but I expect RPF to be connectionless as it operates at layer 3.

> - likewise test w/ minimal or no firewall, policy routing, &c

Yup. All off. It's a test system so I can turn off everything.

> tcpdump on all hosts you have access to
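
Before ruling RPF in or out, it may help to see what is actually configured. The sketch below reads the rp_filter sysctl for every interface from /proc. The function names read_rp_filter and effective_rp_filter are made up for illustration. The max() rule is my understanding of the kernel's ip-sysctl documentation (the larger of conf/all and conf/<interface> is what gets applied), so treat it as an assumption to verify on 2.6.32.

```python
from pathlib import Path


def read_rp_filter(conf_root="/proc/sys/net/ipv4/conf"):
    """Return {interface_name: rp_filter value} for each entry under conf_root."""
    values = {}
    for entry in sorted(Path(conf_root).iterdir()):
        rp_file = entry / "rp_filter"
        if rp_file.is_file():
            values[entry.name] = int(rp_file.read_text().strip())
    return values


def effective_rp_filter(values, interface):
    """Assumed rule: the kernel applies max(conf/all, conf/<interface>)."""
    return max(values.get("all", 0), values.get(interface, 0))
```

Calling read_rp_filter() with no arguments reads the live values. If the rule above holds, setting only the per-interface value to 0 does not disable the check while conf/all is still non-zero.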
Participants (2): Richard Andrews, Trent W. Buck