Linux PPPoE bridge mode issues - again

Hi,

I posted last year about a problem I was having with Linux's PPPoE functionality with regard to a specific modem. At the time I put it down to a dodgy modem and moved on, but now I've hit it on another modem, and twice seems more than coincidence.

The problem is that path MTU discovery seems to break when the "bad" modems are involved. The Linux box running pppoe is OK, because it knows the interface has an mtu and mru of 1492, but masqueraded clients do not. You can work around the problem a bit by having an iptables rule with --clamp-mss-to-pmtu, but it's a kludge and, importantly, only required for two of these four modems. The other two work just fine *with apparently identical configurations* (i.e. LLC / bridged).

Can anyone think of a reason for this? The rp-pppoe and other mailing lists offer tantalising hints that I'm not alone, but sadly those threads do not lead to any solutions.

Modems that work:
* TP-Link TD-8817 (Trendchip chipset)
* Billion 7300RA (Trendchip chipset)

Modems that don't work:
* TP-Link TD-8840T (Trendchip)
* Billion 7800NL (Broadcom chipset)
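For readers unfamiliar with the workaround mentioned above, the rule usually looks something like the sketch below. This is the common form from the iptables TCPMSS documentation, not necessarily the exact rule used on the poster's router. The helper name `clamp_rule` is made up for illustration, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: print (do not execute) a typical MSS clamping rule.
# clamp_rule is a hypothetical helper name, not part of any tool.
# The rule rewrites the MSS option on forwarded TCP SYN packets so that
# it fits the path MTU of the outgoing route.
clamp_rule() {
    printf '%s\n' "iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu"
}

clamp_rule
```

To apply it, the printed command would be run as root on the Linux router.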

In which direction "doesn't work for masq clients"?

For sending (client -> internet), the ICMP Fragmentation Required packet should get sent from the ppp endpoint (e.g. your Linux router). Pinging with 'do not fragment' packets from your masq clients should confirm that this is working.

For receiving (internet -> client), something upstream of the L2TP endpoint has to send the ICMP Frag Required packet to the sender of the packet. It can't be (directly) the fault of your modem.

What could upset things though is if the ppp mtu and mru settings aren't getting passed to the other end. So if your end has an mtu of 1492, your router won't allow larger packets through and should (easily testable) send the frag required packet back, but if the other end didn't know that your mtu was set thus, because you hadn't specified an mru and/or because (maybe) the modem wasn't doing some magic somewhere to let the other end know what sized packet you could receive, then you would have problems.

Are you definitely setting both mtu and mru in your ppp config?

Can you confirm that your router is behaving correctly for your clients? E.g. on your clients do "ping -M do -c 1 -s 1472 8.8.8.8". Then "ping -M do -c 1 -s 1464 8.8.8.8". The first should give you a fragmentation required response. The second should not (I don't know if Google DNS answers pings).

Now can you do the same from an external IP address (with 1500 MTU - not another PPPoE endpoint)? Email me with your external IP privately and I can test this for you if you don't have such access.

The other possibility is that somewhere between you and the LNS is an even lower MTU restriction, so you'd need to set your mtu and mru to something lower. I would expect you'd have problems with the host in that case though.

One further possibility is that even in bridge mode, some of these modems are snooping your packets and setting the MSS for you. That would be easy enough to test too by making a connection to something running wireshark (I can help you with that too if you want).

All that said though, you still do need to set the MSS. There are too many broken routers out there.

James
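The two ping sizes above follow from simple arithmetic: the -s value is the ICMP payload, so the packet on the wire is 28 bytes larger (20 bytes of IPv4 header without options plus 8 bytes of ICMP header). A minimal sketch of that calculation, assuming IPv4 and using a made-up helper name `icmp_payload_for_mtu`:

```shell
#!/bin/sh
# Sketch: ICMP echo payload size that yields an IPv4 packet of exactly
# <mtu> bytes. Assumes a 20-byte IPv4 header (no options) and the
# 8-byte ICMP header. icmp_payload_for_mtu is a hypothetical helper.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 1500   # prints 1472: one full Ethernet-sized packet
icmp_payload_for_mtu 1492   # prints 1464: largest packet that fits PPPoE

# Example use (needs network access, so left commented out):
# ping -M do -c 1 -s "$(icmp_payload_for_mtu 1492)" 8.8.8.8
```

So -s 1472 produces a 1500-byte packet, which cannot cross a 1492-byte link without fragmentation, while -s 1464 produces a 1492-byte packet, which just fits.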

On 16 April 2014 13:00, James Harper <james.harper@bendigoit.com.au> wrote:
Are you definitely setting both mtu and mru in your ppp config?
Hi James,

I am explicitly setting mtu and mru to 1492 in the pppd config. I don't have any of the faulty modems connected right now, so can't check until tonight -- but last time I brought this up on list, I checked and said "If I'm reading the tcpdump correctly (see reply to myself with it this morning) then yes, there is a frag-required response." I will re-confirm this tonight. So outbound (masq clients to internet via linux router) appeared to be doing the right thing.

Last time this came up, in 2013, you checked the inbound packets, and reported: "I can confirm what we thought - pings <= 1492 bytes get a response, pings > 1492 bytes get no response, not even a 'fragmentation required'."

The issue that confuses me deeply is that half the modems work, and half don't - yet they have similar internals, and are configured as identically as they can be (given differing user interfaces). They are all set up to use LLC, an 8/35 VPI/VCI and to bridge PPPoE in full-bridge mode.

I think that fact rules out my ISP or there being a dodgy router somewhere beyond the ISP, as surely that would affect me regardless of which modem is bridging?

I was willing to write off one modem as having mysteriously broken firmware, but it seems unlikely that two modems from different vendors would be broken this way -- and also likely that I'll continue to hit the problem if I buy more modems :(

tjc

--
Turning and turning in the widening gyre
The falcon cannot hear the falconer
Things fall apart; the center cannot hold
Mere anarchy is loosed upon the world

Last time this came up, in 2013, you checked the inbound packets, and reported: "I can confirm what we thought - pings <= 1492 bytes get a response, pings > 1492 bytes get no response, not even a 'fragmentation required'."
Yep. I remember now!
In the absence of anything else, I'm thinking that your modem is doing deep inspection on the PPPoE packets and is setting the MSS on the encapsulated TCP packets according to the current PPP parameters. In that case the "bad" modems aren't doing this (I would call them "good" modems because I don't want something to screw with my packets, but that's just me :)

To find out all you'd have to do is telnet to somewhere where you are running tcpdump and see what is captured depending on what modem you are using.

James
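A capture along the lines James suggests might look roughly like the sketch below. The interface name eth0 and the helper name `syn_capture_cmd` are assumptions for illustration. The helper only prints a tcpdump command that restricts the capture to TCP SYN packets, whose options include the advertised MSS.

```shell
#!/bin/sh
# Sketch: build (but do not run) a tcpdump command that shows only TCP
# SYN packets, so the MSS option can be compared between modems.
# syn_capture_cmd is a hypothetical helper; eth0 is an assumed
# interface name on the remote capture host.
syn_capture_cmd() {
    iface="${1:-eth0}"
    printf "tcpdump -n -v -i %s 'tcp[tcpflags] & tcp-syn != 0'\n" "$iface"
}

syn_capture_cmd eth0
```

Running the printed command as root on the remote host, then connecting to it from a masqueraded client through each modem in turn, should show whether the MSS in the SYN differs between a "good" and a "bad" modem.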

On 16 April 2014 14:42, James Harper <james.harper@bendigoit.com.au> wrote:
In the absence of anything else, I'm thinking that your modem is doing deep inspection on the PPPoE packets and is setting the MSS on the encapsulated TCP packets according to the current PPP parameters. In that case the "bad" modems aren't doing this (I would call them "good" modems because I don't want something to screw with my packets, but that's just me :)
I guess that's possible, but it seems so unlikely to me. These are just consumer-grade ADSL modems, and not even particularly high-end ones at that. Reconstructing the tcp streams inside the pppoe streams between the ISP and the linux server and then putting it all back together sounds beyond their means to me. But yeah, it's possible.
To find out all you'd have to do is telnet to somewhere where you are running tcpdump and see what is captured depending on what modem you are using.
I'll investigate tonight.

It's nowhere near that difficult. The PPPoE header is just 8 bytes. Behind that is IP and then TCP. If the TCP header happens to have the SYN flag set then change the MSS as required. (I think you only need to fiddle with the SYN packets but can't remember for sure). It doesn't need to do any sort of connection tracking or reassembly.

James
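The "change the MSS as required" step comes down to a single comparison. The sketch below shows only that arithmetic, not packet parsing or checksum updates, and assumes IPv4 with 20-byte IP and 20-byte TCP headers (no options counted). The helper name `clamp_mss` is made up for illustration.

```shell
#!/bin/sh
# Sketch: the value an MSS-rewriting device would write into a SYN.
# Largest MSS that fits the link is MTU - 20 (IPv4) - 20 (TCP);
# the advertised MSS is lowered to that limit, never raised.
# clamp_mss is a hypothetical helper name.
clamp_mss() {
    advertised=$1
    mtu=$2
    limit=$(( mtu - 40 ))
    if [ "$advertised" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$advertised"
    fi
}

clamp_mss 1460 1492   # prints 1452: a 1500-MTU client clamped for PPPoE
clamp_mss 1400 1492   # prints 1400: already small enough, left alone
```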

On 16/04/14 12:24, Toby Corkindale wrote:
Can anyone think of a reason for this?
Wild guess - segmentation offload. I have seen offload do really strange things with masqueraded packets and it is just possible that the bad modems support offload but the good modems do not. On the linux box, issue

    for interface in eth0
    do
      for option in tso ufo gso gro lro
      do
        ethtool $interface $option off
      done
    done

Replace eth0 with all the physical network interface names (eth0 eth1 etc.).

On 16/04/14 21:07, Keith Owens wrote:
    for interface in eth0
    do
      for option in tso ufo gso gro lro
      do
        ethtool $interface $option off
      done
    done
<idiot>

    for interface in eth0
    do
      for option in tso ufo gso gro lro
      do
        ethtool -K $interface $option off
      done
    done

</idiot>

On 16 April 2014 21:07, Keith Owens <kaos@ocs.com.au> wrote:
Wild guess - segmentation offload. I have seen offload do really strange things with masqueraded packets and it is just possible that the bad modems support offload but the good modems do not.
I don't have root shell access on the modems, so I can't run it there.

I experimented with adjusting ethernet options last time this came up in 2013, to no effect. But thanks for the suggestion.

Toby

On 17/04/14 11:48, Toby Corkindale wrote:
I don't have root shell access on the modems, so I can't run it there.
I experimented with adjusting ethernet options last time this came up in 2013, to no effect.
But thanks for the suggestion. Toby
Not on the modems, issue the commands on the Linux box. Offload is a negotiated attribute and if one end does not request offload, neither end will do it.

That is very incorrect. All of the options you listed are internal to the network adapter itself and allow various optimisations in how the OS sends and receives data from the network adapter. None of those options changes the behaviour of the data on the wire, nor are they negotiated between anything else on the network (there are some options that are, like flow control, duplex etc, but these are not those).

If something was broken about the way the network adapter or driver implemented those offloads (quite a common thing in the past) then it would be equally broken no matter what modem was used, and would manifest itself in other ways.

James

You aren't a Telstra customer are you?

http://www.reddit.com/r/australia/comments/2485wh/telstra_internet_traffic_b...

I don't know why it would affect only some modem types though.

James
participants (3)
- James Harper
- Keith Owens
- Toby Corkindale