MTU probes fail on reconnect

Sat Jan 1 20:47:25 CET 2011

On Fri, Dec 31, 2010 at 05:11:29PM -0500, Donald Pearson wrote:

> I've noticed some inconsistent performance with some of my tunnels and
> thought I would take some of the spare free time I have over the holidays to
> try to figure out what the cause of that may be.  My environment in this
> case is my home LAN.
[...]
> What I've discovered using level 5 debugging is that often when a connection
> is made, MTU probes from the client are not responded to.
> 
> The telltale sign I've seen every time is particularly high latency.

If MTU probes are not responded to, then yes, tinc will fall back to TCP and
this will increase latency and decrease throughput. It would be interesting to
know why the MTU probes or the responses to them are not received. Perhaps you
can store tinc's debug output (using the --logfile option) and at the same time
run tcpdump on the public network interface to see what is being sent and
received?

> I've been able to reproduce the condition nearly every time: I manually
> start the client (Windows XP) in a command prompt, press Ctrl+C to stop it,
> and then restart it after approximately 5 seconds.
> 
> The client will print the message "No response to MTU probes from Server"
> 
> And then basically all traffic from that point on carries the message
> "Packet for Server (10.10.10.1 port 8002) larger than minimum MTU,
> forwarding via TCP"
> 
> From what I can tell, no further attempts at MTU probes are made, and the
> connection will remain a TCP connection unless it is broken and restarted.

MTU probes will be attempted again after one hour by default. The retry is tied
to the session key timeout, so you can make tinc try more often by adding, for
example, KeyExpire = 600 to tinc.conf. Still, this is suboptimal of course. I
will change it to keep sending MTU probes every PingInterval, just as it does
when MTU probes have not failed.
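
As a rough sketch of that workaround, the line below would go in the tinc.conf
of the node that gives up on probing. The value 600 (seconds) is only the
example figure from above, not a recommendation, and it also means session keys
are renegotiated every ten minutes.

```
# tinc.conf (sketch; 600 is just the example value mentioned above)
# Expire session keys after 600 seconds instead of the default hour,
# which as a side effect restarts MTU probing that often.
KeyExpire = 600
```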

> Usually if I stop the client and wait for about 30 seconds to reconnect,
> there is a much greater chance that the MTU probes work fine, and in about
> 30 seconds MTU is fixed to 1416.

Hm, that might indicate a bug in tinc. However, I could not reproduce it with
two Linux machines. It could also be a problem with some stateful firewall rule
or a router doing NAT that keeps an old mapping around for 30 seconds.

> Every time when the MTU probing fails, I see latency between 700 - 1000 ms
> with 32 byte pings over a LAN.

That in itself is way too high, but this is a problem many people have seen on
Windows.

> So as observed, when tinc falls back to TCP due to MTU probes failing there
> is a significant reduction in throughput and increase in latency.  When this
> happens in the "real world" it's become a little annoying to break and
> reconnect several times until I get a good connection.

Indeed.

> I haven't yet been able to figure out why the MTU probes only work some of
> the time.  I thought it might be my WAN but now I know it's not.  There is
> nothing in the path of this test to block them, and I haven't found any sign
> of packet loss on the LAN.

Ok.

> I will do some testing later of setting PMTU directly and disabling
> PMTUDiscovery to see if that will result in consistent behavior.  I would
> really rather leave dynamic PMTU though because it's a really nice feature
> when you connect with a laptop and you never know what network medium you'll
> be on.

Yes, the idea is that tinc should figure out the best way to connect to other
nodes on its own of course. I'll try to reproduce the problem with a node
running Windows, maybe that makes a difference.
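
For the fixed-MTU test you describe, a minimal sketch of the settings is shown
below. I am assuming they go in the host configuration file for the peer
(hosts/Server here, going by the node name in your log messages) and that 1416
is the value you want, since that is what probing settled on when it worked.
Adjust both to your setup.

```
# hosts/Server (assumed file name, taken from the node name in the log)
# Do not probe for the path MTU to this node...
PMTUDiscovery = no
# ...and use a fixed value instead (1416 is the MTU reported above).
PMTU = 1416
```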

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>