Packet reordering problem?
Armin Schindler
armin at melware.de
Tue May 19 14:11:31 CEST 2015
On 05/18/2015 01:09 PM, Guus Sliepen wrote:
> On Mon, May 18, 2015 at 12:08:53PM +0200, Armin Schindler wrote:
>
>> We didn't change that [ReplayWindow] setting, so the default is 16.
>> What exactly will happen if tinc gets a packet which should have
>> arrived 20 packets earlier (because of the TOS prio queues)?
>
> With the default setting of 16, up to 128 packets can be arbitrarily
> reordered without problems. If a packet arrives that is 129 packets late,
> then it will be dropped. Tinc will also log a warning whenever it
> encounters a late packet.
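If more reordering headroom were ever needed, the window could apparently
be enlarged in tinc.conf; a minimal sketch, assuming ReplayWindow counts
bytes of bitmap at 8 packets per byte (which the "16 allows 128 packets"
figure above suggests):

# tinc.conf: grow the replay window from the default 16 bytes
# (128 packets) to 32 bytes (256 packets)
ReplayWindow = 32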
When the error occurs, the other nodes log a lot of these messages:
"Flushing xxx bytes to A (yyy.yyy.yyy.yyy port zzz) would block"
where xxx increases with every message.
>> Since I activated the prio queues (VoIP packets get the highest prio on
>> the eth and tinc devices via the TOS value), we encountered instability:
>> We have 6 nodes connected via tinc. If one node becomes unreachable
>> (internet connection down), normal encrypted packets from/to the other
>> nodes seem to hang. Only the high-prio VoIP packets seem to be forwarded
>> by tinc as normal. Example with nodes A, B, C, D, E and F: A becomes
>> unreachable, and suddenly connections (except high-prio packets) from B
>> to C, B to D, C to D, etc. have massive packet loss. As soon as A is
>> back online, everything returns to normal.
>
> Hm. It would be nice to know how your nodes connect to each other. And
A, B, C and D connect to E and F. E connects to F.
> what do you mean by massive packet loss? 100%?
Not sure. The internet connection of A was down, so it must have been 100%
(I was not able to check that the last time it occurred).
>> Could an internal queue, together with the reordering, be the cause?
>
> Tinc itself doesn't queue packets. But your prio queues do by definition.
> So how large are those?
It is the default Linux prio qdisc with the default queue depth of the
network interfaces. That qdisc uses 3 bands (3 priorities). Set up with e.g.:

tc qdisc add dev $INTERFACE root handle 1: prio bands 3 \
    priomap 1 2 2 2 0 2 0 0 1 1 1 1 1 1 1 1
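
When the hang occurs, it should be possible to see whether one of the three
bands is backing up with the standard statistics tools (nothing
tinc-specific; $INTERFACE as above):

# per-band packet and drop counters of the prio qdisc
tc -s qdisc show dev $INTERFACE
# overall interface counters, including drops and overruns
ip -s link show dev $INTERFACE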
> Also, do you have a lot of VoIP traffic? If it's more than the available
> bandwidth, then the prio queues could indeed cause all other traffic to
> be dropped.
VoIP is just one or two phone calls, far below the available bandwidth.
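Still, as a safeguard it might be worth capping band 0 so VoIP could never
starve the other two bands completely; a sketch, assuming the prio qdisc
above (handle 1:, band 0 = class 1:1) and an illustrative 2mbit cap:

# attach a token bucket filter to the highest-prio band so excess
# VoIP traffic is delayed instead of monopolizing the link
tc qdisc add dev $INTERFACE parent 1:1 handle 10: tbf rate 2mbit burst 32kbit latency 50ms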
Armin