Network pause issue.
Matthew Tolle
matt at night.com
Tue Jan 21 07:07:53 CET 2014
Howdy Folks,
I've got a 5-node setup here. My server "home" is the primary server that all the other servers connect to. The configs on all the servers look like this:
# cat /etc/tinc/home/hosts/node1
Subnet = 10.2.0.0/16
Address = 192.168.2.1
<RSA KEY>
# cat /etc/tinc/home/hosts/node2
Subnet = 10.3.0.0/16
Address = 192.168.3.1
<RSA KEY>
And so on. All the hosts are set up the same way.
# /sbin/tinc -n home dump subnets
10.1.0.0/16 owner home
10.2.0.0/16 owner node1
10.3.0.0/16 owner node2
10.4.0.0/16 owner node3
10.5.0.0/16 owner node4
# cat /etc/tinc/home/tinc-up
ifconfig $INTERFACE 10.2.0.10 netmask 255.0.0.0
ifconfig $INTERFACE up
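As a sanity check on the addressing above, here is a minimal sketch using Python's standard `ipaddress` module. It confirms that the five subnets from the dump do not overlap and that the tinc-up address sits inside node1's subnet. The helper names `overlapping_pairs` and `owner_of` are made up for illustration; only the subnets and the address come from the output above.

```python
import ipaddress
from itertools import combinations

# Subnets exactly as reported by "tinc -n home dump subnets" above.
SUBNETS = {
    "home": "10.1.0.0/16",
    "node1": "10.2.0.0/16",
    "node2": "10.3.0.0/16",
    "node3": "10.4.0.0/16",
    "node4": "10.5.0.0/16",
}


def overlapping_pairs(subnets):
    """Return every pair of owners whose subnets overlap (hypothetical helper)."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


def owner_of(address, subnets):
    """Return the owner whose subnet contains the address, or None."""
    addr = ipaddress.ip_address(address)
    for name, cidr in subnets.items():
        if addr in ipaddress.ip_network(cidr):
            return name
    return None


if __name__ == "__main__":
    print("overlaps:", overlapping_pairs(SUBNETS))
    print("10.2.0.10 belongs to:", owner_of("10.2.0.10", SUBNETS))
    # The tinc-up netmask 255.0.0.0 puts the whole 10.0.0.0/8 on the interface.
    print("interface network:", ipaddress.ip_interface("10.2.0.10/255.0.0.0").network)
```

The wide 255.0.0.0 netmask on the interface is what lets the kernel hand every 10.x destination to tinc, which then picks the owning node from the /16 subnets.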
# cat tinc.conf
Name = node1
ConnectTo = home
Mode = router
AddressFamily = ipv4
PingInterval = 600
PingTimeout = 15
4 out of 5 nodes work just fine. node2, however, has issues. It works fine for 5-30 minutes and then my connection to it pauses. It's still up: a ping run across the "pause time" finishes with 0% packet loss. Any TCP connection over the link just pauses for a while. The odd thing is that it doesn't time out. In an SSH session to the box over the tinc link I'll type "ps -ef" and 10 minutes later I'll get the response. SSH should time out way before then, so I'm not sure what's going on. It's not like that all the time. I get maybe 15-30 minutes when it's working just fine and then 10 minutes of network pause. While my SSH session is paused I can see that the app on the server is still talking to my primary node over the tunnel. That seems odd.
The app on the node side seems happy and can reach everything it needs to. No sign of an issue there. The problem only seems to show up over the tinc tunnel. It kind of feels like something is routing the IP space in a different direction for a period of time and then it comes back. If that were the case, though, my SSH TCP connection would time out well before the connection comes back to life, and it doesn't.
Has anyone seen anything like this? I've poked at a bunch of things to try and pinpoint the issue. So far no love.
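One way to pin down when the pauses start and stop would be to log a timestamped round-trip time for each ping and then look for runs of very slow replies. Below is a minimal sketch of that second step. The function name `find_pauses`, the 5-second threshold, and the sample numbers are all assumptions for illustration, not measurements from my setup.

```python
def find_pauses(samples, rtt_threshold=5.0):
    """Group consecutive slow replies into pause windows.

    samples: list of (sent_at_seconds, rtt_seconds), sorted by sent_at.
    Returns a list of (first_slow_sent_at, last_slow_sent_at) tuples.
    """
    pauses = []
    start = None
    last = None
    for sent_at, rtt in samples:
        if rtt > rtt_threshold:
            # Reply was delayed: open a window or extend the current one.
            if start is None:
                start = sent_at
            last = sent_at
        elif start is not None:
            # A fast reply closes the current window.
            pauses.append((start, last))
            start = None
    if start is not None:
        # The log ended while still inside a pause.
        pauses.append((start, last))
    return pauses


if __name__ == "__main__":
    # Invented sample: one probe per minute, two of them badly delayed.
    example = [(0, 0.03), (60, 0.03), (120, 540.0), (180, 480.0), (240, 0.04)]
    print(find_pauses(example))
```

Lining the resulting windows up against the tinc log on both ends might show whether the pauses coincide with anything tinc itself reports.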
The routing table looks fine and the same on all of them:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.3.1 0.0.0.0 UG 0 0 0 eth0
10.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 home
192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
Nothing else on node2's side uses 10.x address space.
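To double-check my reading of that table, here is a small sketch of longest-prefix route selection over the three rows shown above, again using the standard `ipaddress` module. The function name `select_route` is hypothetical, and the test destinations other than the gateway are just example addresses inside the relevant ranges.

```python
import ipaddress

# Rows copied from node2's routing table above:
# (destination, genmask, gateway, interface)
ROUTES = [
    ("0.0.0.0", "0.0.0.0", "192.168.3.1", "eth0"),
    ("10.0.0.0", "255.0.0.0", "0.0.0.0", "home"),
    ("192.168.3.0", "255.255.255.0", "0.0.0.0", "eth0"),
]


def select_route(destination, routes):
    """Pick the matching route with the longest prefix.

    Returns (network, gateway, interface), or None if nothing matches.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for dest, mask, gateway, iface in routes:
        net = ipaddress.ip_network(f"{dest}/{mask}")
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    return best


if __name__ == "__main__":
    # 10.1.0.1 is an example address inside home's 10.1.0.0/16 subnet.
    for target in ("10.1.0.1", "192.168.3.50", "8.8.8.8"):
        net, gateway, iface = select_route(target, ROUTES)
        print(target, "->", net, "via", gateway, "dev", iface)
```

With only these three routes, anything in 10.x should always leave through the `home` interface, so the kernel table on node2 by itself does not explain traffic going somewhere else for a while.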
I would appreciate any ideas.
Thanks,
-Matt