Self-DoS

Pierre Beck pbeck at videobuster.de
Wed Dec 30 18:26:38 CET 2015


Hi,

I have successfully connected a network of about 60 nodes (many of them virtual machines) with tinc 1.0, but I hit a severe bug when physical connectivity between two major locations is lost and then restored. From what I gathered, many nodes attempt to connect to many other nodes at once, causing 100% CPU load on all nodes and taking down the whole network, with no node succeeding in connecting to any other. The network seems unable to recover from this state. Luckily I can shut down and restart most daemons with a few keystrokes, but I have to shut down all of them and then start them sequentially with a delay between each, or this "perfect storm" starts all over again.
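
To illustrate the workaround of starting the daemons sequentially with a delay, here is a minimal sketch that only computes a start schedule. The function name staggered_schedule, the 10-second default and the optional jitter are assumptions for illustration; they are not tinc features, and how each daemon is actually started on a node is left out.

```python
import random


def staggered_schedule(nodes, delay_s=10.0, jitter_s=0.0, seed=None):
    """Return (node, start_offset_seconds) pairs, one node after another.

    delay_s is the fixed gap between consecutive starts; jitter_s adds a
    random extra of up to that many seconds. Both values are illustrative
    assumptions, not recommendations for tinc.
    """
    rng = random.Random(seed)
    schedule = []
    for index, node in enumerate(nodes):
        # Each node starts delay_s later than the previous one.
        offset = index * delay_s + rng.uniform(0.0, jitter_s)
        schedule.append((node, offset))
    return schedule


if __name__ == "__main__":
    # Node names taken from the log excerpts in this mail.
    for node, offset in staggered_schedule(["server1039", "server1052", "server1070"]):
        print(f"{offset:6.1f}s  start tincd on {node}")
```

The offsets could then drive whatever mechanism restarts the daemons, so that no two nodes come up at the same moment.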

The overall configuration is switch mode, with mixed IPv4 and IPv6 host addressing. Otherwise the config is empty except for these tweaks, which I added in an attempt to mitigate the issue (with no success):

PingTimeout=15
UDPRcvBuf=8388608
UDPSndBuf=8388608
ProcessPriority=high

The daemon was upgraded to a vanilla build of 1.0.26 on all but two nodes before the most recent event. The host OS is Debian-based, ranging from Squeeze to Jessie plus a few Ubuntu Trusty machines, each with its stock kernel.

Also, to narrow down the problem, I have tried firewalling the incoming UDP traffic on most nodes, forcing TCP for those connections, but it doesn't seem to change anything.

At the time of the event, the logs contain entries like these:
tincd[1093]: Flushing meta data to server1084 (x.x.x.x port y) failed: Connection reset by peer
tincd[1093]: Flushing meta data to server1070 (x.x.x.x port y) failed: Connection reset by peer
tincd[1093]: Flushing meta data to server1052 (x.x.x.x port y) failed: Connection reset by peer
tincd[1093]: Flushing meta data to server1071 (x.x.x.x port y) failed: Connection reset by peer

And these:
tincd[1093]: Metadata socket read error for server1076 (x:x:x:x:x:x:x:x port y): Connection reset by peer
tincd[1093]: Metadata socket read error for <unknown> (x.x.x.x port y): Connection reset by peer

And occasionally:
tincd[8520]: Old connection_t for server1039 (x.x.x.x port y) status 0010 still lingering, deleting...
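
To see which peers are involved most often, the three message types above could be tallied per peer. This is a rough sketch that assumes the log lines look exactly like the excerpts in this mail; the regular expression and the name count_by_peer are my own and not part of tinc.

```python
import re
from collections import Counter

# Matches only the three message types quoted in this mail.
LINE_RE = re.compile(
    r"tincd\[\d+\]: "
    r"(?P<kind>Flushing meta data to|Metadata socket read error for|Old connection_t for) "
    r"(?P<peer>\S+) \("
)


def count_by_peer(lines):
    """Count occurrences of each (message kind, peer name) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("kind"), match.group("peer"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: pipe a syslog excerpt into the script on standard input.
    for (kind, peer), n in count_by_peer(sys.stdin).most_common():
        print(f"{n:6d}  {kind} {peer}")
```

Lines that do not match one of the three patterns are ignored, so the script can be fed a full syslog file.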

Any ideas?

Regards,

Pierre Beck