Mode: switch and DHCP problems on network with many nodes

Anton Avramov lukav at lukav.com
Fri Feb 8 17:11:18 CET 2019


Hi All,

I currently have the following setup.
One central node, called Backbone, with the following conf:

Name = Backbone
Mode = switch
AddressFamily = ipv4
ReplayWindow = 64
Compression = 10

I also have approximately 440 nodes connected to this node with the 
following setup:
Name = xxxxxx
Mode = switch
ConnectTo = Backbone
Compression = 10

dnsmasq runs on Backbone and serves IP addresses to the nodes based on
their dhcp-client-identifier, which is unique for each node.

The setup has worked perfectly for years now with multiple versions of
Debian. Currently all nodes are on Debian 8 stretch and use the tinc
package from the official repository.

Now my problem is that Backbone is a single point of failure, and I
want to add backup servers for it.
I've created 2 more "central servers" with the conf:
Name = Server1
Mode = switch
Interface = support
ConnectTo = Backbone
ConnectTo = Server2
Compression = 10
ReplayWindow = 64

The servers synchronize the hosts directory, which holds the keys of
all the nodes, between each other.

With this setup, if I set some of the nodes to connect only to Server1,
for example, it works. Such a node gets an IP address and everything is
fine once I open port TCP/655 on Server1.
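
For illustration, the conf of such a test node differs from the normal
node conf above only in its ConnectTo line. This is a sketch, and the
name is a placeholder as before:

Name = xxxxxx
Mode = switch
ConnectTo = Server1
Compression = 10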

However, a few minutes after opening port TCP/655 on Server1,
everything goes to hell.

The tinc processes on Backbone and Server1 go to 100% CPU
utilization. The nodes stop renewing their IP addresses and their
dhclient instances are constantly trying to get a new address.
Even if I set a static IP address on a node, it cannot ping the server.

Using dhcpdump I see that there is DHCP traffic to Backbone, but I
suspect there is some loop going on that is causing the tinc daemon to
choke.

I understand that my setup is somewhat unusual, and if you have any
questions I'll be glad to provide more information.

Can someone suggest what the problem is and how to overcome it?
Any help will be greatly appreciated :)

P.S. I've tried to recreate the setup in a testing environment with
nspawn containers, but unfortunately the problem doesn't manifest there.

Best regards



