Segfaults on connection loss

zorun zorun at polyno.me
Wed Jun 25 08:40:08 CEST 2014


I have been able to trigger this segfault reliably with Tinc 1.0.24:

0/ set up your local Tinc node to connect to a remote dual-stacked Tinc
   node (that is, the remote node has both an A and an AAAA record)

1/ run Tinc in debug mode, "tincd -n NETNAME -D -d 3"

2/ wait until Tinc establishes a connection with the remote server
   (lots of "Got PACKET from REMOTE (XX.XX.XX.XX port 656)" messages)

3/ apply a very large delay to your physical interface: "tc qdisc add
   dev eth0 root netem delay 10s"

4/ wait for Tinc to time out on a ping.  It will then try to reconnect,
   and crash:

Got PACKET from REMOTE (XX.XX.XX.XX port 656)
Sending PING to REMOTE (XX.XX.XX.XX port 656)
Got PACKET from REMOTE (XX.XX.XX.XX port 656)
Got PING from REMOTE (XX.XX.XX.XX port 656)
Sending PONG to REMOTE (XX.XX.XX.XX port 656)
Got PACKET from REMOTE (XX.XX.XX.XX port 656)
REMOTE (XX.XX.XX.XX port 656) didn't respond to PING in 5 seconds
Closing connection with REMOTE (XX.XX.XX.XX port 656)
Sending DEL_EDGE to everyone (BROADCAST)
UDP address of REMOTE cleared
UDP address of OTHER_SERVER1 cleared
UDP address of OTHER_SERVER2 cleared
UDP address of OTHER_SERVER3 cleared
UDP address of OTHER_SERVER4 cleared
Sending DEL_EDGE to everyone (BROADCAST)
Trying to connect to REMOTE (2001:db8::1 port 656)
Connected to REMOTE (2001:db8::1 port 656)
Sending ID to REMOTE (2001:db8::1 port 656)
Timeout from REMOTE (2001:db8::1 port 656) during authentication
Closing connection with REMOTE (2001:db8::1 port 656)
Segmentation fault (core dumped)


(I've replaced the IPv4 address of the remote server with
"XX.XX.XX.XX" and the IPv6 address with "2001:db8::1".)

Note that in my setup, there are other Tinc nodes
(OTHER_SERVER{1,2,3,4}), but the local node cannot reach them because
of a firewall.
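
For context: the immediate retry on the IPv6 address in the log above
looks like the usual getaddrinfo() address-cycling pattern.  Below is
a generic sketch of that pattern; it is not tinc's actual code, just
an illustration of why a dual-stacked name yields a second connection
attempt right away:

    #include <netdb.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Generic address-cycling sketch (NOT tinc's code): with both an
     * A and an AAAA record, getaddrinfo() returns two results, so a
     * failed attempt on one address is immediately followed by an
     * attempt on the other, instead of a back-off and retry. */
    int connect_any(const char *host, const char *port) {
        struct addrinfo hints = {0}, *res, *ai;
        int fd = -1;

        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;

        for (ai = res; ai; ai = ai->ai_next) {  /* e.g. AAAA, then A */
            fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
            if (fd == -1)
                continue;
            if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
                break;      /* connected */
            close(fd);      /* this address failed, try the next one */
            fd = -1;
        }

        freeaddrinfo(res);
        return fd;          /* -1 if every address failed */
    }

With only an A record there is a single result, so a failed attempt
ends the connection entirely, which matches the different behaviour
in the experiment below.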


When running the same experiment with only an A record for the
remote node, Tinc does not crash:

Got PACKET from REMOTE (XX.XX.XX.XX port 656)
Got PACKET from REMOTE (XX.XX.XX.XX port 656)
Sending PING to REMOTE (XX.XX.XX.XX port 656)
Got PACKET from REMOTE (XX.XX.XX.XX port 656)
Got PING from REMOTE (XX.XX.XX.XX port 656)
Sending PONG to REMOTE (XX.XX.XX.XX port 656)
REMOTE (XX.XX.XX.XX port 656) didn't respond to PING in 5 seconds
Closing connection with REMOTE (XX.XX.XX.XX port 656)
Sending DEL_EDGE to everyone (BROADCAST)
UDP address of REMOTE cleared
UDP address of OTHER_SERVER1 cleared
UDP address of OTHER_SERVER2 cleared
UDP address of OTHER_SERVER3 cleared
UDP address of OTHER_SERVER4 cleared
Sending DEL_EDGE to everyone (BROADCAST)
Could not set up a meta connection to REMOTE
Trying to re-establish outgoing connection in 10 seconds
Purging unreachable nodes
Sending DEL_SUBNET to everyone (BROADCAST)
Sending DEL_SUBNET to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_SUBNET to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_SUBNET to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_SUBNET to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_SUBNET to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Sending DEL_EDGE to everyone (BROADCAST)
Trying to connect to REMOTE (XX.XX.XX.XX port 656)
Connected to REMOTE (XX.XX.XX.XX port 656)
Sending ID to REMOTE (XX.XX.XX.XX port 656)
Timeout from REMOTE (XX.XX.XX.XX port 656) during authentication
Closing connection with REMOTE (XX.XX.XX.XX port 656)
Could not set up a meta connection to REMOTE
Trying to re-establish outgoing connection in 15 seconds
Purging unreachable nodes


Interestingly, in this case, Tinc purges unreachable nodes before
attempting to reconnect, while it does not do so in the dual-stack
case.  Could that be the issue?
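
If the dual-stack crash is a use-after-free on the edge's "from" node,
the ordering would look roughly like the self-contained sketch below.
The structs here are dummies I made up to mimic the shapes involved;
only the avl_delete() call mirrors the real code at edge.c:96 in the
backtrace quoted further down:

    #include <stdio.h>
    #include <stdlib.h>

    /* Dummy stand-ins for tinc's node_t and edge_t, only to show the
     * suspected ordering; the real structs live in tinc's source. */
    typedef struct node {
        void *edge_tree;
    } node_t;

    typedef struct edge {
        node_t *from;
    } edge_t;

    /* Mirrors the shape of edge.c:96 from the backtrace:
     *     avl_delete(e->from->edge_tree, e);
     * If e->from points to a node that was already freed, this
     * dereferences freed memory and can segfault. */
    static void edge_del(edge_t *e) {
        printf("would call avl_delete(%p, e)\n", e->from->edge_tree);
    }

    int main(void) {
        node_t *n = malloc(sizeof *n);
        if (!n)
            return 1;
        n->edge_tree = NULL;

        edge_t e = { .from = n };
        free(n);        /* hypothetically: the node is freed early  */
        edge_del(&e);   /* e->from now dangles: use-after-free      */
        return 0;
    }

This is only a guess, but it would fit the purge-ordering difference:
in the single-A case the unreachable nodes are purged before the
reconnect, so no stale edge would survive to the next timeout.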

Thanks,
zorun

On Tue, Jun 17, 2014 at 03:46:53AM +0200, zorun wrote:
> On Mon, Jun 16, 2014 at 09:57:23AM +0200, zorun wrote:
> > Hi,
> > 
> > I got a new Tinc segfault, again under conditions of bad network connectivity:
> > 
> >   Old connection_t for mejis (XX.XX.XX.XX port 656) status 0010 still lingering, deleting...
> >   Segmentation fault (core dumped)
> > 
> > 
> > This is tinc version 1.0.24 on Arch Linux x86_64:
> > 
> >   # tincd --version
> >   tinc version 1.0.24 (built May 12 2014 09:24:12, protocol 17)
> > 
> > 
> > I have a core dump, but I doubt it's very useful, as it doesn't have
> > any debug symbols.
> 
> Now with debug symbols, and a new segfault.  This (segfaulting) side
> is running 1.0.24, while the other side is running 1.0.19 (Debian wheezy).
> 
> Backtrace follows:
> 
> 
> Core was generated by `tincd -n babel -D'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007feaf43945eb in edge_del (e=0x7feaf4c45590) at edge.c:96
> 96              avl_delete(e->from->edge_tree, e);
> (gdb) bt
> #0  0x00007feaf43945eb in edge_del (e=0x7feaf4c45590) at edge.c:96
> #1  0x00007feaf4397b74 in terminate_connection (c=0x7feaf4c42c40, report=false) at net.c:202
> #2  0x00007feaf4397e56 in check_dead_connections () at net.c:277
> #3  0x00007feaf4398623 in main_loop () at net.c:458
> #4  0x00007feaf43ad1bf in main (argc=4, argv=0x7fff4e556f88) at tincd.c:679
> (gdb) bt full
> #0  0x00007feaf43945eb in edge_del (e=0x7feaf4c45590) at edge.c:96
> No locals.
> #1  0x00007feaf4397b74 in terminate_connection (c=0x7feaf4c42c40, report=false) at net.c:202
> No locals.
> #2  0x00007feaf4397e56 in check_dead_connections () at net.c:277
>         node = 0x7feaf4c41e50
>         next = 0x0
>         c = 0x7feaf4c42c40
> #3  0x00007feaf4398623 in main_loop () at net.c:458
>         readset = {fds_bits = {8, 0 <repeats 15 times>}}
>         writeset = {fds_bits = {0 <repeats 16 times>}}
>         tv = {tv_sec = 5, tv_nsec = 0}
>         omask = {__val = {0, 18446744035054845952, 140734507609248, 140647096459960, 0, 140734507609284, 140734507609376, 140647105536336, 18446744035054845952, 
>             140647105560720, 140734507609296, 140647096460210, 140734507609376, 140647105536336, 18446744069414584320, 140647105560720}}
>         block_mask = {__val = {8193, 0 <repeats 15 times>}}
>         next_event = 1402914030
>         r = 1
>         maxfd = 8
>         last_ping_check = 1402914025
>         last_config_check = 1402907973
>         last_graph_dump = 1402913980
>         event = 0x0
> #4  0x00007feaf43ad1bf in main (argc=4, argv=0x7fff4e556f88) at tincd.c:679
>         priority = 0x0
> 
> 
> 
> 
> Let me know if you need more details, or the core file itself.
> 
> Thanks,
> zorun

