Segfaults on connection loss

Tuomas Silen tuomas at silen.fi
Thu Nov 7 23:36:57 CET 2013


Hi there,

I'm seeing quite frequent segfaults in check_dead_connections() and 
terminate_connection() when the TCP meta connection to a node times out 
(or is, e.g., firewalled). It usually happens when there's heavy packet loss:

Program terminated with signal 11, Segmentation fault.
#0  edge_del (e=0x1b71ba0) at edge.c:96
96              avl_delete(e->from->edge_tree, e);
(gdb) bt
#0  edge_del (e=0x1b71ba0) at edge.c:96
#1  0x0000000000408a65 in terminate_connection (report=false, c=0x1abca00) at net.c:188
#2  terminate_connection (c=0x1abca00, report=false) at net.c:168
#3  0x0000000000409579 in check_dead_connections () at net.c:263
#4  main_loop () at net.c:444
#5  0x000000000040478a in main (argc=<optimized out>, argv=<optimized out>) at tincd.c:656

(gdb) p *e
$1 = {from = 0x3932366432203231, to = 0x61626c6120373462, address = { 
... }, options = 908079418, weight = 1663055157, connection = 
0x3034393120, reverse = 0x0}

(gdb) up
#1  0x0000000000408a65 in terminate_connection (report=false, c=0x1abca00) at net.c:188
188                     edge_del(c->edge);

(gdb) p *c
$2 = {name = 0x1abb560 "...", address = { ... }, hostname = 0x1b6dc70 
"... port 655", protocol_version = 17, socket = 15, options = 0, status 
= {pinged = 0, active = 0, connecting = 0, unused_termreq = 0, remove = 
1, timeout = 0, encryptout = 0, decryptin = 0, mst = 0, unused = 0}, 
estimated_weight = 1011, start = {tv_sec = 1383624081, tv_usec = 
156843}, outgoing = 0x1abc4c0, node = 0x1ad8ef0, edge = 0x1b71ba0, 
rsa_key = 0x0, incipher = 0x7ff0d99eea20, outcipher = 0x7ff0d99eea20, 
inctx = 0x0, outctx = 0x0, inkey = 0x0, outkey = 0x0, inkeylength = 0, 
outkeylength = 0, indigest = 0x7ff0d99ef8c0, outdigest = 0x7ff0d99ef8c0, 
inmaclength = 0, outmaclength = 0, incompression = 0, outcompression = 
0, mychallenge = 0x0, hischallenge = 0x0, buffer = "...", buflen = 0, 
reqlen = 0, tcplen = 0, allow_request = 0, outbuf = 0x1ab75d0 "0 ... 
17\n", outbufstart = 0, outbuflen = 0, outbufsize = 14, last_ping_time = 
1383624085, last_flushed_time = 1383624085, config_tree = 0x1abb580}

(gdb) p c->status.remove
$3 = 1
(gdb) p now
$4 = 1383624087
(gdb) p pingtimeout
$5 = 2

It seems as if something else had already cleaned up the connection: 
c->status.remove == 1 and the contents of the edge look like garbage, yet 
we still reached the terminate_connection() call at net.c:263.
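
One way to look at the garbage values in the edge dumps is to decode them as
ASCII. The helper below is a minimal sketch, not part of tinc; the name
decode_le_ascii is made up for illustration. It assumes a little-endian
machine (the addresses in the backtrace look like x86-64), where the lowest
byte of a 64-bit value is the first byte in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of tinc.
 * Writes the 8 bytes of `value`, lowest byte first (their in-memory order
 * on a little-endian machine), into `out` as characters.
 * Non-printable bytes are shown as '.'. `out` must hold 9 bytes. */
static void decode_le_ascii(uint64_t value, char out[9])
{
    for (size_t i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(value >> (8 * i));
        out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
    }
    out[8] = '\0';
}
```

Called on from = 0x3932366432203231 from the first dump, this gives
"12 2d629", and to = 0x61626c6120373462 gives "b47 alba". In the second
dump, connection = 0x32332f30312e312e decodes to ".1.10/32". Printable text
in pointer fields would be consistent with the edge having been freed and
its memory reused for string data, although the dumps alone do not prove
that.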


Another:

Program terminated with signal 11, Segmentation fault.
#0  edge_del (e=0x2598c20) at edge.c:93
93                      e->reverse->reverse = NULL;
(gdb) bt
#0  edge_del (e=0x2598c20) at edge.c:93
#1  0x0000000000408a65 in terminate_connection (report=false, c=0x258ccf0) at net.c:188
#2  terminate_connection (c=0x258ccf0, report=false) at net.c:168
#3  0x0000000000409579 in check_dead_connections () at net.c:263
#4  main_loop () at net.c:444
#5  0x000000000040478a in main (argc=<optimized out>, argv=<optimized out>) at tincd.c:656

(gdb) p *e
$1 = {from = 0x2598c40, to = 0x52779ed8, address = { ... }, options = 
824193328, weight = 960048688, connection = 0x32332f30312e312e, reverse 
= 0x303123}

(gdb) p *e->reverse
Cannot access memory at address 0x303123

(gdb) p c->status
$2 = {pinged = 0, active = 0, connecting = 0, unused_termreq = 0, remove 
= 1, timeout = 0, encryptout = 0, decryptin = 0, mst = 0, unused = 0}

One more:

Program terminated with signal 11, Segmentation fault.
#0  avl_search_closest_node (tree=0x10001, data=0x183f820, result=0x7fffd5ba5e9c) at avl_tree.c:346
346             node = tree->root;
(gdb) bt
#0  avl_search_closest_node (tree=0x10001, data=0x183f820, result=0x7fffd5ba5e9c) at avl_tree.c:346
#1  0x0000000000404ede in avl_search_node (tree=<optimized out>, data=<optimized out>) at avl_tree.c:335
#2  0x0000000000405469 in avl_delete (tree=0x10001, data=<optimized out>) at avl_tree.c:645
#3  0x0000000000408a65 in terminate_connection (report=false, c=0x1803790) at net.c:188
#4  terminate_connection (c=0x1803790, report=false) at net.c:168
#5  0x0000000000409579 in check_dead_connections () at net.c:263
#6  main_loop () at net.c:444
#7  0x000000000040478a in main (argc=<optimized out>, argv=<optimized out>) at tincd.c:656

(gdb) p *tree
Cannot access memory at address 0x10001

(gdb) p c->status
$2 = {pinged = 0, active = 0, connecting = 0, unused_termreq = 0, remove 
= 1, timeout = 0, encryptout = 1, decryptin = 0, mst = 0, unused = 0}
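
In all three dumps, c->status.remove is already 1 when terminate_connection()
runs and c->edge points at memory that no longer looks like an edge. The
sketch below shows one generic way to make such a teardown safe to call
twice. It is only an illustration under stated assumptions: sketch_edge_t,
sketch_conn_t, edge_free and terminate_once are simplified, made-up
stand-ins, not tinc's real types or functions, and this is not a proposed
patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for illustration only; these are NOT tinc's
 * real edge_t / connection_t definitions. */
typedef struct sketch_edge {
    int weight;
} sketch_edge_t;

typedef struct sketch_conn {
    bool remove;          /* plays the role of c->status.remove */
    sketch_edge_t *edge;  /* plays the role of c->edge */
} sketch_conn_t;

static int edges_freed = 0;  /* counter so the behaviour can be observed */

static void edge_free(sketch_edge_t *e)
{
    free(e);
    edges_freed++;
}

/* Tear a connection down at most once.
 * Returns true if this call did the work, false if the connection was
 * already marked for removal. */
static bool terminate_once(sketch_conn_t *c)
{
    if (c->remove)
        return false;        /* already torn down: do not touch c->edge */

    c->remove = true;
    if (c->edge) {
        edge_free(c->edge);
        c->edge = NULL;      /* a later call cannot reach the freed edge */
    }
    return true;
}
```

With this shape, a second terminate_once() call on the same connection
returns false and frees nothing. Whether the real crash comes from a double
termination or from the edge being freed through another path is not
something the dumps settle.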


Those are with 1.0.23, but we saw similar crashes with 1.0.21. The OS is Ubuntu 12.04.

Any ideas? Let me know if some additional information would be helpful.

Thanks!

-Tuomas Silen


More information about the tinc-devel mailing list