From 613d586afd22159cee57c9524218c7200f4f1096 Mon Sep 17 00:00:00 2001 From: Etienne Dechamps Date: Sun, 22 Nov 2015 17:14:14 +0000 Subject: [PATCH] Don't unset validkey when receiving SPTPS handshakes over ANS_KEY. This fixes a hairy race condition that was introduced in 1e89a63f1638e43dee79afbb18d5f733b27d830b, which changed the underlying transport of handshake packets from REQ_KEY to ANS_KEY. Unfortunately, what I missed in that commit is, on the receiving side, there is a slight difference between req_key_h() and ans_key_h(): indeed, the latter resets validkey to false. The reason why this is not a problem during typical operation is because the normal SPTPS key regeneration procedure looks like this: KEX -> <- KEX SIG -> <- SIG All these messages are sent over ANS_KEY, therefore the receiving side will unset validkey. However, that's typically not a problem in practice because upon reception of the last message (SIG), SPTPS will call sptps_receive_record(), which will set validkey to true again, and everything works out fine in the end. However, that was the *typical* scenario. Now let's assume that the SPTPS channel is in active use at the same time key regeneration happens. Specifically, let's assume a normal VPN data packet sneaks in during the key regeneration procedure: KEX -> <- KEX <- (SPTPS packet, over TCP or UDP) <- KEX (wtf?) SIG -> (refused with Invalid packet seqno: XXX != 0) At this point, both nodes are extremely confused and the SPTPS channel becomes unusable with various errors being thrown on both sides. The channel will stay down until automatic SPTPS channel restart kicks in after 10 seconds. (Note: the above is just an example - the race can occur on either side whenever a packet is sent during the period of time between KEX and SIG messages are received by the node sending the packet.) I've seen this race occur in the wild - it is very likely to occur if key regeneration occurs on a heavily loaded channel. It can be reproduced fairly easily by setting KeyExpire to a short value (a few seconds) and then running something like ping -f foobar -i 0.01. The reason why this occurs is because tinc's TX code path triggers the following: - send_packet() - try_tx() - try_tx_sptps() - validkey is false because we just received an ANS_KEY message - waitingforkey is false because it's not used for key regeneration - send_req_key() - SPTPS channel restart (sptps_stop(), sptps_start()). Obviously, it all goes downhill from there and the two nodes get very confused quickly (for example the seqno gets reset, hence the error messages). This commit fixes the issue by keeping validkey set when SPTPS data is received over ANS_KEY messages. --- src/protocol_key.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/protocol_key.c b/src/protocol_key.c index b409a357..dd547047 100644 --- a/src/protocol_key.c +++ b/src/protocol_key.c @@ -418,7 +418,7 @@ bool ans_key_h(connection_t *c, const char *request) { cipher_close(from->outcipher); digest_close(from->outdigest); #endif - from->status.validkey = false; + if (!from->status.sptps) from->status.validkey = false; if(compression < 0 || compression > 11) { logger(DEBUG_ALWAYS, LOG_ERR, "Node %s (%s) uses bogus compression level!", from->name, from->hostname); -- 2.20.1