"Invalid KEX record length" during SPTPS key regeneration and related issues

Sun May 17 22:16:39 CEST 2015

On Sun, May 17, 2015 at 07:46:45PM +0100, Etienne Dechamps wrote:

> I sent you a pull request that addresses the general issue, at least
> for the short term: https://github.com/gsliepen/tinc/pull/83

Merged.

> > You are right. The main issue with the SPTPS datagram protocol is that
> > it actually doesn't handle any packet loss or reordering during
> > authentication and key regeneration. I will add this, so it will be able
> > to run completely over UDP.
> 
> Well, actually... in the pull request above I "solved" this problem
> simply by doing "if anything weird comes in, just restart the whole
> thing". I wonder if this solution could simply be used as the final
> long term solution. Indeed, SPTPS already knows how to handle
> lost/reordered *data* packets, and handshake packets are only
> exchanged rarely (hourly by default), therefore it seems like
> restarting the whole thing on a bad handshake is not that expensive.

That's true. But I'll try to fix SPTPS anyway.

> In fact, I suspect that with this patch, we might be able to let
> handshake packets go over UDP and it would run just fine, although I
> don't really see why that would be useful in practice.

Currently tinc still relies on UDP packets being properly authenticated,
so it would need some extra work to allow the initial handshake to be
over UDP. It would be nice to have it at some point (after 1.1.0), to
prepare for the possibility of running tinc completely over UDP.

> It's also more reliable since it can potentially address *any* issue
> with the SPTPS tunnel, possibly including ones we don't even know of
> yet. And the code is much simpler than trying to implement full-blown
> packet loss/reorder detection in SPTPS code, I think.

Yes, having the ability to restart after getting stuck is certainly
desirable.
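
To illustrate the "if anything weird comes in, just restart the whole
thing" idea, here is a minimal toy sketch in Python. It is not tinc's
actual C code: the class ToyHandshake, the record names and their order
are simplified assumptions made up for this example, and real SPTPS
records carry lengths, sequence numbers and cryptographic material that
are left out here.

```python
# Toy model: handshake records must arrive in a fixed order; anything
# else makes the receiver start over instead of trying to recover.
# The record names and their order are illustrative assumptions.
EXPECTED_ORDER = ["KEX", "SIG", "ACK"]


class ToyHandshake:
    def __init__(self):
        self.step = 0      # index of the next expected record
        self.restarts = 0  # how often we had to start over

    @property
    def established(self):
        return self.step == len(EXPECTED_ORDER)

    def restart(self):
        """Forget all handshake progress and begin again."""
        self.step = 0
        self.restarts += 1

    def receive(self, record):
        """Process one handshake record; return True once established."""
        if self.established:
            # A handshake record on an established tunnel is treated
            # as the start of a key regeneration.
            self.step = 0
        if record == EXPECTED_ORDER[self.step]:
            self.step += 1
        else:
            # Lost, reordered or otherwise unexpected record.
            self.restart()
        return self.established


# A handshake in which the SIG record was lost: the receiver restarts
# once and then completes on the retry.
hs = ToyHandshake()
for record in ["KEX", "ACK", "KEX", "SIG", "ACK"]:
    hs.receive(record)
```

In this model a lost record costs one extra handshake round, which is
cheap as long as handshakes are rare.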

> >> The legacy protocol doesn't have that problem because KEY_CHANGED is a
> >> broadcast message - meaning it can't really get lost.
> >
> > Actually, it can just as well, although it is very unlikely to happen
> > that a broadcast message can get lost, and even less likely that this
> > happens right when a KEY_CHANGED message gets sent.
> 
> That's interesting, can you explain how you see a broadcast message
> getting lost?

It can happen when a node's only metaconnection has stopped working, but
the node has not detected that yet, and it makes a second metaconnection
before it notices the failure. As far as the rest of the VPN is
concerned, that node has never been unreachable, but a KEY_CHANGED
message sent during that small window will be lost.
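
The window can be shown with a small toy timeline in Python. This is
only a sketch under simplifying assumptions: MetaConnection, Node,
broadcast() and simulate() are hypothetical names invented for this
example, not tinc data structures, and message forwarding is reduced to
"delivered if at least one connection actually works".

```python
from dataclasses import dataclass, field


@dataclass
class MetaConnection:
    peer: str
    working: bool = True  # whether data actually gets through


@dataclass
class Node:
    name: str
    connections: list = field(default_factory=list)
    received: set = field(default_factory=set)

    def considered_reachable(self):
        # Assumption: the node counts as reachable for as long as it
        # has a metaconnection that has not been torn down yet.
        return len(self.connections) > 0


def broadcast(node, message):
    """Send over every metaconnection; only working ones deliver."""
    if any(c.working for c in node.connections):
        node.received.add(message)


def simulate():
    node = Node("X", [MetaConnection("B")])
    reachable = [node.considered_reachable()]

    node.connections[0].working = False           # link dies silently
    reachable.append(node.considered_reachable())

    broadcast(node, "KEY_CHANGED")                # sent in the window

    node.connections.append(MetaConnection("C"))  # second metaconnection
    reachable.append(node.considered_reachable())

    node.connections.pop(0)                       # failure detected at last
    reachable.append(node.considered_reachable())

    return all(reachable), "KEY_CHANGED" in node.received
```

Here simulate() returns (True, False): the node was considered
reachable at every step, yet it never received the KEY_CHANGED message.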

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>