Some questions about SPTPS

Sun Jul 27 15:58:57 CEST 2014

On Sun, Jul 27, 2014 at 11:34:23AM +0100, Etienne Dechamps wrote:

> Looking into this some more, I realized it's even worse than I thought: not
> only is TCP packet transport inefficient, it's also the only way to indirect
> a packet through a relay. This is because, as far as I can tell, there is no
> way to relay UDP SPTPS packets, since there's no way to know what their
> recipient is without decrypting them, but only the recipient has the key.
> 
> This is a significant regression when compared to the legacy protocol: tinc
> used to try (1) direct UDP communication, (2) indirect UDP communication,
> (3) indirect TCP communication, thus always choosing the most efficient path
> to the destination. With SPTPS, (2) is impossible and tinc falls back
> directly to (3).

There are two cases:

1. Two 1.1 nodes with a 1.0 node in the middle. If traffic needs to be
sent indirectly, then the only option while keeping end-to-end security
is to do this via the inefficient REQ_KEY messages. There is no hope of
doing this via UDP.

2. All nodes are running 1.1. Indeed here the protocol should be changed
before 1.1.0 is released to handle indirect messages efficiently.

> I have a rough proposal that will hopefully bring the best of both worlds.
> Basically, when a node wants to send an SPTPS record through a relay via
> UDP, it would add a header to the UDP packet containing the first 4 bytes of
> the SHA-256 hash of the recipient node name and the source node name (total
> 8 bytes). The rest of the packet would be the original SPTPS datagram,
> encrypted as usual with the end-to-end key. The relay node would then use
> the hash to figure out what the final recipient is, and then blindly relay
> the packet to that recipient (or to the next relay). If multiple nodes have
> the same hash (which, according to the birthday problem, has a 0.01% chance
> of happening with 1000 nodes), then UDP relaying is considered impossible
> and communication would fall back to TCP.

Hashes are indeed an option, but they can fail. And if they fail, the
only option would be to rename a node, which is very inconvenient.
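
To make the failure case concrete, here is a rough sketch in Python (not
tinc code; all function names are made up for illustration) of the 8-byte
header from the proposal, a check for colliding node names, and the
birthday approximation for a 32-bit hash:

```python
import hashlib
import math

def name_hash(name: str) -> bytes:
    """First 4 bytes of the SHA-256 hash of a node name, as proposed."""
    return hashlib.sha256(name.encode("utf-8")).digest()[:4]

def relay_header(dest: str, source: str) -> bytes:
    """8-byte header: recipient hash followed by source hash."""
    return name_hash(dest) + name_hash(source)

def colliding_names(names):
    """Return the groups of node names that share a 4-byte hash."""
    groups = {}
    for n in names:
        groups.setdefault(name_hash(n), []).append(n)
    return [g for g in groups.values() if len(g) > 1]

def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    return 1.0 - math.exp(-n * (n - 1) / 2.0 ** (bits + 1))
```

With n = 1000 the approximation gives roughly 1.2e-4, in line with the
0.01% figure quoted above. Any group returned by colliding_names() would
be a set of nodes for which UDP relaying cannot work unless one of them
is renamed.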

Two other options:

- Rely on the fact that all nodes have exactly the same list of edges
  and subnets, and that those are ordered. So one can number them and
  use those numbers instead of hashes. That way there are normally no
  conflicts, except for brief moments while add/del messages are still
  propagating.

- Use a form of MPLS: when node A wants to relay messages to C via B, it
  asks B to give it a tag that is unique for B that tells B the packets
  are from A and to be forwarded to C. Of course, then B itself should
  get a tag from C (or if B cannot reach C directly either, from another
  intermediate node). The drawback of this scheme is that it requires
  additional requests and more data structures to maintain.
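
A rough sketch of both ideas in Python (purely illustrative, not tinc
code). number_nodes() assumes that numbering the sorted node names is an
acceptable stand-in for numbering the ordered edge/subnet lists, and
TagTable is a hypothetical name for the per-relay state the MPLS-like
scheme would need:

```python
def number_nodes(names):
    """Give each node its index in the sorted list of known names.

    Assumption: every node sees the same set of names and sorts them
    the same way, so all nodes agree on the numbering.
    """
    return {name: i for i, name in enumerate(sorted(set(names)))}

class TagTable:
    """Tags handed out by one relay; each maps to a (source, dest) pair."""

    def __init__(self):
        self._by_tag = {}
        self._by_pair = {}
        self._next = 0

    def allocate(self, source, dest):
        """Return the tag for this pair, creating one on first request."""
        pair = (source, dest)
        if pair not in self._by_pair:
            tag = self._next
            self._next += 1
            self._by_pair[pair] = tag
            self._by_tag[tag] = pair
        return self._by_pair[pair]

    def lookup(self, tag):
        """Return (source, dest) for a tag, or None if it is unknown."""
        return self._by_tag.get(tag)
```

In the tag scheme, B would call allocate("A", "C") when A asks for a tag,
and lookup() on each incoming tagged packet. The table is exactly the
extra data structure mentioned as the drawback.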

> Now this is where it gets tricky: there is no simple way on the relay side
> to differentiate a "normal" incoming packet (where the first 4 bytes is the
> seqno) and a "relay" incoming packet (where the first 4 bytes is the
> recipient hash). I have a solution in mind, which is kinda ugly but should
> work just fine: try to decrypt the packet as usual, if it fails, and the
> first 8 bytes are two known hashes, then relay. This is not as inefficient
> as it sounds: we can make the decryption fail-fast in the vast majority of
> cases by checking the seqno first.

One other option is to do both hop-by-hop and end-to-end encryption in
this case. The hop-by-hop encrypted packet would get a different SPTPS
record type than normal packets, to signal to the intermediate nodes
that they have to forward it and that it has an 8-byte header giving
the source and destination. Apart from reducing the path MTU, this is
only less efficient for the source and final destination; for the
intermediate nodes it's the same amount of work as with the legacy
protocol. A benefit would be that there is no way for MITMs or other
attackers to make a node relay fake or duplicate packets.

> One thing to note though: for this to work the relay node needs to know the
> actual UDP address of the source node, otherwise it would try to get it from
> trying node keys on the packet, but the packet is not encrypted using any of
> those keys, so that wouldn't work. The obvious solution is to make the
> source node and the relay probe MTU between each other before attempting to
> relay, which will provide reliable UDP address information (as well as UDP
> hole punching, etc.) and is required anyway to ensure the UDP packets will
> get through. In that sense the behavior is identical to the legacy protocol.

Indeed.

> When the final relay sends the packet over UDP to the final recipient, it
> will preserve the relay header when sending the packet. This is because the
> recipient needs the source node hash in order to know which key to use
> (otherwise it would use the relay node's key, and fail).

Correct.

> One could suggest that it would be useful to have this relay header for
> every packet (including direct ones), since that would optimize the unknown
> UDP source address code path (try_harder()) by using the source information
> instead of trying all keys. IMHO I don't think it's worth the overhead,
> especially since it would affect the case that's supposed to be the most
> efficient (direct UDP communication).

One possibility would be to reserve one value of the seqno field in the
UDP SPTPS packets (say, 0xffffffff) for signalling that this is not a
regular packet. That means the header for to-be-forwarded packets
becomes 12 bytes, but that's a small price to pay.
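
A minimal sketch of how a receiver could tell the two kinds of packets
apart, in Python for brevity (not tinc code). The field order, the
big-endian byte order and all names are assumptions; only the reserved
value and the 12-byte total come from the paragraph above:

```python
import struct

# Reserved seqno value suggested above; not an actual tinc constant.
RELAY_SEQNO = 0xFFFFFFFF
# Assumed layout: 4-byte marker, 4-byte destination id, 4-byte source id.
RELAY_HEADER = struct.Struct(">I4s4s")

def wrap_for_relay(dest_id: bytes, source_id: bytes, datagram: bytes) -> bytes:
    """Prepend the 12-byte relay header to an end-to-end SPTPS datagram."""
    return RELAY_HEADER.pack(RELAY_SEQNO, dest_id, source_id) + datagram

def classify(packet: bytes):
    """Return ('relay', dest_id, source_id, payload) or ('direct', packet)."""
    if len(packet) >= RELAY_HEADER.size:
        marker, dest_id, source_id = RELAY_HEADER.unpack_from(packet)
        if marker == RELAY_SEQNO:
            return ("relay", dest_id, source_id, packet[RELAY_HEADER.size:])
    return ("direct", packet)
```

A sender would of course have to make sure its regular sequence numbers
never reach the reserved value.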

> Of course, MTU calculations would have to take the relay header overhead (8
> bytes) into account when sending a data packet through a relay.

Yes.

> In terms of security, while this doesn't allow an attacker to break the
> secrecy of communications (since end-to-end encryption is still used), it
> does allow an attacker to trick nodes into relaying bogus packets, assuming
> it can impersonate a node's UDP address. Therefore the worst that can be
> done would be to perhaps increase the load on the network somewhat (no
> potential for amplification). Considering this requires a MITM, I don't
> think this is important enough for us to care. In addition, one can deduce
> what the source and destination nodes are just by looking at a relay packet,
> but that's no worse than direct communication where this information is
> given away by the source and destination IP addresses anyway.
> 
> Thoughts?

Everything you wrote makes sense. I had already thought about this issue
myself a bit, and had similar ideas. If you want to try to implement
this, go ahead!

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>