About peer UDP address detection

Sun Jul 21 19:52:47 CEST 2013

I would like to discuss the following commit:
https://github.com/gsliepen/tinc/commit/4a0b9981513059755b9fd15b38fc198f46a0d6f2 
("Determine peer's reflexive address and port when exchanging keys")

This is a great feature as it basically allows peers to do UDP Hole 
Punching (via MTU probes) even when both are having their source ports 
rewritten by a NAT, which is extremely useful. However, I have a few 
questions and concerns about the way it's currently implemented in 
ANS_KEY messages:

- AFAIK, once a key is negotiated using REQ_KEY and ANS_KEY messages, 
and barring any changes to the graph, it will only be negotiated again 
<KeyExpire> seconds later, which by default is one hour. This means that 
the UDP port information is only updated once an hour, which seems like 
a very long time. There are situations in which the port could suddenly 
change. For example, consider a typical NAT box that will "forget" about 
UDP sessions after some period of inactivity, and will assign a 
different port when tinc starts sending packets again. Or a redundant 
NAT setup that fails over, losing its state in the process. As a general 
note, any solution that relies on NATs not being broken is doomed to 
fail in at least some situations IMHO. My point is, tinc should be able 
to react to any change in network conditions quickly and efficiently.

- Isn't there a race condition in which a node requests a key, gets it, 
but the intermediate nodes didn't have time to determine the UDP address 
yet? When an intermediate node finds out what the UDP address is a few 
seconds later, it is already too late and it won't get any opportunity 
to tell the node about the address for a full hour.

- When a node receives an ANS_KEY message with UDP address information, 
it will use this information to override the node's address in its own 
structures, no questions asked. Shouldn't it be smarter than that? For 
example, if LocalDiscovery yields a local address, the node will use it 
until it gets an ANS_KEY message for the node, in which case it will 
suddenly revert back to non-local communication until LocalDiscovery 
kicks in again. Shouldn't there be some logic in ans_key_h() to prevent 
it from updating the node's UDP address if n->status.udp_confirmed is 
already true?

- If I understand the code right, when local communication (as in 
LocalDiscovery) between two nodes suddenly stop working, they will 
potentially fall back to indirect communication instead of direct 
non-local communication, because they forgot about the address they got 
from ANS_KEY (which was overridden by LocalDiscovery). Granted, that 
would be an extremely weird situation (local communication suddenly 
stops working but non-local UDP hole punching is okay? seems unlikely) 
which we probably don't really care about.

I am wondering if it would be cleaner to store the node's real UDP 
address in edge_t::address, and then if it gets updated, an ADD_EDGE 
message is automatically broadcast to inform the whole graph of the new 
address, updating the edge information in each node. This would solve 
the first two issues by switching from a "pull" model to a "push" model 
for propagating a node's real UDP address. It would also solve the 
fourth issue by storing received UDP address information in two separate 
places (edge_t::address and node_t::address, respectively). In addition, 
this mechanism makes more sense (well, in my mind at least) because 
edge_t::address is already used to determine UDP addresses anyway. 
Furthermore, it would probably work even with old clients because if my 
understanding of the code is correct, they already handle edge updates 
correctly.

The only regression from the current solution that I can see is that 
with the current implementation, intermediate nodes can "participate" in 
determining the node's UDP address by rewriting ANS_KEY messages as they 
pass through them. This is not possible in my proposal because that 
would require a node to modify an edge that it doesn't own. However, I'm 
not sure that it's actually that useful considering the directly 
connected nodes usually have the best information about how to reach 
their neighbor anyway. I can think of one example where that isn't true, 
which would look like this:

   A <--> B <---(NAT)---> C <---> D

In this case, B actually holds the best information about how to join D, 
not C. Assuming I'm understanding tinc's graph traversal correctly, what 
will happen (if my proposal is implemented) if that A will try to reach 
D *via C* using UDP address information provided by B - it won't be able 
to use D's edge information because it only contains local addresses. So 
instead of talking directly to D, A will indirect via C. I don't think 
that's much of a concern IMHO because it is only a problem if the link A 
<---> D is significantly more preferable to A <---> C <---> D, which 
seems unlikely in such a topology, and can be fixed by directly 
connecting D to B.

Thoughts? I get the feeling that Guus already thought about doing it 
this way and that there is some fundamental problem that would break my 
nice proposal :/

-- 
Etienne Dechamps