PMTU Discovery

Fri Jan 7 13:24:51 CET 2011

On Fri, Jan 07, 2011 at 12:02:50PM +0100, Daniel Schall wrote:

> > > Although PMTU Discovery works with compressed packets and some "magic" 
> > > to allow badly compressible packets as well, I think just sending 
> > > uncompressed packets would be a lot easier and more reliable.
> 
> > You would still need to use the "magic" for badly compressible packets
> > even if you send uncompressed MTU probes.
> 
> You do not need that magic if you add a flag to the packet, indicating if
> the packet is compressed or not.

Ah yes, that is true.

> > The problem with bitfields is that it is undefined how they are laid out
> > in memory, so you cannot use them when you send them over the wire.
> 
> Are you sure this is correct?

Yes, the C standard says the compiler can put the bits in whatever order it
likes. Actually the same goes for structs in general, assuming the uint32_t
seqno immediately precedes the uint8_t data[] is already dangerous...

> Please have a look at the definition above. I've introduced a union of a
> struct and a plain uint32_t.  You can set bits by accessing the struct
> "flags". That works for every memory layout correctly, no matter if it's
> little or Big endian.  Prior to sending, I encode the uint32_t (called
> "flag_bits") using "htonl".  That way, the bitfield is converted to an
> integer and properly sent over the wire.  On the receiving side, I decode
> "flag_bits" using "ntohl" again, and the bitfield is set up properly for the
> memory layout on the client side.

Oops, I missed that while reading the diff. So yes, it will probably work
correctly on many processors then.

> > You can not assume anything when a probe is not returned, this could be
> > because the probe was larger than the PMTU, but also because the packet was
> > dropped due to temporary congestion. If you could, then your method would
> > work fine.
> 
> You are right, I did not see that point.  Anyway, using rand() does seem a
> little indeterministic to me, that's why I was trying to introduce a
> predictable mechanism.  I'll revise the code and see, if I can figure out a
> better (deterministic) algorithm to determine the proper probe size.

The random sized probes usually converge very fast. Perhaps a better way is to
combine your approach with randomness: split the range between minmtu and
maxmtu in three pieces, select a random number between 0 and (maxmtu - minmtu)
/ 3, and send three packets with the minimum size of each piece + the random
number. This ensures the three probes always have very different sizes,
allowing a fast binary search things are normal, and can still deal with the
case of firewalls + low MTU in a simple way.

> > That also is unfortunately the wrong thing to do. The PMTU does not need
> > to be the same in both directions.
> 
> You are right about that.
> But how does your code reflect the fact, that the maximum size of the
> received PMTU probes does not necessarily reflect the MTU for sending?
> To take that into account, my idea was to reply every PMTU packet with a
> PMTU response packet, having the same size.

That is exactly what the code does:

> Another question regarding "mtu_probe_h":
> Can you please explain what this code block does?
> 	if(!packet->data[0]) {
> 	    packet->data[0] = 1;
> 	    send_udppacket(n, packet); 
> 	} else {

If it receives a probe packet with the first byte set to 0 (which is what is
generated by send_mtu_probe()), it sets the first byte to 1 and sends the
packet straight back. If the first byte is non-zero, it assumes it is the probe
reply, and changes minmtu accordingly.

> Another possibility would be to inform the node sending the PMTU packet via
> the meta connection about the successful reception of a  probe, so it can
> adjust its MTU.

Yes, but again not in a backwards compatible way.

> Besides that, I experienced a strange issue in the PMTU Discovery process:
> After exchanging the first PMTU probes and fixing the MTU, one node sends a
> single burst (when n->mtuprobe == 32)  to the remote.
> On the remote side, as soon as this packets are received, the value of
> n->mtuprobes is set to 30.

The remote side doesn't change n->mtuprobes when receiving those packets,
because data[0] == 0.

> On the next run of send_mtu_probes, the value is
> set to 31, and nothing else happens; no PMTU packets are sent.
> Each time the remote node receives a PMTU Probe, the value of n->mtuprobes
> is again set to 30, preventing the node from sending more probes back to the
> first node.
> That's why the first node runs into a timeout after "pingtimeout" seconds
> and the PMTU process starts over again.
> Is this behavior a feature or a bug?

I agree it is not so clear, I have used n->mtuprobes both as a counter and as a
way to store the state of the PMTU discovery mechanism. In send_mtu_probe(),
n->mtuprobes == 31 means that PMTU discovery has succeeded, and that we wait
PingInterval seconds (60 seconds by default). In the next run of
send_mtu_probe(), n->mtuprobes == 32, and this causes it to send some probes
and wait PingTimeout seconds (5 by default). Then, in the following run of
send_mtu_probes(), n->mtuprobes is either 33 if mtu_probe_h() did not receive
any response to the previous probes, in which case PMTU discovery is restarted.
If mtu_probe_h() did receive a response, n->mtuprobes is 31, which causes
send_mtu_probe() to wait another PingInterval before sending the next probes.

So, this is a feature that causes tinc to keep sending PMTU probes at a regular
interval to check whether UDP communication with the previously discovered MTU
is still possible.

> > The coding style is OK, the git-email commit is also fine.
> > But you should not combine unrelated changes in one commit.
> > If you accidentily did, you can undo the last commit with "git reset HEAD^"
> > (this undoes the commit but doesn't throw away the changes you made to the files),
> > and then commit the individual bits and pieces again.
> > Also, in most cases you should just remove lines instead of uncommenting
> > them, you can find them back in the git history anyway.
> 
> I know, and I am sorry. I'm still playing around with git and did not figure
> out everything.
> Further, I did not intend my commit to be pushed to your repository, I'd
> rather have it as a discussion base to notify you about my ideas and
> changes.
> When you provide public git branches, I think it'd be easier to discuss
> changes.

No problem!

I have looked around for some simple ways to have people push changes to their
own branches, and it seems gitolite can allow that in a relatively easy way. I
would have to check whether the people that provide hosting for tinc-vpn.org
allow that though, as it involves adding an account where multiple people would
have access to.

Another option is that you make a (bare) clone of your own copy of the tinc git
repository in a publicly accessisble spot (for example, your own website), and
push your changes into that repository.

I could also see if I can create a tinc project on github.

> > > level has now a flag and you can combine log levels you want to log by
> > > ORing the according flags.
> 
> > Hm, I don't see that in the commit, only what you write below:
> > > CTRL+C enables DEBUG_SCARY_THINGS similar to the old behavior.
> 
> Oops, sorry, the changes I mentioned were in a different commit I forgot to
> send you.  Please find it attached.  In this commit, I also changed the log
> time in console output to a human-readable format.

That looks great. However, DEBUG_SCARY_THINGS is not completely the same
anymore. This should become 64, and then I would just add a DEBUG_ALL = ~0.
Apart from that I'll merge it :)

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://www.tinc-vpn.org/pipermail/tinc-devel/attachments/20110107/fbb1452b/attachment.pgp>