PMTU Discovery

Daniel Schall Daniel-Schall at web.de
Fri Jan 7 12:02:50 CET 2011


> On Thursday, January 06, 2011 11:35 PM, Guus Sliepen wrote:

> > > > I suggest to implement a check, if the compressed data really is 
> > > > smaller than the uncompressed one and decide, what to send over the
wire.
> > 
> > > Although that could be done I don't think it is worth the effort for 
> > > the few packets that might benefit from it.
> > 
> > I don't get, why you do this the hard way with so much uncertainties.  
> > PMTU Discovery could be done easier by sending data packets of 
> > arbitrary size WITHOUT compressing them.  That would enable the nodes 
> > to check the real data size that fits through the channel.
>
> It doesn't matter, the code checks the size of the payload before
> decompressing it, so it sees exactly how many bytes you can stuff in the
> actual UDP packets, whether those bytes contain compressed data or not.
> For tinc 1.0.x, the probes need to use compression to stay backwards
compatible.

Okay, if you'd like to keep that for backward compatibility, fine.

> > Although PMTU Discovery works with compressed packets and some "magic" 
> > to allow badly compressible packets as well, I think just sending 
> > uncompressed packets would be a lot easier and more reliable.

> You would still need to use the "magic" for badly compressible packets
even if you send uncompressed MTU probes.

You do not need that magic if you add a flag to the packet, indicating if
the packet is compressed or not.
Please have a look at my code. If the resulting packet after
"compress_packet" is not smaller than the uncompressed packet,
the uncompressed one is sent. Therefore, the packet sent can never be bigger
than the original one.
Since the size of the original packet is checked to fit into the MTU, the
(un)compressed packet will fit as well.

> > +	union {
> > +		struct {
> > +			uint32_t local:1;		/* is this packet
sent to a local interface? */
> > +			uint32_t pmtud:1;		/* is this packet
used for PMTU discovery? */
> > +			uint32_t pmtur:1;		/* is this packet
used for PMTU reply? */
> > +			uint32_t compress:1;	/* is this packet
compressed? */
> > +			uint32_t unused:28;		/* unused bits */
> > +		} flags;
> > +		uint32_t flag_bits;			/* to easily zero
out and access flag bits */
> > +	};

> The problem with bitfields is that it is undefined how they are laid out
in memory, so you cannot use them when you send them over the wire.

Are you sure this is correct?
See comment below the next paragraph.

> If you add a field to a protocol, it is best to use an uint32_t or similar
type and #define bit masks to set/reset individual bits by name. Even then
you need to convert the new field to network byte order with htonl() before
sending the packet, and do the reverse when > receiving it.

Please have a look at the definition above. I've introduced a union of a
struct and a plain uint32_t.
You can set bits by accessing the struct "flags". That works for every
memory layout correctly, no matter if it's little or Big endian.
Prior to sending, I encode the uint32_t (called "flag_bits") using "htonl".
That way, the bitfield is converted to an integer and properly sent over the
wire.
On the receiving side, I decode "flag_bits" using "ntohl" again, and the
bitfield is set up properly for the memory layout on the client side.

> > Second, the method "send_mtu_probe" now sends 3 packets between minmtu 
> > and maxmtu, with compression disabled. ("send_udppacket" takes this 
> > into
> > account.)
> [...]
> > -	for(i = 0; i < 3; i++) {
> > +	const int nbPackets = 3;
> > +	for(i = 0; i < nbPackets; i++) {
> >  		if(n->maxmtu <= n->minmtu)
> >  			len = n->maxmtu;
> >  		else
> > -			len = n->minmtu + 1 + rand() % (n->maxmtu -
n->minmtu);
> > +			len = n->minmtu + 1 + (n->maxmtu + 1 - n->minmtu) /
(double) 
> > +nbPackets * i;

> There is a reason I used rand(): suppose minmtu = 0 and maxmtu = 1514,
> some firewall is blocking ICMP packets, and the actual path MTU is 505.
> Your scheme sends packets with sizes 1, 506 and 1011.
> The last two packets are dropped, but since tinc doesn't know about it it
doesn't
> lower maxmtu. Minmtu is now set to 1, and the next round of packets have
sizes 2,
> 506, 1011. You can calculate yourself where this is going:
> tinc will stop after 30 probes and think the PMTU is 30, which is not
correct.
>
> You can not assume anything when a probe is not returned, this could be
because
> the probe was larger than the PMTU, but also because the packet was
dropped
> due to temporary congestion. If you could, then your method would work
fine.

You are right, I did not see that point.
Anyway, using rand() does seem a little indeterministic to me, that's why I
was trying to introduce a predictable mechanism.
I'll revise the code and see, if I can figure out a better (deterministic)
algorithm to determine the proper probe size.

> > The method "mtu_probe_h" receives the MTU probes and adjusts the 
> > min/max mtu of the nodes accordingly. Afterwards, it replies the MTU 
> > packet to the sender. That enables the sender to get the MTU correctly
as well.

> That also is unfortunately the wrong thing to do. The PMTU does not need
to be the same in both directions.

You are right about that.
But how does your code reflect the fact, that the maximum size of the
received PMTU probes does not necessarily reflect the MTU for sending?
To take that into account, my idea was to reply every PMTU packet with a
PMTU response packet, having the same size.
Another possibility would be to inform the node sending the PMTU packet via
the meta connection about the successful reception of a  probe, so it can
adjust its MTU.

Besides that, I experienced a strange issue in the PMTU Discovery process:
After exchanging the first PMTU probes and fixing the MTU, one node sends a
single burst (when n->mtuprobe == 32)  to the remote.
On the remote side, as soon as this packets are received, the value of
n->mtuprobes is set to 30. On the next run of send_mtu_probes, the value is
set to 31, and nothing else happens; no PMTU packets are sent.
Each time the remote node receives a PMTU Probe, the value of n->mtuprobes
is again set to 30, preventing the node from sending more probes back to the
first node.
That's why the first node runs into a timeout after "pingtimeout" seconds
and the PMTU process starts over again.
Is this behavior a feature or a bug?

Another question regarding "mtu_probe_h":
Can you please explain what this code block does?
	if(!packet->data[0]) {
	    packet->data[0] = 1;
	    send_udppacket(n, packet); 
	} else {


> > In case you don't like my coding style / submission style, please feel 
> > free to incorporate the changes to your repository your own way.

> The coding style is OK, the git-email commit is also fine.
> But you should not combine unrelated changes in one commit.
> If you accidentily did, you can undo the last commit with "git reset
HEAD^"
> (this undoes the commit but doesn't throw away the changes you made to the
files),
> and then commit the individual bits and pieces again.
> Also, in most cases you should just remove lines instead of uncommenting
them, you can find them back in the git history anyway.

I know, and I am sorry. I'm still playing around with git and did not figure
out everything.
Further, I did not intend my commit to be pushed to your repository, I'd
rather have it as a discussion base to notify you about my ideas and
changes.
When you provide public git branches, I think it'd be easier to discuss
changes.

> > In this commit, I also finished rewriting the logging part. Each log 
> > level has now a flag and you can combine log levels you want to log by 
> > ORing the according flags.

> Hm, I don't see that in the commit, only what you write below:
> > CTRL+C enables DEBUG_SCARY_THINGS similar to the old behavior.

Oops, sorry, the changes I mentioned were in a different commit I forgot to
send you.
Please find it attached.
In this commit, I also changed the log time in console output to a
human-readable format.


Best,

Daniel
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 5db4727dc33c31f57a6b6a3af719cda2f17b2812.txt
URL: <http://www.tinc-vpn.org/pipermail/tinc-devel/attachments/20110107/a5f86719/attachment.txt>


More information about the tinc-devel mailing list