Tweaks for high-bandwidth tinc

Brandon Black blblack at gmail.com
Fri Nov 12 19:43:41 CET 2010


[Sorry for the slow response, had a lot of related issues to deal with recently]

On Sat, Oct 23, 2010 at 5:10 PM, Guus Sliepen <guus at tinc-vpn.org> wrote:
> On Sat, Oct 23, 2010 at 11:19:30AM -0500, Brandon Black wrote:
>
>> I've been using tinc to do some high bandwidth VPNs [...]
>
> How high is the bandwidth exactly?

The scenario is pretty extreme: 100-250Mbps of bandwidth spread over
~45-60K long-lived TCP sessions.

>> 1) In tinc-up, users will want to adjust the txqueuelen of the tunnel
>> device as appropriate with ifconfig.  The default is 500, which
>> resulted in tons of overruns for me.  After trying 2000 for a while
>> and still seeing overruns, I went with a value of 10000, which seems
>> to be working well.
>
> Is the traffic very bursty? Normally, any TCP traffic inside the tunnel should
> adapt its bandwidth to that available, this means it also should not become
> higher than tinc can handle. So if traffic is very regular, one wouldn't expect
> many packets being queued in the tun device...

With so many parallel TCP connections, it's hard for normal congestion
control to react as quickly as it should, especially when several
thousand connections all decide to burst a few packets simultaneously,
even though each connection is otherwise pretty low bandwidth.  Since
there are multiple backend application servers involved, as I said
before I split up my tinc config into separate point-to-point VPNs
between the LB and the backends, so now I'm dealing with only ~9K or
so TCP connections inside any given tunnel.

Going to summarize what I've tried with local patches here:

IFF_ONE_QUEUE:

I'm still up in the air as to the utility of this flag.  I made a
patch to expose it as an experimental configuration parameter in
tinc.conf, and enabling it actually seemed to make matters worse in
our particular scenario.  Reading source code and comments around the
net (re: other VPN solutions that use TUN), it seems like most people
think it's beneficial though, and some even enable it by default.  For
now it's probably best left as an experimental config setting
defaulting to off (the current behavior) until someone digs deeper
into this issue.  I can't claim to really understand what's going on
here.
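
For reference, here's roughly what the patch does when opening the
Linux tun device (just a sketch with made-up names like open_tun and
one_queue, not tinc's actual device.c code):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>

/* Open the tun device, optionally requesting IFF_ONE_QUEUE.  The
   one_queue argument would come from the experimental tinc.conf
   option. */
int open_tun(const char *ifname, int one_queue) {
    int fd = open("/dev/net/tun", O_RDWR);
    if(fd < 0)
        return -1;

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TUN;
    if(one_queue)
        ifr.ifr_flags |= IFF_ONE_QUEUE; /* rely on the interface's own
                                           tx queue (txqueuelen) instead
                                           of tun's internal queue */
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    if(ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}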

SO_SNDBUF/SO_RCVBUF for the UDP socket:

I haven't actually bothered patching this in yet (I have a sysctl
workaround for now, so it's less urgent to me), but they're obviously
a good idea and I can add them to any patches I send up.
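
The patch itself would be trivial, something along these lines (a
sketch; the buffer size would presumably become a tinc.conf option):

#include <sys/socket.h>

/* Ask the kernel for larger send/receive buffers on the UDP socket.
   Note the kernel silently caps these at net.core.wmem_max /
   net.core.rmem_max, so the sysctl side still matters. */
static void grow_udp_buffers(int udp_fd, int bufsize) {
    setsockopt(udp_fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
    setsockopt(udp_fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));
}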

Packet sequence (/loss/replay/etc) issues:

I made a patch that makes the late-packet bitmap size configurable,
and increasing it (first to 512 packets, then to 2048) did wonders for
us.  If you've only got a handful of TCP connections flowing through a
tunnel, then given that TCP can only handle out-of-order delivery to a
small degree (via SACK, etc.), packets outside the default 128-packet
range would be useless anyway.  However, if you've got many thousands,
it's easy for, say, 1000 packets to go through the tunnel without any
two of them belonging to the same TCP session, which really changes
the game and makes you want to accept late/early packets well outside
the default 128 range in case of reordering.
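
For anyone following along, the receive-side check amounts to
something like the following (a simplified sketch with made-up names,
not tinc's actual net_packet code; 'window' is the configurable size
in packets, assumed to be a multiple of 8, and 'late' must be
allocated as window/8 zeroed bytes):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Per-peer state: highest seqno seen so far plus a bitmap covering
   the last 'window' sequence numbers; a set bit means "skipped but
   still acceptable if it shows up late". */
typedef struct {
    uint32_t received_seqno;
    uint32_t window;   /* bitmap size in packets, multiple of 8 */
    uint8_t *late;     /* window / 8 bytes, zeroed at startup */
} replay_state_t;

/* Returns true if the packet should be accepted. */
static bool check_seqno(replay_state_t *s, uint32_t seqno) {
    if(seqno > s->received_seqno) {
        if(seqno - s->received_seqno >= s->window) {
            /* Huge jump: everything we were waiting on is lost. */
            memset(s->late, 0, s->window / 8);
        } else {
            /* Mark the skipped seqnos as late-but-acceptable. */
            for(uint32_t i = s->received_seqno + 1; i < seqno; i++)
                s->late[(i / 8) % (s->window / 8)] |= 1 << (i & 7);
        }
        s->late[(seqno / 8) % (s->window / 8)] &= ~(1 << (seqno & 7));
        s->received_seqno = seqno;
        return true;
    }

    /* Packet is at or behind the newest seqno seen. */
    if(s->received_seqno - seqno >= s->window)
        return false;  /* too far in the past: drop */
    if(!(s->late[(seqno / 8) % (s->window / 8)] & (1 << (seqno & 7))))
        return false;  /* already received (or replayed): drop */
    s->late[(seqno / 8) % (s->window / 8)] &= ~(1 << (seqno & 7));
    return true;
}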

Another thing we get hit with (thanks, I think, to the underlying
network at Amazon under brutal conditions) is isolated packets jumping
the queue by a lot.  E.g. in a given tunnel's sequence, we might see
1-2 packets suddenly arrive hundreds of seqnos ahead of the most
recent seqno, but then they're immediately followed by all of the
"missing" packets (many of which would then be dropped by tinc by
default).  So I also added a patch that makes tinc a little more
resilient against this.  When the first far-future packet arrives
(outside the size of the late window), that packet is dropped and the
sequence tracking is unaffected.  As more far-future packets continue
to arrive (without any of the intervening older traffic we were
waiting on), they continue to be dropped until N of them have been
seen, at which point we give up waiting on the older traffic and
advance the sequence number, etc. like before.  N is set to 1/32 of
the late-window size (so 4 packets with the default window of 128).
This copes much better when one (or a few) packets jump the queue
without the rest being truly lost, and in the case where a large chunk
of packets really was lost, it only extends the number of dropped
packets by N, which is at most 1/32nd of the originally lost packets.
Seems like a good tradeoff to me, but you might disagree for the
common case.
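
In terms of the sketch above, the change amounts to something like
this (again just a sketch; 'future_count' is per-peer state I made up
for illustration, and N = window / 32 as described):

/* Builds on replay_state_t / check_seqno() from the earlier sketch.
   Don't advance the window on the first packet that jumps far ahead;
   only give up and advance once N far-future packets have arrived
   with none of the awaited older traffic in between. */
static bool check_seqno_resilient(replay_state_t *s, uint32_t seqno,
                                  uint32_t *future_count) {
    uint32_t n = s->window / 32;  /* give-up threshold */

    if(seqno > s->received_seqno &&
       seqno - s->received_seqno >= s->window) {
        if(++*future_count < n)
            return false;         /* drop it, keep waiting */
        *future_count = 0;        /* give up: fall through and advance */
    } else {
        *future_count = 0;        /* normal traffic resumed */
    }

    return check_seqno(s, seqno);
}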

I eventually realized that this whole late-packet tracking thing is
really all about replay security.  Since I'm using tinc more to get
around routing limitations than for security (I already disabled
encryption and authentication too), for a scenario like mine the ideal
solution is to simply pass all traffic and ignore the sequence numbers
on the receiver side (and not even bother maintaining the late-packet
bitmap).  So I updated my earlier patch to allow the late-packet
window to be set to zero, which disables the related code in the
net_packet receiver (and again improved the situation for us in the
real world).
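
With the window set to zero, the receive path can short-circuit the
whole thing, e.g. (continuing the sketches above):

/* A late-packet window of 0 means "no replay protection": accept
   every packet and never touch the bitmap or received_seqno. */
static bool check_seqno_optional(replay_state_t *s, uint32_t seqno,
                                 uint32_t *future_count) {
    if(s->window == 0)
        return true;
    return check_seqno_resilient(s, seqno, future_count);
}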

I'm not sure how much of this you want upstream or how you want it
broken up, but let me know and I can reshuffle my patches a bit and
work out a set of them to send to you sometime tomorrow
(IFF_ONE_QUEUE for Linux, socket buffer sizing, configurable
late-packet window, queue-jumper resilience, and an option to disable
replay prevention completely).  The latter three all build on each
other currently, but they could be re-ordered some other way if you
don't like the idea of some of them (e.g. there could be a parameter
to disable replay prevention completely without the other two).

Thanks,
-- Brandon

