tinc 1.1pre10 slower than tinc 1.0, experimentalProtocol even more

Guus Sliepen guus at tinc-vpn.org
Wed Apr 16 23:37:27 CEST 2014


On Wed, Apr 16, 2014 at 10:23:23PM +0200, Henning Verbeek wrote:

> > > sptps_speed reports:
[...]
> > > SPTPS/UDP transmit for 10 seconds:           8.64 Gbit/s
> >
> Here's the output for sptps_speed on ChaCha-Poly1305. As you
> predicted, the throughput is lower:
[...]
> SPTPS/UDP transmit for 10 seconds:           2.52 Gbit/s

Not bad :)

> With ChaCha-Poly1305 on SPTPS, we're now seeing only between 300 Mbit/s and
> 320 Mbit/s. Again, both tincds are fully CPU-bound.
> 'tinc info <node>' reports:
> Node:         riak_ankara
> Address:      <external IP> port 656
> Online since: 2014-04-16 21:39:08
> Status:       validkey visited reachable sptps udp_confirmed
> Options:      pmtu_discovery clamp_mss
> Protocol:     17.3
> Reachability: directly with UDP
> PMTU:         1451
> Edges:        riak_belfast
> Subnets:      <internal subnet>
> 
> Everything here as it should be?

Yes. However, I already found a bug: even though PMTU discovery works
correctly, tinc 1.1 still allows slightly larger packets, which are then sent
via TCP instead of UDP. That is what causes the bad performance you are seeing.
I hope to fix it this weekend.
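
In the meantime, a possible stopgap might be to lower the MTU of the VPN
interface in tinc-up, so the kernel never hands tinc packets larger than the
discovered PMTU and nothing has to fall back to TCP. I have not tested this,
and the MTU value and the address below are just placeholders:

#!/bin/sh
# tinc-up: clamp the interface MTU below the discovered PMTU (1451 in your case)
ip link set "$INTERFACE" mtu 1400
ip addr add 10.0.0.1/24 dev "$INTERFACE"
ip link set "$INTERFACE" up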

> > I do hope that when all issues have been resolved and tinc 1.1.0 can be
> > released, the actual throughput will be much closer to the throughput
> > measured by sptps_speed. Also, at the moment both tinc and sptps_speed are
> > single-threaded, so on a multi-core machine the throughput could in
> > principle be multiplied by the number of cores; however, that only makes
> > sense if the encryption and authentication themselves are the bottleneck.
> 
> I'm struggling to comprehend this. If sptps_speed reports one value, and
> what I measure through an actual SPTPS tunnel is another, and in both cases
> only a single core is used, what "ate up" all the throughput? Is it the
> tun/tap handling, as you suggested? Is it the network device driver? Is it
> the latency of the actual packets over the wire?

I've been doing some tests with the sptps_test utility. I have two nodes for
which sptps_speed gives:

SPTPS/UDP transmit for 10 seconds:           1.64 Gbit/s

If I generate keys with sptps_keypair and start sptps_test on the two nodes as
follows:

node1: sptps_test privkey1 pubkey2 9999 > /dev/null
node2: sptps_test privkey2 pubkey1 node1 9999 < /dev/zero

then the 1 Gbit/s link between them is completely saturated, with CPU
utilization at 90% on the slower node.
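
For anyone who wants to reproduce this, the rough recipe looks like the
following sketch (it assumes the sptps_keypair argument order is private key
file followed by public key file, and that pv is installed to display the
transfer rate on the receiving side):

# generate a keypair on each node, then exchange the public keys
node1: sptps_keypair privkey1 pubkey1
node2: sptps_keypair privkey2 pubkey2
# same test as above, but watching the received data rate with pv
node1: sptps_test privkey1 pubkey2 9999 | pv > /dev/null
node2: sptps_test privkey2 pubkey1 node1 9999 < /dev/zero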

However, I added an option to sptps_test to use a tun device instead of
standard input/output. If I use that to configure a simple point-to-point VPN,
the throughput drops to 500 Mbit/s, with 100% CPU utilization on the slower
node (an i3-3220T at 2.8 GHz). This suggests that tun/tap handling is a
bottleneck. I'm not surprised by that; it involves a lot of context switches
between tinc, the kernel, and the program generating the actual network load
(such as iperf).
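
Roughly, such a point-to-point test can be wired up like this (the tun
interface name and the addresses are just examples; the option that makes
sptps_test use a tun device is not in a release yet):

# assign addresses to the tun interfaces created by sptps_test (name assumed to be tun0)
node1: ip addr add 192.168.99.1/30 dev tun0 && ip link set tun0 up
node2: ip addr add 192.168.99.2/30 dev tun0 && ip link set tun0 up
# then measure throughput through the tunnel
node1: iperf -s
node2: iperf -c 192.168.99.1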

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>