tinc 1.1: missing PONG

Fri Feb 23 23:04:24 CET 2018

I've noticed that on Windows during large transfers (such as an
iperf run), the receiving end sometimes fails to respond to a PING.
When this happens, the sender closes the connection and both ends
have to renegotiate.

My theory is that, due to the properties of the splay tree, the TAP
device event is almost always at the head of the tree and thus if
there is traffic, that event will be serviced at the expense of
meta communication.

Because splay_search() is used to find the first event returned by
WSAWaitForMultipleEvents(), the next time through the loop the last
event serviced will be the first in the events[] array.  This doesn't
happen with the select() version of the event loop.

It is possible to eliminate the splay_search() by using an extra
array to hold io_t pointers but I've had better results just filling
the events[] array in reverse order.  This results in the most
recently serviced event being added to the end of events[], which
should prevent one very busy event from monopolizing the event loop.

With this change I can no longer reproduce the problem of nodes not
responding to pings.

Another solution would to randomly shuffle events[] but that seems
needlessly complicated.

 - todd

diff --git a/src/event.c b/src/event.c
index 331872a..03111d0 100644
--- a/src/event.c
+++ b/src/event.c
@@ -403,11 +403,14 @@ bool event_loop(void) {
 		}
 
 		WSAEVENT *events = xmalloc(event_count * sizeof(*events));
-		DWORD event_index = 0;
+		DWORD event_index = event_count;
 
+		/*
+		 * Fill events[] in reverse order.  Otherwise we may starve
+		 * events other than the TAP, which is usually at the head.
+		 */
 		for splay_each(io_t, io, &io_tree) {
-			events[event_index] = io->event;
-			event_index++;
+			events[--event_index] = io->event;
 		}
 
 		DWORD result = WSAWaitForMultipleEvents(event_count, events, FALSE, timeout_ms, FALSE);