
Wepoll loses events #35

Open
djelinski opened this issue Dec 4, 2023 · 10 comments

Comments

@djelinski

Background

We use two epoll handles: one is used only for EPOLLIN events, the other only for EPOLLOUT. Both are always used with EPOLLONESHOT.

Each epoll handle has a dedicated thread that runs epoll_wait in a loop, and schedules tasks to run in other threads in response to polled events.

Other threads register interest ops with EPOLL_CTL_ADD, and when the event is polled, deregister interest with EPOLL_CTL_DEL. Any given socket handle can be registered at most once with each of the above epoll handles, once for reading, once for writing.

The problem

Occasionally one of the events doesn't fire, even if the wait conditions are satisfied. When that happens, the socket handle is still registered in port_state->sock_tree in state SOCK__POLL_PENDING.

This happens frequently on Windows 2016, less frequently on Windows 2019 and Windows 10, we haven't observed it yet with Windows 11 or 2022. The problem is reproducible with a piece of code that uses multiple UDP sockets talking over the loopback interface, and on Windows 2016 it usually reproduces within the first few minutes of run time. We haven't been able to produce a minimal reproducer yet.

I'm not sure if it's related, but there's some cross-talk when polling the same socket with 2 epoll handles; I added some extra logging, and found that we frequently receive AFD_POLL_SEND event on the EPOLLIN handle. This happens on all systems, including Windows 11.

How can I troubleshoot it further?

@piscisaureus
Owner

Your use case as you describe it should work, and I can't think of any potential gotchas in wepoll itself that could cause the behavior you're describing. The question is whether this is a bug in wepoll, in your code or in Windows.

Here's what I would recommend:

  1. See if you can write a minimal reproduction that uses a single socket and two threads (one for sending and one for receiving). You could try to modify https://github.com/piscisaureus/wepoll/blob/master/test/test-multi-poll.c to explicitly test this scenario (here is ChatGPT's concoction). If you can reproduce it, it's not a bug in your code.
  2. Modify the test case from step 1 to use select() instead of epoll_wait() and see if the problem persists. If it does, it is a bug in Windows; otherwise it must be a problem with wepoll.

@djelinski
Author

After a bit of trial and error, I managed to create a fairly small reproducer, posted here: https://gist.github.com/djelinski/d4e8456c197576c355100b25266b9cdd
The program repeatedly exchanges messages between a set of UDP sockets, printing the status every 5 seconds. On working systems (like Windows 11), it only prints lines like:

Tasks polled: 2/24, waiting 0/0, active 0/0

On failing systems (Windows 2016 / 2019), the output is something like:

Tasks polled: 1/18, waiting 0/0, active 1/6
Input task 0 inactive for 5000 milliseconds
Output task 1 inactive for 5000 milliseconds
Tasks polled: 1/23, waiting 0/0, active 1/1
Input task 0 inactive for 10000 milliseconds
Output task 1 inactive for 10000 milliseconds

Removing and re-adding the interest set enables the program to make progress.

select() does not offer equivalent functionality - it's not possible to add interest sets to a running select() call. How else can we troubleshoot this?

@djelinski
Author

I created another reproducer that re-adds a socket to the interest set if we detect that no events were reported for over 5 seconds. The event is immediately reported, but another event on the same socket is lost soon after that.

The events are only lost on sockets that are polled for read and write at the same time; sockets that only send or only receive do not lose events.

Adding a global critical section instead of per-port critical section in wepoll does not change the outcome, events are still lost.

On the other hand, if we use a socket from the same thread at all times, the problem does not reproduce.

The problem is also reproducible on Windows 11 with TCP sockets; it does not reproduce with UDP sockets because writes to UDP sockets never block on Windows 11.

@djelinski
Author

I modified the reproducer to use EPOLL_CTL_MOD instead of EPOLL_CTL_DEL + EPOLL_CTL_ADD, and the issue no longer reproduces. The updated code can be found here:
https://gist.github.com/djelinski/5f7ef3b12843fed401a52c63ad11b16b

@piscisaureus let me know if this is enough information to address this issue

@djelinski
Author

Created a TCP-based reproducer. Unlike the UDP-based one, the TCP one also fails on more modern Windows versions like Windows 11 or Windows 2022. The reproducer can be found here:
https://gist.github.com/djelinski/4f40d0c6908646299c0fd5d809c417b6

We are using EPOLL_CTL_DEL + ADD because of #34. We will probably invent a different workaround for #34, but we would like to understand what is going wrong here.

@piscisaureus
Owner

@djelinski

I suspect the root cause is something along these lines:

  • I have observed in the past that when multiple AFD_POLL operations are pending at the same time, all associated with the same completion port, and the same socket appears in more than one poll set, events can get lost.
  • Normally this can't happen because a socket is associated with a poll group when it is added and each poll group can only have one AFD_POLL operation outstanding at any time.
  • When a ONESHOT event fires, wepoll will set itself up to keep polling for AFD_POLL_LOCAL_CLOSE on this socket to aid auto-removal from the epoll set (emulating the linux behavior that sockets that are closed are automatically deleted from the epoll set).
  • Because multiple threads are polling the same epoll port, the AFD_POLL_LOCAL_CLOSE poll is immediately added to the poll set for the poll group and a new AFD_POLL operation is started. If there was no other thread polling the same port at that time, starting the new AFD_POLL operation would be deferred until the next call to epoll_wait().
  • When EPOLL_CTL_DEL is called, the socket is removed from the tree that maintains the association between a socket and its poll group, however an AFD_POLL operation against it might still be pending. This is normally not a problem.
  • When EPOLL_CTL_ADD is called immediately afterwards, it will add the socket back to the tree and find a new poll group for it. This could result in selecting a different poll group, so now we end up in a situation where the same socket is polled by multiple poll groups.

It might be possible to solve the problem by keeping deleted sockets in the socket tree (rather than in a specialized deleted-socket queue on the side) until their AFD_POLL operation completes, so that if a socket is deleted and re-added the poll group association can be maintained. However, one potential problem I see with this is that if the user actually closes the socket after deleting it from the epoll set, creates another socket, and Windows reuses the same socket handle, wepoll might not adequately resubmit the poll operation when it should.

@djelinski
Author

@piscisaureus thanks for your reply.

Will EPOLL_CTL_ADD create a new group if the original is not full? I thought it only creates a new group if all existing ones monitor 32 sockets already; here we only monitor 2 sockets at a time.

(also, we have exactly one thread calling epoll_wait per epoll handle, but that probably doesn't matter here)

@piscisaureus
Owner

piscisaureus commented Jan 5, 2024

also, we have exactly one thread calling epoll_wait per epoll handle, but that probably doesn't matter here

I think I don't understand what you're trying to say. When I log the thread IDs that are calling epoll_wait() I see many of them. Apologies, I misread that: there is one poller thread and a bunch of threads updating the poll sets.

Also, when I comment out this line, the bug no longer reproduces. This code path is only hit when multiple threads are polling the same epoll handle at the same time.

@djelinski
Author

Right. That line is also called when the epoll handle is updated while another thread is waiting on it, which is exactly what happens here.

We can't really comment out this line, as then the updates would only be applied on the next epoll_wait call, which can happen much later (or never).

One other interesting thing I noticed is that sometimes we get uninteresting events from afd_poll. Specifically, if I add the following lines:

  if ((epoll_events & sock_state->user_events) == 0) {
    printf("poll %lld (handle %lld) succeeded with uninteresting events %x vs interest %x\n",
        sock_state->base_socket, (intptr_t)port_state->iocp_handle, epoll_events, sock_state->user_events);
  }
  /* Filter out events that the user didn't ask for. */

before these, I usually (but not always) get logs like the following:

poll 228 (handle 236) succeeded with uninteresting events 41 vs interest 8000001c
poll 224 (handle 236) succeeded with uninteresting events 41 vs interest 8000001c

This seems to suggest that AFD is somehow mixing up the interest sets.

If we never poll the same socket handle from 2 distinct epoll handles at the same time, the problem doesn't reproduce.

@djelinski
Author

It looks like AFD sometimes mixes up different IO status blocks related to the same socket handle; I added more logging, and found that sometimes when a socket gets stuck, GetQueuedCompletionStatusEx returns an event that was intended for another poll handle. In other words, if we have 2 outstanding requests for a socket, one for EPOLLIN and the other for EPOLLOUT, sometimes the EPOLLOUT event is delivered to the request waiting for EPOLLIN, or vice versa. When that happens, the other request never completes.

The requests necessarily use different completion ports and different AFD device handles. As an experiment, I changed afd_create_device_handle to use unique names for each AFD device (\\Device\\Afd\\Wepoll_<iocp_handle>), but that did not fix the problem.

In another experiment I changed the per-port critical sections to a global critical section to make sure that all epoll_ctl_add operations are globally serialized. That didn't fix the problem either, which suggests that the problem might be a race between epoll_ctl_add and GetQueuedCompletionStatusEx, which is run outside of the critical section.

As mentioned earlier, using EPOLL_CTL_MOD seems to fix this issue. With EPOLL_CTL_MOD, requests are only added to the poll set by the thread that calls GetQueuedCompletionStatusEx, which further confirms the hypothesis.

As suggested above, commenting out port__update_events_if_polling might also help with this issue; it does not by itself fix the provided code samples, because EPOLL_CTL_ADD relies on a functional port__update_events_if_polling, but we might be able to modify the code to reissue epoll_wait after every EPOLL_CTL_ADD.

With the above in mind, I think the remaining options worth exploring are:

  • always use the same epoll handle for a given socket handle
  • never use EPOLL_CTL_ADD while epoll_wait is running

did I miss anything?
