core: fix poll for event loop (i hope)
This partially reverts c57b705, the
previous fix for #1345

The previous fix falsely assumed that if `x_poll_for_event` returned
nothing, it must not have read from the X connection. That is simply not
true. It appeared to work, probably because vblank events cannot arrive
close together (there is only one per frame). But that is not an
assumption we should rely on.

_Separately_, we have another problem. We assumed that if
`x_poll_for_event` returned nothing, we don't need to flush again. This
is not true either. Pending requests could have completed, and their
callback functions might have sent new requests. This, for example,
causes `picom-inspect` to hang, because some requests get stuck in the
outgoing buffer.

The explanation for the fix is going to be long, because problems of
this kind are never simple.

First of all, we need a version of `x_poll_for_event` that is actually
guaranteed not to read from X. The current version of `x_poll_for_event`
is already extremely complex, and I don't want to pile more complexity
on top of it. So instead we now use a different approach, one that some
might consider an ugly hack.

The complexity of `x_poll_for_event` all stems from the lack of an
`xcb_poll_for_queued_{reply,special_event}` API, so we had to use
complex logic to merge messages from different streams (events, special
events, replies, and errors), all the while trying our best (and
failing) to handle all messages in the xcb buffer before going to sleep.
There is an `xcb_poll_for_queued_event` which in theory could be used
for this and would make things a lot simpler; the problem is that we
don't control events, so if there is a large gap between event arrivals,
picom would just be sitting there doing nothing.

And that's exactly what we do in this commit: we take control of events.
Every time we wait for replies/errors for some requests, we send a
request that is guaranteed to _fail_. Because unchecked errors are
returned by `xcb_poll_for_queued_event` as events, we can be sure an
event will be received within a bounded amount of time, so we won't
hang.

This vastly simplifies the logic in `x_poll_for_event`, and makes adding
a no-read version of it trivial. Now that we have that, some care needs
to be taken to make sure that we _do_ read from X sometimes; otherwise
we would go to sleep without reading anything, only to be woken up again
immediately by the file descriptor, resulting in an infinite loop. This
has some ripple effects: for example, we now queue a redraw whenever the
wm tree changes; before, redraws were only queued when the wm tree
became consistent.

Even with this, we still need the `while(true)` loop in
`handle_x_events`, because of the other problem mentioned at the
beginning: flushing. The way we fix the flushing problem is by tracking
the completion of pending requests - if any requests were completed, we
flush conservatively (meaning even when not strictly necessary). Simple
enough. But flushing could read (yes, read) from X too! So whenever we
flush, we need to call `x_poll_for_event` again, hence the while loop.

Fixes: #1345

Signed-off-by: Yuxuan Shui <[email protected]>
yshui committed Oct 12, 2024
1 parent 729ea65 commit 1115ce6
Showing 5 changed files with 132 additions and 203 deletions.
57 changes: 31 additions & 26 deletions src/picom.c
@@ -1449,8 +1449,7 @@ static void unredirect(session_t *ps) {
/// keeps an internal queue of events, so we have to be 100% sure no events are
/// left in that queue before we go to sleep.
static void handle_x_events(struct session *ps) {
bool wm_was_consistent = wm_is_consistent(ps->wm);

uint32_t latest_completed = ps->c.latest_completed_request;
while (true) {
// Flush because if we go into sleep when there is still requests
// in the outgoing buffer, they will not be sent for an indefinite
@@ -1460,47 +1459,44 @@ static void handle_x_events(struct session *ps) {
// Also note, `xcb_flush`/`XFlush` may _read_ more events from the server
// (yes, this is ridiculous, I know). But since we are polling for events
// in a loop, this should be fine - if we read events here, they will be
// handled below; and if some requests is sent later in this loop, which
// means some events must have been received, we will loop once more and
// get here to flush them.
// handled below; and if some requests is sent later in this loop, we will
// set `needs_flush` and loop once more and get here to flush them.
XFlush(ps->c.dpy);
xcb_flush(ps->c.c);

// We have to check for vblank events (i.e. special xcb events) and normal
// events in a loop. This is because both `xcb_poll_for_event` and
// `xcb_poll_for_special_event` will read data from the X connection and
// put it in a buffer. So whichever one we call last, say
// `xcb_poll_for_special_event`, will read data into the buffer that might
// contain events that `xcb_poll_for_event` should handle, and vice versa.
// This causes us to go into sleep with events in the buffer.
//
// We have to keep calling both of them until neither of them return any
// events.
bool has_events = false;
// If we send any new requests, we should loop again to flush them. There
// is no direct way to do this from xcb. So if we called `ev_handle`, or
// if any pending requests were completed, we conservatively loop again.
bool needs_flush = false;
if (ps->vblank_scheduler) {
has_events = vblank_handle_x_events(ps->vblank_scheduler) ==
VBLANK_HANDLE_X_EVENTS_OK;
vblank_handle_x_events(ps->vblank_scheduler);
}

xcb_generic_event_t *ev;
while ((ev = x_poll_for_event(&ps->c))) {
has_events = true;
while ((ev = x_poll_for_event(&ps->c, true))) {
ev_handle(ps, (xcb_generic_event_t *)ev);
needs_flush = true;
free(ev);
};

if (!has_events) {
if (ps->c.latest_completed_request != latest_completed) {
needs_flush = true;
latest_completed = ps->c.latest_completed_request;
}

if (!needs_flush) {
break;
}
}

int err = xcb_connection_has_error(ps->c.c);
if (err) {
log_fatal("X11 server connection broke (error %d)", err);
exit(1);
}

if (wm_is_consistent(ps->wm) != wm_was_consistent && !wm_was_consistent) {
log_debug("Window tree has just become consistent, queueing redraw.");
if (wm_has_tree_changes(ps->wm)) {
log_debug("Window tree changed, queueing redraw.");
ps->pending_updates = true;
queue_redraw(ps);
}
@@ -1966,9 +1962,18 @@ static void draw_callback(EV_P_ ev_timer *w, int revents) {
}
}

static void x_event_callback(EV_P attr_unused, ev_io * /*w*/, int revents attr_unused) {
// This function is intentionally left blank, events are actually read and handled
// in the ev_prepare listener.
static void x_event_callback(EV_P attr_unused, ev_io *w, int revents attr_unused) {
// Make sure the X connection is being read from at least once every time
// we woke up because of readability of the X connection.
//
// `handle_x_events` is not guaranteed to read from X, so if we don't do
// it here we could dead lock.
struct session *ps = session_ptr(w, xiow);
auto ev = x_poll_for_event(&ps->c, false);
if (ev) {
ev_handle(ps, (xcb_generic_event_t *)ev);
free(ev);
}
}

static void config_file_change_cb(void *_ps) {
14 changes: 5 additions & 9 deletions src/vblank.c
@@ -69,7 +69,7 @@ struct vblank_scheduler_ops {
bool (*init)(struct vblank_scheduler *self);
void (*deinit)(struct vblank_scheduler *self);
bool (*schedule)(struct vblank_scheduler *self);
enum vblank_handle_x_events_result (*handle_x_events)(struct vblank_scheduler *self);
bool (*handle_x_events)(struct vblank_scheduler *self);
};

static void
@@ -444,14 +444,10 @@ static void handle_present_complete_notify(struct present_vblank_scheduler *self
ev_timer_start(self->base.loop, &self->callback_timer);
}

static enum vblank_handle_x_events_result
handle_present_events(struct vblank_scheduler *base) {
static bool handle_present_events(struct vblank_scheduler *base) {
auto self = (struct present_vblank_scheduler *)base;
xcb_present_generic_event_t *ev;
enum vblank_handle_x_events_result result = VBLANK_HANDLE_X_EVENTS_NO_EVENTS;
while ((ev = (void *)xcb_poll_for_special_event(base->c->c, self->event))) {
result = VBLANK_HANDLE_X_EVENTS_OK;

if (ev->event != self->event_id) {
// This event doesn't have the right event context, it's not meant
// for us.
@@ -464,7 +460,7 @@ handle_present_events(struct vblank_scheduler *base) {
next:
free(ev);
}
return result;
return true;
}

static const struct vblank_scheduler_ops vblank_scheduler_ops[LAST_VBLANK_SCHEDULER] = {
@@ -575,11 +571,11 @@ vblank_scheduler_new(struct ev_loop *loop, struct x_connection *c, xcb_window_t
return self;
}

enum vblank_handle_x_events_result vblank_handle_x_events(struct vblank_scheduler *self) {
bool vblank_handle_x_events(struct vblank_scheduler *self) {
assert(self->type < LAST_VBLANK_SCHEDULER);
auto fn = vblank_scheduler_ops[self->type].handle_x_events;
if (fn != NULL) {
return fn(self);
}
return VBLANK_HANDLE_X_EVENTS_NO_EVENTS;
return true;
}
8 changes: 1 addition & 7 deletions src/vblank.h
@@ -29,12 +29,6 @@ enum vblank_callback_action {
VBLANK_CALLBACK_DONE,
};

enum vblank_handle_x_events_result {
VBLANK_HANDLE_X_EVENTS_OK,
VBLANK_HANDLE_X_EVENTS_ERROR,
VBLANK_HANDLE_X_EVENTS_NO_EVENTS,
};

typedef enum vblank_callback_action (*vblank_callback_t)(struct vblank_event *event,
void *user_data);

@@ -53,4 +47,4 @@
enum vblank_scheduler_type type, bool use_realtime_scheduling);
void vblank_scheduler_free(struct vblank_scheduler *);

enum vblank_handle_x_events_result vblank_handle_x_events(struct vblank_scheduler *self);
bool vblank_handle_x_events(struct vblank_scheduler *self);