This repository has been archived by the owner on Jun 18, 2024. It is now read-only.
scx: Update conditions for WAKE_SYNC migration
In scx_select_cpu_dfl(), we currently migrate the waking task to the CPU of the waker in the following scenario:

1. WAKE_SYNC is specified in wake_flags
2. There is at least one idle core in the system
3. The wakee can run on the waker CPU

The assumption implicit in (2) is that the system is undersaturated, and that therefore the wakee's runqueue delay would not be impacted by migrating to the waker's CPU rather than migrating to an idle core. This doesn't always hold in practice, though. Consider the following scenario:

1. The system is overloaded, and at least one core becomes idle.
2. Some groups of pairs of tasks that communicate over IPC are spawned.
3. Sender tasks are running on cores that still have enqueued tasks from when the system was overloaded, and they repeatedly wake wakee tasks with WAKE_SYNC.
4. The waker tasks observe that the system is underloaded, and so think that it's optimal for the wakee to be migrated to their CPU, even though their CPU has a deep runqueue.

This can cause serious performance regressions for such workloads. For example, hackbench regresses by nearly 10x relative to EEVDF:

    [1]+ ./scx_simple > /dev/null 2> /dev/null &
    [root@virtme-ng bin]# hackbench --loops 1000
    Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
    Each sender will pass 1000 messages of 100 bytes
    Time: 2.944
    [root@virtme-ng bin]# fg
    ./scx_simple > /dev/null 2> /dev/null
    ^C
    [root@virtme-ng bin]# hackbench --loops 1000
    Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
    Each sender will pass 1000 messages of 100 bytes
    Time: 0.345

What we really want is to only migrate to the waker CPU if nobody else is already enqueued there. This will cause tasks to fan out over any idle CPUs when they're available if the waker's rq is overloaded, and then eventually to start enjoying wakeups on the waker's CPU once load has been distributed and tasks are no longer piling up on a subset of cores.
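To make the change in conditions concrete, here is a minimal C sketch of the decision, written as a userspace model rather than kernel code. Every name in it (HYPO_WAKE_SYNC, struct hypo_cpu, hypo_select_cpu() and the other hypo_* helpers) is hypothetical and does not correspond to sched_ext's real data structures. It assumes the original three conditions are kept and that the new check (no tasks already queued on the waker's CPU) is added on top of them, and it uses a simple first-idle-CPU fallback in place of scx_select_cpu_dfl()'s actual idle-CPU search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the WAKE_SYNC wake flag. */
#define HYPO_WAKE_SYNC 0x1u

/* Hypothetical per-CPU state; not the kernel's struct rq. */
struct hypo_cpu {
	bool idle;              /* nothing is running on this CPU */
	unsigned int nr_queued; /* tasks already waiting on this CPU */
	bool wakee_allowed;     /* wakee's affinity mask includes this CPU */
};

/* Returns true if at least one CPU in the system is idle. */
static bool hypo_any_idle(const struct hypo_cpu *cpus, size_t nr_cpus)
{
	for (size_t i = 0; i < nr_cpus; i++)
		if (cpus[i].idle)
			return true;
	return false;
}

/* Old rule: conditions 1-3 only (WAKE_SYNC, any idle CPU, affinity). */
static bool hypo_old_sync_ok(unsigned int wake_flags,
			     const struct hypo_cpu *cpus, size_t nr_cpus,
			     size_t waker)
{
	assert(waker < nr_cpus);
	return (wake_flags & HYPO_WAKE_SYNC) &&
	       hypo_any_idle(cpus, nr_cpus) &&
	       cpus[waker].wakee_allowed;
}

/* New rule: additionally require that nobody is queued on the waker's CPU. */
static bool hypo_new_sync_ok(unsigned int wake_flags,
			     const struct hypo_cpu *cpus, size_t nr_cpus,
			     size_t waker)
{
	return hypo_old_sync_ok(wake_flags, cpus, nr_cpus, waker) &&
	       cpus[waker].nr_queued == 0;
}

/*
 * Pick a CPU for the wakee: the waker's CPU if the new rule allows it,
 * otherwise the first idle CPU the wakee may run on, otherwise prev_cpu.
 */
static size_t hypo_select_cpu(unsigned int wake_flags,
			      const struct hypo_cpu *cpus, size_t nr_cpus,
			      size_t waker, size_t prev_cpu)
{
	assert(prev_cpu < nr_cpus);
	if (hypo_new_sync_ok(wake_flags, cpus, nr_cpus, waker))
		return waker;
	for (size_t i = 0; i < nr_cpus; i++)
		if (cpus[i].idle && cpus[i].wakee_allowed)
			return i;
	return prev_cpu;
}
```

With a CPU array modeling the hackbench scenario (a waker CPU with four queued tasks plus one idle CPU), hypo_old_sync_ok() returns true while hypo_new_sync_ok() returns false, so hypo_select_cpu() places the wakee on the idle CPU. Once the waker's queue drains, the same call returns the waker's CPU, which is the fan-out-then-converge behavior described above.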

With this patch, the regression is addressed:

    [root@virtme-ng bin]# ./scx_simple > /dev/null &
    [1] 336
    [root@virtme-ng bin]# hackbench --loops 1000
    Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
    Each sender will pass 1000 messages of 100 bytes
    Time: 0.348
    [root@virtme-ng bin]# fg
    ./scx_simple > /dev/null
    ^CEXIT: BPF scheduler unregistered
    [root@virtme-ng bin]# hackbench --loops 1000
    Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
    Each sender will pass 1000 messages of 100 bytes
    Time: 0.352

Signed-off-by: David Vernet <[email protected]>