This repository has been archived by the owner on Jun 18, 2024. It is now read-only.
scx: Implement scx_bpf_dispatch_cancel() #135
Merged
Conversation
This can be used to implement interlocking against dequeue. E.g., let's say there is one DSQ per CPU and the scheduler can stall if a task is dispatched to a DSQ which doesn't intersect with p->cpus_ptr. In the dispatch path, if you do the following:

        if (bpf_cpumask_test_cpu(target_cpu, p->cpus_ptr))
                scx_bpf_dispatch(p, target_dsq, slice, 0);

the task can still end up in the wrong DSQ because the cpumask can change between bpf_cpumask_test_cpu() and scx_bpf_dispatch(). However,

        scx_bpf_dispatch(p, target_dsq, slice, 0);
        if (!bpf_cpumask_test_cpu(target_cpu, p->cpus_ptr))
                scx_bpf_dispatch_cancel();

eliminates the race window: either the set_cpumask path sees the task dispatched, or the dispatch path sees the updated cpumask. In both cases, the dispatch gets canceled as it should.

Note that in schedulers which don't implement .dequeue(), depending on the implementation, there can be multiple enqueued instances of the same task. However, as long as the scheduler guarantees that there is at least one dispatch attempt on the task since the last enqueueing, and that the dispatch attempt interlocks against the dequeue path as above, it is guaranteed not to act upon stale information or miss dispatching an enqueued task.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Andrea Righi <[email protected]>
arighi approved these changes on Feb 4, 2024
Looks good to me, thanks for doing this @htejun.

One question: theoretically, can we have the same race if we do direct dispatch from .enqueue() or .select_cpu() targeting a CPU DSQ instead of using SCX_DSQ_LOCAL?
arighi pushed a commit to sched-ext/scx that referenced this pull request on Feb 4, 2024
Use scx_bpf_dispatch_cancel() to invalidate dispatches to the wrong per-CPU DSQ due to cpumask race conditions.

Link: sched-ext/sched_ext#135
Signed-off-by: Andrea Righi <[email protected]>
arighi pushed a commit to sched-ext/scx that referenced this pull request on Feb 8, 2024
Use scx_bpf_dispatch_cancel() to invalidate dispatches to the wrong per-CPU DSQ due to cpumask race conditions. This prevents dispatching tasks to a CPU that cannot be used according to the task's cpumask. With this applied, the scheduler passed all the `stress-ng --race-sched` stress tests.

Link: sched-ext/sched_ext#135
Signed-off-by: Andrea Righi <[email protected]>
arighi pushed a commit to sched-ext/scx that referenced this pull request on Feb 8, 2024
Use scx_bpf_dispatch_cancel() to invalidate dispatches to the wrong per-CPU DSQ due to cpumask race conditions, and redirect them to the shared DSQ. This prevents dispatching tasks to a CPU that cannot be used according to the task's cpumask. With this applied, the scheduler passed all the `stress-ng --race-sched` stress tests.

Moreover, introduce a counter that is periodically reported to stdout as an additional statistic, which can be helpful for debugging.

Link: sched-ext/sched_ext#135
Signed-off-by: Andrea Righi <[email protected]>