This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

scx: Implement scx_bpf_dispatch_cancel() #135

Merged
merged 1 commit into from
Feb 4, 2024

Conversation

htejun
Collaborator

@htejun htejun commented Feb 3, 2024

This can be used to implement interlocking against dequeue. For example, let's say there is one dsq per CPU and the scheduler can stall if a task is dispatched to a dsq whose CPU isn't in p->cpus_ptr. In the dispatch path, if you do the following:

if (bpf_cpumask_test_cpu(target_cpu, p->cpus_ptr))
	scx_bpf_dispatch(p, target_dsq, slice, 0);

The task can still end up in the wrong dsq because the cpumask can change between bpf_cpumask_test_cpu() and scx_bpf_dispatch(). However,

scx_bpf_dispatch(p, target_dsq, slice, 0);
if (!bpf_cpumask_test_cpu(target_cpu, p->cpus_ptr))
	scx_bpf_dispatch_cancel();

eliminates the race window - either the set_cpumask path sees the task dispatched or the dispatch path sees the updated cpumask. In both cases, the dispatch gets canceled as it should.

Note that in schedulers which don't implement .dequeue(), depending on the implementation, there can be multiple enqueued instances of the same task. However, as long as the scheduler guarantees at least one dispatch attempt on the task since its last enqueueing, and that dispatch attempt interlocks against the dequeue path as above, it's guaranteed not to act on stale information or miss dispatching an enqueued task.


Signed-off-by: Tejun Heo <[email protected]>
Cc: Andrea Righi <[email protected]>
@htejun htejun requested review from arighi and Byte-Lab February 3, 2024 19:37
Collaborator

@arighi arighi left a comment


Looks good to me, thanks for doing this @htejun .

One question, theoretically can we have the same race if we do direct dispatch from .enqueue() or .select_cpu() targeting a CPU DSQ instead of using SCX_DSQ_LOCAL?

arighi pushed a commit to sched-ext/scx that referenced this pull request Feb 4, 2024
Use scx_bpf_dispatch_cancel() to invalidate dispatches to the wrong
per-CPU DSQ due to cpumask race conditions.

Link: sched-ext/sched_ext#135
Signed-off-by: Andrea Righi <[email protected]>
@htejun
Collaborator Author

htejun commented Feb 4, 2024

One question, theoretically can we have the same race if we do direct dispatch from .enqueue() or .select_cpu() targeting a CPU DSQ instead of using SCX_DSQ_LOCAL?

We're holding p->pi_lock across .select_cpu() and .enqueue(), so we're atomic against any attribute changes. We should be okay.

@htejun htejun merged commit 1ecced8 into sched_ext Feb 4, 2024
2 checks passed
arighi pushed a commit to sched-ext/scx that referenced this pull request Feb 8, 2024
Use scx_bpf_dispatch_cancel() to invalidate dispatches to the wrong
per-CPU DSQ due to cpumask race conditions.

This prevents dispatching tasks to CPUs that cannot be used according to
the task's cpumask.

With this applied the scheduler passed all the `stress-ng --race-sched`
stress tests.

Link: sched-ext/sched_ext#135
Signed-off-by: Andrea Righi <[email protected]>
arighi pushed a commit to sched-ext/scx that referenced this pull request Feb 8, 2024
Use scx_bpf_dispatch_cancel() to invalidate dispatches to the wrong
per-CPU DSQ due to cpumask race conditions, and redirect them to the
shared DSQ.

This prevents dispatching tasks to CPUs that cannot be used according to
the task's cpumask.

With this applied the scheduler passed all the `stress-ng --race-sched`
stress tests.

Moreover, introduce a counter that is periodically reported to stdout as
an additional statistic, which can be helpful for debugging.

Link: sched-ext/sched_ext#135
Signed-off-by: Andrea Righi <[email protected]>