Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduling: functions that might reschedule assume interrupts enabled #20941

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

derMihai
Copy link
Contributor

@derMihai derMihai commented Oct 24, 2024

Contribution description

Currently, most functions that might reschedule contain logic that enables them to be called with interrupts disabled. This is not only going against the scope disabling interrupts in the first place, but also doesn't work. All these functions end up calling thread_yield_higher(), which will fail to suspend the current thread.

Depending on the use-case, this can be more or less catastrophic. If the tread doesn't go to sleep, thread_yield_higher() will just trigger the switch once the interrupts are enabled (although on some architectures this might be bad due to unflushed instruction caches - see the cortex m code). But if the thread is set to go to sleep, the current thread will continue execution although it shouldn't. For example, the following assertion will be triggered:

    static cond_t cond = COND_INIT;
    static mutex_t lock = MUTEX_INIT;

    mutex_lock(&lock);
    irq_disable();
    cond_wait(&cond, &lock); // this should block forever
    assert(0); // thread_yield_higher() couldn't trigger the interrupt

This breaks both program logic and leaves the scheduler in a messed up state.

⚠️ For this PR, I simply grepped for thread_yield and removed any IRQ state saving logic wherever thread_yield*() was found and the code is clearly not allowed to be executed from IRQ context, so don't expect it to be comprehensive. There might still be places that I missed and still feature useless IRQ state saving logic.
I also added an assertion in thread_yield_higher() s.t. if it's executed in thread context, the interrupts must be enabled.

The case of the mutex

mutex_lock() really shouldn't be called with interrupts disabled, as it reschedules. However, it's fine to do it if we know there is no thread blocking on that mutex. This is the case of vfs_bind_stdio(), which is called in early init, with interrupts disabled.

I therefore left the mutex the way it is. The assertion in thread_yield_higher() will still catch a reschedule.

Testing procedure

I run the automatic tests on native, and I get some failures (see comments).

Issues/PRs references

#20940

@github-actions github-actions bot added Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: timers Area: timer subsystems Area: sys Area: System labels Oct 24, 2024
@derMihai
Copy link
Contributor Author

I do realize that for some of the functions, it does make sense to store and restore interrupt state. These are the ones that may be called interrupt context (e.g. cond_signal()). They are already guarded against any yielding, and may be called with disabled interrupts.

@derMihai derMihai marked this pull request as draft October 24, 2024 14:09
@github-actions github-actions bot added Platform: native Platform: This PR/issue effects the native platform Platform: MSP Platform: This PR/issue effects MSP-based platforms Platform: ARM Platform: This PR/issue effects ARM-based platforms Platform: AVR Platform: This PR/issue effects AVR-based platforms Platform: RISC-V Platform: This PR/issue effects RISC-V-based platforms Area: cpu Area: CPU/MCU ports labels Oct 25, 2024
@derMihai
Copy link
Contributor Author

@kaspar030 what is the scope of this test?

@derMihai
Copy link
Contributor Author

event_source_trigger() is calling event_post() with interrupts disabled and event_post() might reschedule. The locking could be done with a mutex instead, but then it won't be usable from interrupt context.

@github-actions github-actions bot removed Area: timers Area: timer subsystems Area: sys Area: System labels Oct 25, 2024
@derMihai
Copy link
Contributor Author

Run the native tests on native and samd20-xpro. tests/core/thread_priority_inversion and tests/sys/event_source failed as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: cpu Area: CPU/MCU ports Platform: ARM Platform: This PR/issue effects ARM-based platforms Platform: AVR Platform: This PR/issue effects AVR-based platforms Platform: MSP Platform: This PR/issue effects MSP-based platforms Platform: native Platform: This PR/issue effects the native platform Platform: RISC-V Platform: This PR/issue effects RISC-V-based platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant