Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions regarding the problem with atomics #9

Open
Thom-de-Jong opened this issue Nov 14, 2024 · 4 comments
Open

Questions regarding the problem with atomics #9

Thom-de-Jong opened this issue Nov 14, 2024 · 4 comments
Assignees

Comments

@Thom-de-Jong
Copy link

Thom-de-Jong commented Nov 14, 2024

Good day all,

Not an issue but a question: What exactly is the problem with atomics?
I have read the discussion #59

Is this confirmed by WCH, or by the test mentioned in #59?
Is this a hardware bug in all chips?
How does one know it is the atomic operation causing a problem?
Is it enough to put the atomic operation in a critical section by using the portable-atomics crate for example?

Just curious about the recent changes.
Thank you in advance for the explanation!

@Codetector1374
Copy link
Member

Codetector1374 commented Nov 14, 2024

Yeah, we never really had time to run the machine test suite. But we did run into an actual / real problem with embassy executor. Where the executor enter a bad state, the task is marked as woken up but not in the ready list.

@Dummyc0m and I have reviewed the code in the embassy executor and can not come up with reasonable possibility for us to be entering that state if load reserve store conditional actually work as expected. And by disabling atomic (it will fall back to use the critical section based executor data structures) we can no longer reproduce the problem as expected.

Regarding your question, using portable atomic and placing access into critical section would solve the problem. Because we are practically no long use lr/sc instructions anymore.

If anyone have the time / interest in running through the litmus test suite to actually confirm it would be extra nice.

@Usioumeo
Copy link

I tried to run the litmus test suite. The problem is that it is not meant to run on bare metal, and it requires threads and a real operating system underneath.
The best I can think of is to parse the litmus tests with a new parser, generate some code (even rust based on this lib, with the test included as embedded assembly), and to simulate traps by threads we could use timer interrupts. The problem with this approach is that interrupts have a priority, so one "thread" will always be interrupted, while the other will never be.
We could swap after a test run, but I don't know if that's a good idea for all tests.

Let me know what you think, and if it is worth spend time on it.

@Codetector1374
Copy link
Member

I was thinking adapting something like free rtos and slap it on there as a few threads with preemptive scheduling. I personally don’t really have a huge interest in confirming it is broken, since losing atomic on an embedded & single core platform like this is honestly not a huge deal. And we confirmed by using critical section we can get embassy to work as expected .

@Thom-de-Jong
Copy link
Author

@Codetector1374, Thank you for the explanation.

For me, not using atomics solves all the instability I encountered in my firmware on the CH32V203.
Atomics make things easy but it is not a dealbreaker as portable-atomics will fill the gap in my case. (And if you are open to depend on an extra crate)
I'm not experienced enough to help you further investigate this issue and would have never guessed the cause myself.
Huge thanks to you all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants