starting from kernel 5.10.x latency test fails systematically on multiple platforms #285
It may not be an audio issue at all. Some other task or kernel driver may be eating CPU time and preventing low-latency operation. From the result:
It means that the playback is missing 88 samples (216-128) at the under-run check. You can trace the syscalls using
Legend: I marked with
Perhaps it would be worth testing with the vanilla kernel and bisecting to the commit that affects this behavior. Another way is to do kernel profiling. |
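The arithmetic in the comment above can be turned into a duration. This is a minimal sketch: the names `needed` and `present` are hypothetical labels for the two quoted figures (the output they come from is not shown), and the 48000 Hz sample rate is an assumption that the thread does not state.

```python
def missing_frames(needed: int, present: int) -> int:
    """Frames short at the under-run check (never negative)."""
    return max(needed - present, 0)


def frames_to_ms(frames: int, rate_hz: int) -> float:
    """Convert a frame count to milliseconds at the given sample rate."""
    return 1000.0 * frames / rate_hz


# Assumption: the sample rate is not given in the thread.
ASSUMED_RATE_HZ = 48000

short = missing_frames(216, 128)  # the two figures quoted in the comment
print(short, "frames short")
print(f"{frames_to_ms(short, ASSUMED_RATE_HZ):.2f} ms at {ASSUMED_RATE_HZ} Hz")
```

At the assumed rate the shortfall is under 2 ms, so even a short stall of the playback task is enough to cause it.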
Thanks for the info. |
I did some debugging using ftrace (trace-cmd) and it really seems that the system scheduler blocks the latency task in favor of a kworker task:
As you can see, the latency task was scheduled out at 22719.778949 and the system gave the CPU back at 22719.790242. So the time difference is 0.011293 seconds (~11.3 ms). @tiwai: FYI - could you check my interpretation, please? It really looks like a problem in the Linux scheduler or the kworker task. |
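To put the gap between the two timestamps in context, it can be compared with the length of one audio period. This is a sketch under stated assumptions: the 128 is taken from the reporter's `-m 128 -M 128` and treated as a frame count, and the 48000 Hz rate is an assumption that the thread does not state. `Decimal` is used so the microsecond timestamps subtract exactly.

```python
from decimal import Decimal


def gap_seconds(t_out: str, t_in: str) -> Decimal:
    """Exact difference between two trace timestamps given as strings."""
    return Decimal(t_in) - Decimal(t_out)


def periods_lost(gap_s: Decimal, period_frames: int, rate_hz: int) -> float:
    """How many audio periods fit into the scheduling gap."""
    period_s = Decimal(period_frames) / Decimal(rate_hz)
    return float(gap_s / period_s)


gap = gap_seconds("22719.778949", "22719.790242")
print(f"gap: {float(gap) * 1000:.3f} ms")
# Assumptions: 128 frames per period, 48000 Hz sample rate.
print(f"periods lost: {periods_lost(gap, 128, 48000):.1f}")
```

Under these assumptions the task is off the CPU for several periods in a row, which is consistent with an XRUN.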
I think I found the upstream scheduler change between the 5.9 and 5.10 kernels that causes this behavior: See the patch description. You can restore the old behaviour using this command:
Note that this command may be dangerous; see the patch description. A task can then consume the whole CPU and prevent other tasks from running. |
For archiving purposes -
optionally (to check workqueues):
Use |
Great catch! This seems to work for me. |
Would a different RT setup in the latency program change the behavior? It sounds like a kernel regression if it shows such a latency even with the highest RT priority. |
latency.c sets the scheduler policy to round-robin (SCHED_RR) with the maximum priority, 99. The mentioned kernel patch allows non-realtime tasks to interrupt tasks even at this highest priority, and it seems to do so for a large time window (from the realtime perspective): tens of milliseconds. From my analysis, the workqueue tasks are fine (up to 1 ms), but the scheduler wakes the latency task (a busy-loop task) too late. I added a new option to the latency test code to allow using SCHED_FIFO, but the results are not good either. It seems that the scheduler penalty for busy-loop programs is high in the current kernels and that the scheduler interrupts this busy task rather than using "free" CPUs. For reference (pipewire discussion): https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/1271 |
On the platforms I am currently testing, if I change the latency application to skip the setscheduler() call altogether, the latency app works fine on kernels >= 5.10.x (even without setting the kernel.sched_rt_runtime_us parameter). With a kernel <= 5.9.x I see the opposite behaviour: if I skip the setscheduler() call, the latency test fails. |
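For anyone repeating this experiment, it helps to record the realtime throttling settings alongside the kernel version. The sketch below reads `kernel.sched_rt_runtime_us` and `kernel.sched_rt_period_us` from `/proc/sys/kernel` and reports the share of each period available to realtime tasks; a runtime of -1 means no limit. The helper names are hypothetical, and the thread does not show which values were in effect on the test machines.

```python
from pathlib import Path

PROC_KERNEL = Path("/proc/sys/kernel")


def rt_budget_fraction(runtime_us: int, period_us: int):
    """Share of each period that RT tasks may use; None means unlimited."""
    if runtime_us < 0:  # -1 disables realtime throttling
        return None
    return runtime_us / period_us


def read_int(name: str):
    """Read an integer sysctl from /proc, or None if it is unavailable."""
    try:
        return int((PROC_KERNEL / name).read_text())
    except (OSError, ValueError):
        return None


runtime = read_int("sched_rt_runtime_us")
period = read_int("sched_rt_period_us")
if runtime is not None and period:
    fraction = rt_budget_fraction(runtime, period)
    print("RT budget:", "unlimited" if fraction is None else f"{fraction:.0%}")
else:
    print("RT throttling settings not available on this system")
```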
Wow, I did not expect that. It seems that SCHED_OTHER is better than FIFO/RR with the current kernels. I added the SCHED_OTHER policy support to latency.c - |
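The three policies compared in this thread can be tried from a small script when reproducing the results. This is a Linux-only sketch using Python's `os` module, not the code in latency.c; the names `POLICIES`, `priority_for` and `try_set_policy` are hypothetical.

```python
import os

# Policy names are hypothetical labels, not latency.c option values.
POLICIES = {
    "rr": os.SCHED_RR,
    "fifo": os.SCHED_FIFO,
    "other": os.SCHED_OTHER,
}


def priority_for(policy_name: str) -> int:
    """Maximum static priority the kernel accepts for this policy."""
    return os.sched_get_priority_max(POLICIES[policy_name])


def try_set_policy(policy_name: str) -> bool:
    """Apply the policy to the calling process; False if the kernel refuses."""
    policy = POLICIES[policy_name]
    param = os.sched_param(os.sched_get_priority_max(policy))
    try:
        os.sched_setscheduler(0, policy, param)  # pid 0 = this process
    except OSError:  # includes PermissionError for unprivileged RT requests
        return False
    return True


for name in POLICIES:
    print(name, "max priority", priority_for(name))
print("SCHED_OTHER applied:", try_set_policy("other"))
```

Requesting "rr" or "fifo" normally requires root, CAP_SYS_NICE, or a suitable RLIMIT_RTPRIO, so the function returns False instead of raising when the request is denied.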
It sounds OK to me. |
In addition to allowing the user to specify the scheduler policy, can the latency application apply a default scheduler according to the above findings? |
I think that it would be bad to change the application defaults according to the kernel version. Also, it's clearly a kernel scheduler problem which should be reported to LKML (Linux kernel mailing list) and the scheduler maintainers - see https://github.com/torvalds/linux/blob/c3eb11fbb826879be773c137f281569efce67aa8/MAINTAINERS#L18348 . |
If I run the latency test application (test/latency.c) on Linux kernel 5.9.x with alsa-lib v1.2.4 on the following audio card:
PCH [HDA Intel PCH], device 1: ALC269VC Analog [ALC269VC Analog] (for example)
I get the following result:
The test succeeds and the final state is RUNNING.
If I run the same test on the same audio device and distribution but with Linux kernel >= 5.10.x, it fails systematically and I get:
I get an XRUN during the test execution.
I can reproduce the same error on multiple boards, for multiple audio cards and on different Linux distributions, just by changing the Linux kernel version.
The problem was originally reported at:
AES67 Linux daemon: Latency test fails with Linux kernel starting from 5.10.0x
I am interested in testing with low latency (-m 128 -M 128).
My guess is that something changed in the Linux audio core and the latency application has to be adapted.
Any ideas?