Liquidsoap sporadically restarting a few seconds after a fade operation #4274
Comments
Thanks for this report, I'll have a look as soon as possible. What version of Liquidsoap are you using? There were some important bugfixes added to |
Thank you @toots, it is 2.3.0, built from opam a few days ago. I just updated the info above to note that the 3 most recent restarts are in fact associated with short jingles finishing, jingles that are shorter than the crossfade duration. I'm not really sure how to properly use liq_cross_duration in 2.3.0; might that be a possible explanation? As a test I've added a liq_cross_duration=0.2 annotation to all jingles in the playlist and restarted, but I think that may result in crossfades into jingles not happening at the best time. I'll trade that off for stability if need be. |
@fkane are you sure about the version? The build info says:
It should say |
Ho wait, this looks like a bug in our build system:
|
Ok sorry about that. How are you running liquidsoap? If practical, running it inside a |
I'm still using the old liquidsoap-daemon package. Yes, I can run it from screen and see if that turns up more info next time it happens. Will report back. |
So I did find the logs via journalctl for when it restarted twice this afternoon. Not much to go on, other than it shows it did in fact crash:
I haven't had any luck locating those core dumps however. Still running another instance under screen to see if it says anything more if it crashes. |
Liquidsoap output after the crash... took about 10 hours this time. Not much to go on. Something very weird is that our live server restarted at almost exactly the same time as this test server that was running independently, with a completely different playlist and even a different instance type. Both crashed at 6:52 AM.
|
Ok thanks for this! I think the next step is to run it inside a
This should capture the stacktrace from the segfault. |
OK, on it. Thank you. |
Found a little more info in the syslog of the test server; it suggests the crash is clock-related:
The main server's syslog suggests that its restart at the same time was a coincidence: it seems to have been driven by an automated system upgrade that restarted everything. That may be the culprit there.
|
The syslog on our live server shows a previous restart was due to the same clock-related crash:
Still running the test server under gdb, I'll let you know what that turns up. Meanwhile I'm disabling auto-updates on the server to at least resolve some of the restarts. I think those updates started when we upgraded from Ubuntu 22 to 24 in order to gain compatibility with Liquidsoap 2.3.0. |
OK, it finally did crash under gdb:
Here is a backtrace:
I'll leave gdb running for a bit in case there is more info you want me to gather. Here are the log entries prior to the crash; again, it happened after a short jingle with a too-long crossfade duration. I'm starting to believe that is what's triggering it; on my live server running the same configuration, I added liq_cross_duration=0.2 to all jingle tracks, and since disabling Ubuntu's auto-updates, it has been running reliably. (It's not an ideal solution though as it results in poor transitions to jingles from the previous track)
|
I see. Thanks so much for taking the time here. I have to see what to do with our native flac support. It's never been quite stable because the C library is callback-based and this is pretty tricky to implement in OCaml. As of now, my advice would be to switch to the I'm gonna have a more detailed look at the actual crash as soon as I have some time. Again, thank you for your patience and experimentation! |
We currently handle multitrack ogg by sending an ogg/flac stream to an external Rocket Streaming Audio Server that does seem to handle it properly, along with the metadata. So far that's been the only way to send a hi-res lossless stream to Roon that preserves metadata and doesn't stop after every track. I'd hate to lose that, but I guess it doesn't really affect that many people. Thanks for the advice and for taking a look! |
Ha I see. Yes, there aren't a lot of alternatives for sending hires with metadata indeed. I'll have a look at the actual crash, maybe we can fix it. |
@fkane I just pushed another cleanup of the code around the issue you found here: https://github.com/savonet/ocaml-flac Would you be able to test? All you should have to do since you already installed via
|
Thank you! I've kicked off a test with flac 0.5.1 under gdb; I'll let you know how it does. |
Great. Fingers crossed it fixes the issue. I'm pretty mad: I realized I had already done work there and forgot to release it. Won't miss it this time if it works for you. |
So far so good... we should let it run for another day though before declaring victory. |
OK, it's been running for 2 days straight now without incident! And that's on a t3.medium with the samplerate quality settings bumped back up to try and stress it. The only issue worth noting is that there are occasional short gaps between tracks on the ogg/flac icecast stream. The clock has gotten pretty far behind over time:
It may be unrelated or due to the extra overhead of gdb, and it's better than crashing. I have kicked off a new test not running under gdb just to confirm performance is acceptable over time in a scenario closer to production. I think releasing ocaml-flac 0.5.1 is a good idea, though. |
Well this is awesome. I'm gonna mark this issue as resolved and we can follow up with one specific to performance issues if that is confirmed (which I think might have b/c of Thanks again for the help and cooperation! |
Unfortunately I think I spoke too soon... even with flac 0.5.1, I just experienced two similar crashes this evening. It just took longer to hit whatever triggers it this time. I've started another gdb session on a test server to try and capture a stack trace again and see if it is indeed the same issue or not. I'll report back with what I find, but it may take a few days to reproduce. Once again it does seem associated with crossfades involving a short jingle, but it is very sporadic. I can't reproduce it by just playing the same sequence of tracks again. |
Description
Very sporadically - it could be after a few hours, it could be after a couple of days - Liquidsoap will stop and restart with no useful information in the logs as to why. The only pattern I can determine is that it usually happens shortly after a fade transition.
It seemed to start happening less frequently after reducing the quality setting for samplerate from "medium" to "fast," but it may just be a coincidence that it took longer to happen after that. We do stream in 48kHz - 24 bit from sources that range from 44.1/16 to 96/24. Source formats are flac or m4a, and jingles are wav.
We are running on an AWS EC2 instance. Upgrading from a t3.medium instance type to a t3.large (twice as powerful) didn't help.
This started happening after updating my Liquidsoap from a custom-built 2.1.x to 2.2.5. I then moved to 2.3.0 hoping it was a known issue that had been addressed, but the problem persists in 2.3.0. Another change possibly associated with this issue was getting my metadata-driven crossfades working properly again as described in this discussion. The restarts seemed to go away while my crossfades were broken and tracks were simply starting sequentially instead of fading, which makes me suspect it has something to do with crossfades. However, based on the timestamps in the log, the restarts only seem to happen after the fade is complete.
I have also observed the same behavior while testing the new autocue stuff and no fade-related annotations in the playlist. Again, it happens a few seconds following a fade transition between tracks.
Steps to reproduce
Unfortunately there is no known way to reproduce this behavior. It seems to happen at random, and very infrequently. You'd have to run for days to confirm it is fixed, unless the cause can be better isolated. I checked, and it's not the case that the same song(s) seem to be associated with the fault. Although it is possible it is triggered by different songs that have something in common.
Expected behavior
Playlists should continue through to completion before restarting.
Liquidsoap version
Liquidsoap build config
Installation method
From OPAM
Additional Info
Here is a sample segment of the log surrounding the most recent time this happened. As I mistakenly thought the issue was resolved after lowering my samplerate conversion quality settings, I was only running with the default log level... but there is no additional useful information with log level 4.
In this case it happened after a jingle, and there is a warning about the crossfade duration being longer than the track duration. I don't THINK that is the cause as I've seen this happen without that case. I've run into problems using liq_cross_duration metadata in 2.3.0, so I'm passing in a 30 second duration to cross() - which is long enough to cover my longest fades, but longer than most jingles. I have also observed this same behavior when fading between two full-length music tracks; it doesn't just happen after a jingle.
EDIT: This has happened twice more since I filed this issue this morning, and both times it was again associated with a short jingle with a crossfade duration longer than the jingle. That seems suspicious. To rule that out, I've modified my playlist to include a liq_cross_duration of 0.2 on all jingles and restarted... I'll report back on whether that has an effect.
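For anyone who wants to try the same workaround, here is a minimal Python sketch of how the jingle entries in a playlist could be annotated in bulk. It is not taken from my actual setup: the function name `annotate_jingle` is hypothetical, it assumes one URI per playlist line, it assumes jingles can be recognized by their `.wav` extension (true for this station, maybe not for yours), and it relies on Liquidsoap's `annotate:key="value",...:uri` request syntax.

```python
def annotate_jingle(line: str, cross_duration: float = 0.2) -> str:
    """Add a liq_cross_duration annotation to a playlist line if it is a .wav jingle.

    Hypothetical helper: jingles are assumed to be the only .wav entries.
    """
    entry = line.strip()
    # Leave blank lines and comments untouched.
    if not entry or entry.startswith("#"):
        return line
    # Leave non-jingles and entries that already carry the annotation untouched.
    if not entry.lower().endswith(".wav") or "liq_cross_duration=" in entry:
        return line
    tag = f'liq_cross_duration="{cross_duration}"'
    prefix = "annotate:"
    if entry.startswith(prefix):
        # Entry is already annotated: prepend our key to the existing key list.
        return f"{prefix}{tag},{entry[len(prefix):]}"
    # Plain URI: wrap it in a new annotate: request.
    return f"{prefix}{tag}:{entry}"


if __name__ == "__main__":
    # Example paths are made up for illustration.
    for sample in ["/music/jingles/id1.wav", "/music/albums/track01.flac"]:
        print(annotate_jingle(sample))
```

The trade-off noted above still applies: a fixed short value avoids a crossfade longer than the jingle, but it also means the fade into the jingle may not start at the best point in the previous track.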
Also here is my complete script:
Here is a sample of my playlist annotated for crossfades: