-
I'd like to improve real-time characterization and "image-making" of active audio. So far I've taken a more informational than strictly auditory approach: for example, I can infer musical details about a group of musicians without hearing them by aggregating metadata markers from their jam history. I've experimented with classification, mood detection, and image generation based on clues about live jams. If you modify the server to stream to Essentia, I can deploy a half-dozen modified servers and then aggregate the results in near-real time at https://jamulus.live to aid people in server navigation. What kinds of analysis and insights interest you from live tracks, and what do you gain from multiple discrete tracks versus a single mixed stream of all the tracks? (A pre-mixed stream is easier to obtain and handle.)
-
An update on this experiment. I've given up on Essentia because it doesn't seem to support real-time streaming analysis. There is a parent/container issue open against that topic and I'd love to see progress on it, but in the interest of moving forward I'm using Adam Stark's Gist on a stream of raw PCM frames sent and received via a POSIX message queue. Gist has the benefit of being very easy to work with, but it doesn't support all the algorithms Essentia does; most notably, it doesn't attempt anything related to ML. My implementation is a dirty Linux-only hack on top of CJamRecorder. I'll put up a draft PR when/if it gets tidier, though I think it would need to be rewritten as cross-platform code in any case. I don't know Qt very well, but I don't think it has a wrapper over POSIX message queues. An MQ does seem to be a good fit for this problem, however.
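For anyone following along, the consumer side is roughly this shape. A minimal sketch only: the queue name `/jamulus_pcm`, the frame size, and 32-bit float samples are placeholders of mine, not anything Jamulus defines.

```cpp
// Minimal consumer sketch: read raw PCM frames from a POSIX message queue
// and run them through Gist (https://github.com/adamstark/Gist).
// Queue name, frame size, and float sample format are assumptions.
// Build on Linux with -lrt, plus Gist and its FFT backend.
#include <fcntl.h>
#include <mqueue.h>
#include <cstdio>
#include <vector>
#include "Gist.h"

int main()
{
    const int frameSize  = 512;   // assumed samples per message
    const int sampleRate = 48000; // Jamulus runs at 48 kHz

    mqd_t mq = mq_open ("/jamulus_pcm", O_RDONLY);
    if (mq == (mqd_t) -1)
    {
        perror ("mq_open");
        return 1;
    }

    // mq_receive needs a buffer at least as big as the queue's mq_msgsize
    mq_attr attr;
    mq_getattr (mq, &attr);
    std::vector<char> msg (attr.mq_msgsize);

    Gist<float> gist (frameSize, sampleRate);

    for (;;)
    {
        ssize_t n = mq_receive (mq, msg.data(), msg.size(), nullptr);
        if (n < 0)
        {
            perror ("mq_receive");
            break;
        }
        if (n != (ssize_t) (frameSize * sizeof (float)))
            continue; // unexpected message size; skip it

        gist.processAudioFrame (reinterpret_cast<const float*> (msg.data()), frameSize);
        std::printf ("rms=%.4f centroid=%.1f pitch=%.1f\n",
                     gist.rootMeanSquare(), gist.spectralCentroid(), gist.pitch());
    }

    mq_close (mq);
    return 0;
}
```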
-
I'm a visual artist (and software dev) looking at a collaboration use case in which remote musicians send their audio to a Jamulus (Linux) server. I want to access their streams to do near-real-time audio analysis with Essentia to drive a reactive image-making process.
I've seen previous discussions in the archives here about tapping the real-time per-channel data streams into external mixing solutions, but my use case is different in that I'm not particularly sensitive to latency; I only need the analysis to line up "approximately" with the audio streams.
After looking through the src, I can see a possible route/hack: modify `CJamRecorder::OnFrame` so that it writes audio frames to a domain socket rather than to the filesystem. An external process can then pick up the audio data and stream it through Essentia to do the per-channel analysis I'm after (rough sketch below). Does it make sense to implement things this way, or am I overlooking some simpler way forward?
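To make that concrete, here's a hypothetical illustration of what the sending side of the hook could look like. This is not actual Jamulus code: the socket path, the channel-index prefix, and 16-bit PCM are my assumptions. Since I'm latency-insensitive, a non-blocking datagram socket that simply drops frames when the consumer falls behind seems acceptable.

```cpp
// Hypothetical sketch of the OnFrame hack's sending side (not Jamulus code).
// Socket path, channel-index prefix, and 16-bit PCM are assumptions.
// A non-blocking datagram socket means a slow (or absent) analysis process
// just drops frames instead of stalling the recorder thread.
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>
#include <vector>

class FrameTap
{
public:
    explicit FrameTap (const char* path)
    {
        // SOCK_NONBLOCK as a socket() flag is Linux-specific, which matches
        // the Linux-server setup described above
        fd = socket (AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);
        std::memset (&addr, 0, sizeof (addr));
        addr.sun_family = AF_UNIX;
        std::strncpy (addr.sun_path, path, sizeof (addr.sun_path) - 1);
    }

    ~FrameTap()
    {
        if (fd >= 0)
            close (fd);
    }

    // One datagram per frame: a 2-byte channel index, then the raw samples.
    void sendFrame (uint16_t channel, const int16_t* samples, size_t numSamples)
    {
        std::vector<char> buf (sizeof (channel) + numSamples * sizeof (int16_t));
        std::memcpy (buf.data(), &channel, sizeof (channel));
        std::memcpy (buf.data() + sizeof (channel), samples, numSamples * sizeof (int16_t));

        // EAGAIN (consumer not keeping up) is deliberately ignored: the
        // analysis only needs to line up approximately with the audio.
        sendto (fd, buf.data(), buf.size(), 0, (const sockaddr*) &addr, sizeof (addr));
    }

private:
    int fd = -1;
    sockaddr_un addr {};
};
```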
Thanks for any pointers: I'm fairly new to audio processing but excited by the possibilities that Jamulus seems to offer, if I can just get past this hurdle.