-
I'd like to improve real-time characterization and "image-making" of active audio. So far I've taken a more informational than strictly auditory approach: for example, I can infer musical details about a group of musicians without hearing them by aggregating metadata markers from their jam history. I've experimented with classification, mood detection, and image generation based on clues about live jams. If you modify the server to stream to Essentia, I can deploy a half-dozen modified servers and then aggregate the results in near-real time at https://jamulus.live to aid people in server navigation. What kinds of analysis and insights interest you from live tracks, and what do you gain from multiple discrete tracks versus a single mixed stream of all the tracks? (A pre-mixed stream is easier to obtain and handle.)
-
An update on this experiment. I've given up on Essentia because it doesn't seem to support real-time streaming analysis. There is a parent/container issue open against that topic and I'd love to see progress on it, but in the interest of moving forward I'm using Adam Stark's Gist on a stream of raw PCM frames sent and received via a POSIX message queue. Gist has the benefit of being very easy to work with, but it doesn't support all the algorithms Essentia does; most notably, it doesn't attempt anything related to ML. My implementation is a dirty Linux-only hack on top of CJamRecorder. I'll put up a draft PR when/if it gets tidier, though I think it would need to be rewritten as cross-platform code in any case. I don't know Qt very well, but I don't think it has a wrapper over POSIX message queues. An MQ does seem to be a good fit for this problem, however.
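For anyone following along, the consumer side is roughly this shape. A minimal sketch only: the queue name `/jamulus_pcm`, the frame size, and 32-bit float samples are placeholders of mine, not anything Jamulus defines.

```cpp
// Minimal consumer sketch: read raw PCM frames from a POSIX message queue
// and run them through Gist (https://github.com/adamstark/Gist).
// Queue name, frame size, and float sample format are assumptions.
// Build on Linux with -lrt, plus Gist and its FFT backend.
#include <fcntl.h>
#include <mqueue.h>
#include <cstdio>
#include <vector>
#include "Gist.h"

int main()
{
    const int frameSize  = 512;   // assumed samples per message
    const int sampleRate = 48000; // Jamulus runs at 48 kHz

    mqd_t mq = mq_open ("/jamulus_pcm", O_RDONLY);
    if (mq == (mqd_t) -1)
    {
        perror ("mq_open");
        return 1;
    }

    // mq_receive needs a buffer at least as big as the queue's mq_msgsize
    mq_attr attr;
    mq_getattr (mq, &attr);
    std::vector<char> msg (attr.mq_msgsize);

    Gist<float> gist (frameSize, sampleRate);

    for (;;)
    {
        ssize_t n = mq_receive (mq, msg.data(), msg.size(), nullptr);
        if (n < 0)
        {
            perror ("mq_receive");
            break;
        }
        if (n != (ssize_t) (frameSize * sizeof (float)))
            continue; // unexpected message size; skip it

        gist.processAudioFrame (reinterpret_cast<const float*> (msg.data()), frameSize);
        std::printf ("rms=%.4f centroid=%.1f pitch=%.1f\n",
                     gist.rootMeanSquare(), gist.spectralCentroid(), gist.pitch());
    }

    mq_close (mq);
    return 0;
}
```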
-
I'm a visual artist (and software dev) looking at a collaboration use case in which remote musicians send their audio to a Jamulus (Linux) server. I want to access their streams to do near-real-time audio analysis with Essentia to drive a reactive image-making process.
I've seen previous discussions in the archives here about tapping the real-time per-channel data streams into external mixing solutions, but my use case is different in that I'm not particularly sensitive to latency; I only need the analysis to line up "approximately" with the audio streams.
After looking through the src, I can see a possible route/hack: modify `CJamRecorder::OnFrame` so that it writes audio frames to a domain socket rather than to the filesystem. An external process can then pick up the audio data and stream it through Essentia to do the per-channel analysis I'm after (rough sketch below). Does it make sense to implement things this way, or am I overlooking some simpler way forward?
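To make that concrete, here's a hypothetical illustration of what the sending side of the hook could look like. This is not actual Jamulus code: the socket path, the channel-index prefix, and 16-bit PCM are my assumptions. Since I'm latency-insensitive, a non-blocking datagram socket that simply drops frames when the consumer falls behind seems acceptable.

```cpp
// Hypothetical sketch of the OnFrame hack's sending side (not Jamulus code).
// Socket path, channel-index prefix, and 16-bit PCM are assumptions.
// A non-blocking datagram socket means a slow (or absent) analysis process
// just drops frames instead of stalling the recorder thread.
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>
#include <vector>

class FrameTap
{
public:
    explicit FrameTap (const char* path)
    {
        // SOCK_NONBLOCK as a socket() flag is Linux-specific, which matches
        // the Linux-server setup described above
        fd = socket (AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);
        std::memset (&addr, 0, sizeof (addr));
        addr.sun_family = AF_UNIX;
        std::strncpy (addr.sun_path, path, sizeof (addr.sun_path) - 1);
    }

    ~FrameTap()
    {
        if (fd >= 0)
            close (fd);
    }

    // One datagram per frame: a 2-byte channel index, then the raw samples.
    void sendFrame (uint16_t channel, const int16_t* samples, size_t numSamples)
    {
        std::vector<char> buf (sizeof (channel) + numSamples * sizeof (int16_t));
        std::memcpy (buf.data(), &channel, sizeof (channel));
        std::memcpy (buf.data() + sizeof (channel), samples, numSamples * sizeof (int16_t));

        // EAGAIN (consumer not keeping up) is deliberately ignored: the
        // analysis only needs to line up approximately with the audio.
        sendto (fd, buf.data(), buf.size(), 0, (const sockaddr*) &addr, sizeof (addr));
    }

private:
    int fd = -1;
    sockaddr_un addr {};
};
```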
Thanks for any pointers: I'm fairly new to audio processing but excited by the possibilities that Jamulus seems to offer, if I can just get past this hurdle.