Hi :) I'm working on some integrations with Discord for a D&D 5e game I'm running, and I think Craig is really great for what I need (streaming audio from each user separately). I really need the information in real-time, which Craig sort of supports, but the issue is more on the server side than client side due to the implementation I've come up with:
1. Poll at intervals (e.g. every 30 seconds), pulling the latest audio from Craig
2. Transcribe the audio using OpenAI Whisper, for each speaker
3. Collate the audio files (with timestamps) into a transcript, and push it to a Discord webhook
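The collation step could look roughly like the sketch below. The `Segment` shape and the `formatTimestamp` and `collateTranscript` helpers are hypothetical names used for illustration; they are not part of Craig or Whisper. The sketch assumes each transcribed segment already carries a start offset in seconds from the beginning of the recording.

```typescript
// Hypothetical shape of one transcribed segment (not a Craig or Whisper type).
interface Segment {
  speaker: string;
  startSeconds: number; // offset from the start of the recording
  text: string;
}

// Format a number of seconds as HH:MM:SS for transcript lines.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.floor(totalSeconds);
  const pad = (n: number): string => (n < 10 ? '0' + n : String(n));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  return `${pad(hours)}:${pad(minutes)}:${pad(s % 60)}`;
}

// Merge the per-speaker segment lists into one chronologically ordered transcript.
function collateTranscript(perSpeaker: Segment[][]): string {
  const all = ([] as Segment[]).concat(...perSpeaker);
  return all
    .sort((a, b) => a.startSeconds - b.startSeconds)
    .map((seg) => `[${formatTimestamp(seg.startSeconds)}] ${seg.speaker}: ${seg.text}`)
    .join('\n');
}
```

The resulting string could then be sent as the message content of the Discord webhook, one poll at a time.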
The issue, of course, is that I currently have no choice but to pull all the audio at every iteration, including audio I've already processed. I can alleviate this somewhat by making a simple Discord bot that disconnects and reconnects Craig every x minutes (e.g. every 30 minutes), but it's still an unnecessary drain.

I've not looked too deeply into the code, as I figured (correctly, it turns out) it would be simpler to just copy the API calls from the website, so that's my perspective here. With that said, I think a reasonable solution would be: when making the POST request to `https://craig.horse/api/recording/{id}/cook?key={key}`, support an additional argument: `startTime` (and `endTime`, why not?). Then just pull audio which was created within that time period (or before/after that time, if only one of the fields is given). I think this would be a relatively straightforward solution to implement on the server-side, since it's just a case of pruning the audio files at some point before sending them (ideally as early as possible, to avoid
A more rigorous (but complex) option would be to support streaming more directly - that'd be really cool, but (I imagine) a lot more work too.
One other option I'm looking into is creating a simple version of Craig which only captures the audio, which I can then run on my own computer. This would bypass all the issues with transfer and compute, but if this is something other people might want to be able to do with Craig, maybe it's worth building it as a feature?
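To make the proposed `startTime`/`endTime` behaviour concrete, here is a minimal sketch of the pruning rule. Everything in it is an assumption for illustration: the `AudioChunk` shape, the `HH:MM:SS` timestamp format, and the `parseTimestamp` and `pruneChunks` helpers are hypothetical, and nothing here reflects how Craig actually stores or cooks audio.

```typescript
// Hypothetical chunk record; Craig's real storage format is not described in this thread.
interface AudioChunk {
  timeSeconds: number; // capture time relative to the start of the recording
  data: Uint8Array;
}

// Parse an "HH:MM:SS" string into seconds (the timestamp format is an assumption).
function parseTimestamp(ts: string): number {
  const parts = ts.split(':').map(Number);
  if (parts.length !== 3 || parts.some((p) => isNaN(p))) {
    throw new Error(`Invalid timestamp: ${ts}`);
  }
  return parts[0] * 3600 + parts[1] * 60 + parts[2];
}

// Keep chunks in the half-open window [startTime, endTime).
// A missing bound leaves that side of the window open.
function pruneChunks(chunks: AudioChunk[], startTime?: string, endTime?: string): AudioChunk[] {
  const start = startTime === undefined ? -Infinity : parseTimestamp(startTime);
  const end = endTime === undefined ? Infinity : parseTimestamp(endTime);
  return chunks.filter((c) => c.timeSeconds >= start && c.timeSeconds < end);
}
```

The window is half-open so that a client polling with the previous `endTime` as its next `startTime` would not receive the same chunk twice.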
    export async function cook(id: string, format = 'flac', container = 'zip', dynaudnorm = false, startTime = '00:00:00') {
      const [state, writeState, deleteState] = stateManager(id);
      try {
        await writeState({ message: 'Starting...' });
        const cookingPath = path.join(cookPath, '..', 'cook.sh');
        // startTime is appended as the last argument, after the optional dynaudnorm flag
        const args = [id, format, container, ...(dynaudnorm ? ['dynaudnorm'] : []), startTime];
        const child = spawn(cookingPath, args, { detached: true });
        console.log(`Cooking ${id} (${format}.${container}${dynaudnorm ? ' dynaudnorm' : ''}, from ${startTime}) with process ${child.pid}`);
        registerProcess(child, deleteState);
        // Prevent the stream from ending prematurely (for some reason)
        child.stderr.on('data', getStderrReader(state, writeState));
        return child.stdout;
      } catch (e) {
        deleteState();
        throw e;
      }
    }
I'm not on my dev machine today so I don't have Docker at hand, otherwise I'd give it a test run now - I'll give it a shot later and see if it breaks (or, more likely, what breaks...)
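Until there is a Docker setup to test against, the argument list can at least be checked on its own by pulling it into a small pure function. This is only a sketch: `buildCookArgs` is a hypothetical helper, and it assumes `cook.sh` would accept the start time as its last positional argument, which nothing in this thread confirms.

```typescript
// Hypothetical helper: builds the positional arguments that would be passed to cook.sh.
// Assumes (unverified) that cook.sh reads startTime as the final argument.
function buildCookArgs(
  id: string,
  format = 'flac',
  container = 'zip',
  dynaudnorm = false,
  startTime = '00:00:00'
): string[] {
  const args = [id, format, container];
  if (dynaudnorm) {
    args.push('dynaudnorm'); // optional flag, only present when requested
  }
  args.push(startTime); // always last, so its index depends on the flag above
  return args;
}

console.log(buildCookArgs('example-id', 'flac', 'zip', true, '00:30:00'));
```

One thing this makes visible is that the position of `startTime` shifts depending on whether `dynaudnorm` is set, so the script would need to handle both layouts, or the start time would need a fixed slot.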
I don't know about altering how cooking works to do live transcription, but if there's a good way to do that within the bot with Whisper, that may be a good feature to have.