Live streaming support (or, at least, something like it) #179

Open
lachjames opened this issue Feb 28, 2023 · 2 comments
Comments

@lachjames

lachjames commented Feb 28, 2023

Hi :) I'm working on some integrations with Discord for a D&D 5e game I'm running, and I think Craig is really great for what I need (streaming audio from each user separately). I really need the information in real-time, which Craig sort of supports, but the issue is more on the server side than client side due to the implementation I've come up with:

  1. Poll at intervals (e.g. every 30 seconds), pulling the latest audio from Craig
  2. Transcribe the audio using OpenAI Whisper, for each speaker
  3. Collate the audio files (with timestamps) into a transcript, and push it to a Discord webhook
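As a rough illustration of step 3, here is a minimal sketch of collating per-speaker transcription results into one chronological transcript. The `Segment` shape and the function names are assumptions made up for this example; they are not Whisper's or Craig's actual output format.

```typescript
// One transcribed chunk of speech. Field names are illustrative assumptions.
interface Segment {
  speaker: string;
  startSec: number; // offset from the start of the recording, in seconds
  text: string;
}

// Format an offset in seconds as HH:MM:SS.
function formatTimestamp(totalSec: number): string {
  const s = Math.max(0, Math.floor(totalSec));
  const pad = (n: number): string => (n < 10 ? '0' + n : String(n));
  return `${pad(Math.floor(s / 3600))}:${pad(Math.floor((s % 3600) / 60))}:${pad(s % 60)}`;
}

// Merge the per-speaker segment lists and order them by start time,
// producing one transcript line per segment.
function collate(perSpeaker: Segment[][]): string[] {
  const all: Segment[] = [];
  for (const segments of perSpeaker) {
    for (const seg of segments) all.push(seg);
  }
  all.sort((a, b) => a.startSec - b.startSec);
  return all.map((seg) => `[${formatTimestamp(seg.startSec)}] ${seg.speaker}: ${seg.text}`);
}
```

The resulting lines could then be joined and posted to the webhook; that part is left out because it depends on the webhook setup.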

The issue, of course, is that I currently have no choice but to pull all the audio at every iteration, including audio I've already processed. I can alleviate this somewhat by making a simple Discord bot that disconnects and reconnects Craig every x minutes (e.g. every 30 minutes), but it's still an unnecessary drain.

I've not looked too deeply into the code, as I figured (correctly, it turns out) it would be simpler to just copy the API calls from the website, so that's my perspective here. With that said, I think a reasonable solution would be: when making the POST request to https://craig.horse/api/recording/{id}/cook?key={key}, support an additional argument: startTime (and endTime, why not?). Then just pull audio which was created within that time period (or before/after that time, if only one of the fields is given). I think this would be a relatively straightforward solution to implement on the server-side, since it's just a case of pruning the audio files at some point before sending them (ideally as early as possible, to avoid
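To make the proposal concrete, here is a minimal client-side sketch of how the request URL might look. Note that `startTime` and `endTime` are the parameters proposed in this issue and do not exist in Craig's API today; `buildCookUrl` and `CookRange` are hypothetical names used only for illustration.

```typescript
// Hypothetical time range for the proposed feature. Neither field is
// supported by Craig's API at the moment.
interface CookRange {
  startTime?: string; // e.g. "00:30:00"
  endTime?: string;
}

// Build the cook URL quoted above, appending the proposed parameters only
// when they are given, so a request without them looks the same as today.
function buildCookUrl(id: string, key: string, range: CookRange = {}): string {
  const params: string[] = [`key=${encodeURIComponent(key)}`];
  if (range.startTime !== undefined) {
    params.push(`startTime=${encodeURIComponent(range.startTime)}`);
  }
  if (range.endTime !== undefined) {
    params.push(`endTime=${encodeURIComponent(range.endTime)}`);
  }
  return `https://craig.horse/api/recording/${encodeURIComponent(id)}/cook?${params.join('&')}`;
}
```

A polling client would remember the end of the last window it processed and pass that as `startTime` on the next POST, so each iteration only fetches new audio.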

A more rigorous (but complex) option would be to support streaming more directly - that'd be really cool, but (I imagine) a lot more work too.

One other option I'm looking into is creating a simple version of Craig which only captures the audio, which I can then run on my own computer. This would bypass all the issues with transfer and compute, but if this is something other people might want to be able to do with Craig, maybe it's worth building it as a feature?

@lachjames
Author

I looked a bit further into this and I think it's a relatively straightforward change to implement, with a couple of edits:

cook.sh [Line 43]

CONTAINER=zip
[ "$1" ] && CONTAINER="$1"
shift

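# Optional start offset (HH:MM:SS); defaults to the beginning of the recording
STARTTIME="00:00:00"
# Note: the spaces inside [ ] are required, otherwise the test is not parsed
[ "$1" ] && STARTTIME="$1"
shift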

cook.sh [Line 66-ish]

case "$FORMAT" in
    stream)
        ext=wav
        ENCODE="ffmpeg -f wav -i - -c:a adpcm_ms -f wav -ss $STARTTIME -"
        CONTAINER=zip
        ZIPFLAGS=-9
        ;;

craig/apps/download/api/src/util/cook.ts [Line 134]

export async function cook(id: string, format = 'flac', container = 'zip', dynaudnorm = false, startTime = "00:00:00") {
  const [state, writeState, deleteState] = stateManager(id);

  try {
    await writeState({ message: 'Starting...' });
    const cookingPath = path.join(cookPath, '..', 'cook.sh');
    // startTime goes right after container, since cook.sh reads it as the argument following CONTAINER
    const args = [id, format, container, startTime, ...(dynaudnorm ? ['dynaudnorm'] : [])];
    const child = spawn(cookingPath, args, { detached: true });
    console.log(`Cooking ${id} (${format}.${container}${dynaudnorm ? ' dynaudnorm' : ''}, from ${startTime}) with process ${child.pid}`);
    registerProcess(child, deleteState);

    // Prevent the stream from ending prematurely (for some reason)
    child.stderr.on('data', getStderrReader(state, writeState));

    return child.stdout;
  } catch (e) {
    deleteState();
    throw e;
  }
}
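One thing worth guarding against: the start time would come from the request and ends up inside a command string in cook.sh, so it probably should be validated before being passed to `spawn`. Below is a minimal sketch of such a check, assuming the HH:MM:SS format used above; `normalizeStartTime` and `START_TIME_PATTERN` are hypothetical names and not part of Craig's code.

```typescript
// Accept only HH:MM:SS, optionally with fractional seconds (e.g. "00:30:00.5").
// Anything else is rejected so that arbitrary text never reaches the shell script.
const START_TIME_PATTERN = /^\d{2,}:[0-5]\d:[0-5]\d(\.\d+)?$/;

// Return the input if it is a well-formed timestamp, otherwise fall back to
// the start of the recording (the same default cook() uses above).
function normalizeStartTime(input: unknown): string {
  if (typeof input !== 'string' || !START_TIME_PATTERN.test(input)) {
    return '00:00:00';
  }
  return input;
}
```

Falling back to the default keeps existing requests working unchanged; rejecting the request with an error would be an equally reasonable choice if silently returning the whole recording is undesirable.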

I'm not on my dev machine today so I don't have Docker at hand, otherwise I'd give it a test run now - I'll give it a shot later and see if it breaks (or, more likely, what breaks...)

@Snazzah
Member

Snazzah commented Jun 8, 2023

Don't know about altering how cooking works to do live transcription, but if there's a good way to do that within the bot with whisper that may be a good feature to have
