Added speech to text capability #275

Merged: 3 commits, Dec 4, 2024


navyseal4000

Verify that your system default microphone is the one you're testing with. Relying on the default device is the primary limitation of this initial speech-to-text implementation for voice prompting.

@chrismahoney added the "enhancement" (New feature or request) label on Nov 14, 2024
@chrismahoney
Collaborator

Awesome! Once we settle in on some provider work this is on my radar for when we've got room for feature adds. 👍

@wonderwhy-er
Collaborator

I actually pulled it and merged it with the provider work.
There are conflicts, but minimal ones, so the two can go in parallel.

Tested here and it seems to work.
There is one thing I would fix before merging:
https://www.youtube.com/watch?v=3Gc0yOgx-EQ

When the user submits, we should clear the ongoing text so that the next time they speak, it starts as new text.
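
A minimal sketch of the "clear on submit" behavior described above. The `TranscriptBuffer` class and its method names are hypothetical and not taken from this PR; the sketch only assumes that the recognizer delivers interim and final results as plain strings.

```typescript
// Hypothetical helper (not from the PR): accumulates recognized speech
// and resets itself when the prompt is submitted.
class TranscriptBuffer {
  private finalText = "";
  private interimText = "";

  // Interim results replace each other; final results are appended
  // to the committed text.
  addResult(text: string, isFinal: boolean): void {
    const cleaned = text.trim();
    if (isFinal) {
      this.finalText = [this.finalText, cleaned].filter(Boolean).join(" ");
      this.interimText = "";
    } else {
      this.interimText = cleaned;
    }
  }

  // Text that would currently be shown in the prompt box.
  currentText(): string {
    return [this.finalText, this.interimText].filter(Boolean).join(" ");
  }

  // Returns the text to send and clears all state, so the next
  // utterance starts from an empty prompt.
  submit(): string {
    const toSend = this.currentText();
    this.finalText = "";
    this.interimText = "";
    return toSend;
  }
}
```

Keeping the reset inside `submit()` means the caller cannot forget to clear stale text after sending.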

There are some other potential UX changes.
I would love to experiment with voice commands,
e.g. "stop/start/submit", so I can control it with my voice only.

Maybe use "Hello Google/Alexa" style wake-up and go-to-sleep commands?
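
One possible shape for the command idea, as a rough sketch only. `parseVoiceCommand` and the `VoiceCommand` type are hypothetical names. The sketch assumes an utterance counts as a command only when it consists of exactly one command word, so ordinary dictation containing "stop" is left alone. Wake words are not covered.

```typescript
// Hypothetical command set taken from the "stop/start/submit" suggestion.
type VoiceCommand = "start" | "stop" | "submit";

// Returns the command if the whole utterance is a single command word
// (ignoring case, surrounding whitespace and trailing punctuation),
// otherwise null so the caller can treat it as dictated text.
function parseVoiceCommand(utterance: string): VoiceCommand | null {
  const normalized = utterance
    .trim()
    .toLowerCase()
    .replace(/[.,!?]+$/, "");

  switch (normalized) {
    case "start":
      return "start";
    case "stop":
      return "stop";
    case "submit":
      return "submit";
    default:
      return null;
  }
}
```

Matching the whole utterance is a deliberately conservative choice; a wake-word prefix would be another way to separate commands from dictation.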

Ugh so exciting, thanks for your great work @navyseal4000

@milutinke

milutinke commented Nov 14, 2024

Oof, I started working on #281 before this was created.

@milutinke

milutinke commented Nov 14, 2024

@wonderwhy-er Can we combine the two so the time isn't wasted?
Maybe use this one when available and mine when the browser doesn't support it.
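
The "use this when available, otherwise fall back" idea could look roughly like the sketch below. It assumes this PR relies on the browser's Web Speech API, which the thread does not state explicitly. `pickSpeechBackend`, `SpeechGlobals` and the backend labels are hypothetical names.

```typescript
// Hypothetical labels for the two approaches discussed in this thread.
type SpeechBackend = "web-speech-api" | "fallback";

// Minimal shape of the global object we inspect. Some browsers expose
// the constructor only under the `webkit` prefix.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

// Feature-detects native speech recognition and picks a backend.
function pickSpeechBackend(globals: SpeechGlobals): SpeechBackend {
  const hasNative =
    typeof globals.SpeechRecognition === "function" ||
    typeof globals.webkitSpeechRecognition === "function";
  return hasNative ? "web-speech-api" : "fallback";
}
```

In a browser, the argument would be the `window` object; passing it in explicitly keeps the check easy to test outside a browser.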

@wonderwhy-er
Collaborator

> Oof, I started working on #281 before this was created.

That is why I am in favor of posting PRs early in draft mode for early feedback.
I shared in the group that my policy is to review the PRs that were opened earlier first.

@navyseal4000
Author

I won't have time this morning to get the enhancements done; I'll try to get to them later tonight.

@navyseal4000
Author

In fact if you'd like, feel free to finish this @milutinke and if you can, feel free to take the author role. Just lmk if you do so we don't both work on it tonight

@milutinke

milutinke commented Nov 14, 2024

> In fact if you'd like, feel free to finish this @milutinke and if you can, feel free to take the author role. Just lmk if you do so we don't both work on it tonight

I had quite a busy day, sorry for the late reply. I won't have time to do anything until at least November 22.
I wrote a proposal to Eduard in my PR and am waiting for his reply, but I'd say you can finish this. If he agrees, I would pull your changes and then add my option as the fallback. Great job btw.

Edit:
PS: Maybe add a better indicator for when the user is recording, just to make it clear to them. You could even grab my code and adapt it to use that animation while speaking.

@chrismahoney
Collaborator

chrismahoney commented Nov 15, 2024

Voice support is an awesome feature to have. I'm really biased toward it from a human-computer interface perspective, so just full disclosure.

If we can all work together towards integration of this and #281 I am more than happy to help out where I can. Cheers!

Quick Edit: I’ll qualify this by saying this feature may end up on the roadmap at a certain priority, so please don’t see that as a disappointment; just making sure all the wheels roll in the same direction. 🤓

@wonderwhy-er
Collaborator

wonderwhy-er commented Nov 15, 2024

I am pretty excited about this feature and have already tested @milutinke's solution. It works pretty well, with one exception.

I am honestly willing to merge this and then add those improvements as separate PRs so that the ball gets rolling.

Maybe I will get to this in the evening.

And we can add other things in separate PRs

@navyseal4000
Author

@wonderwhy-er Ready for retesting. I got the fix working, but not the additional features with voice mode yet. Adding "enhance voice mode" as a goal in the readme might be a good idea, but I'll leave that determination up to you.

This reverts the constant change for the default model in commit a896f3f.
@milutinke

milutinke commented Dec 1, 2024

@wonderwhy-er Regarding the discussion in #281, we can get this merged first, since @navyseal4000 continued working on it.
Then I would add https://huggingface.co/spaces/Xenova/whisper-web as the fallback, in a separate PR.

@wonderwhy-er
Collaborator

Looking at it :)

@wonderwhy-er
Collaborator

wonderwhy-er commented Dec 2, 2024

Just took a look. I see you are stopping the microphone on submit now.
I guess that is fine too. I do want to add a "non-stop audio mode" later :)
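
For illustration, the difference between stopping the microphone on submit and a later "non-stop audio mode" could be reduced to one flag. This is only a sketch: `VoiceSession`, `SubmitOptions` and `afterSubmit` are hypothetical names, not code from this PR.

```typescript
// Hypothetical session state: whether the mic is live and what has been
// transcribed so far.
interface VoiceSession {
  listening: boolean;
  transcript: string;
}

// Hypothetical option for the "non-stop audio mode" mentioned above.
interface SubmitOptions {
  continuous: boolean;
}

// Computes the session state after a submit. The transcript is always
// cleared; the microphone stays on only in continuous mode.
function afterSubmit(session: VoiceSession, options: SubmitOptions): VoiceSession {
  return {
    listening: options.continuous && session.listening,
    transcript: "",
  };
}
```

With `continuous: false` this matches the behavior described in the comment above, where submitting stops the microphone.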

I was away from my computer this weekend,
so I could not test.

I am ready to merge, but we need to resolve the conflict first.

I did it locally but can't push into your PR.

@wonderwhy-er merged commit f6a9861 into stackblitz-labs:main on Dec 4, 2024
1 check passed