-
Notifications
You must be signed in to change notification settings - Fork 0
/
todo.txt
63 lines (53 loc) · 2.48 KB
/
todo.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
# TODO:
- Wake word voice activation
- Drop down for TTS voice choices
- Clean UI (Text at bottom, better icons and stylizing)
- Hide API key✔️:check / Allow user to input their API key
- Seperate functions for Image Chat and just text calls
- chunk audio over 30 seconds for whisper
- Whole chat context for conversation
- UI similar to my vercel app / chatgpt
- a settings widget in bottom corner to change voice / and display settings
- Error Feedback to Users:
- input validation
- preformance optimization
- payments
- authentication
- Streaming TTS and Streaming Text
- Window Selection for live and ability to exit the screen share.
- Audio source detection / choose source
- Rename variables / button and delete unused code.
Ignore this Image
Main task for 11/14:
Get message history working so old messages from api are included in context✔️
Option to clear the message history to start fresh chat✔️
Streaming text completion? ✔️
Main tasks for 11/15:
Easier:
Get the response UI formatted properly, input at bottom like chat apps ✔️X sorted
Challenge:
Get the streaming TTS api working, this will be quite involved as it will require passing chunks✔️
of the audio to the app similar to the text ✔️ still funky
bonus:
get a window detection drop down working
user selects window to stream
ability to stop stream
Main tasks for 11/16:
Get TTS Pause, Skip, Start Stop working✔️
Also prepare self for big react refactor with vercel chat-ai template as inspiration
Main Tasks for 11/17:
Lets get window streaming working✔️
Select window to stream when live:✔️
send window along with prompt to image api✔️
ability to stop the stream and go back to normal prompts✔️
bonus:
settings functions for api key, window settings, and voice actor settings
We have to fix API call logic if no image is included we can't use the vision api
i think its more expensive and their is a small daily limit
Idea:
Set a live window > calls be seconds > and just return an image call on the interval:
basically lets you 'listen' to your frame/window instead of watching it
BUGS: Seems like the Live function fights with the snip tool, might have to handle if snip ignore live or something
TTS only seems to work when I use the record feature look into this
Capture is still working after toggled off
Snip tool breaks the capture function it works in isolation but breaks snip