Hi, I'm running the code on my machine and it works fine. I notice that you are sampling 1 frame per second. Is there no smarter way of sampling frames only when needed, for example on a scene change? I have read the paper and know that such things are done within the model, but is the initial sampling really just 1 frame per second?
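To illustrate the kind of sampling I mean, here is a minimal sketch (not the repo's actual pipeline) that keeps a frame only when it differs enough from the last kept frame, using OpenCV frame differencing. The `diff_threshold` and the downscaled comparison size are arbitrary assumptions for illustration.

```python
import cv2
import numpy as np

def sample_on_scene_change(video_path, diff_threshold=30.0):
    """Return frames whose mean absolute difference from the last kept frame
    exceeds diff_threshold (a simple, hypothetical scene-change criterion)."""
    cap = cv2.VideoCapture(video_path)
    kept, prev_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Compare on a small grayscale copy to keep the check cheap.
        gray = cv2.cvtColor(cv2.resize(frame, (160, 90)), cv2.COLOR_BGR2GRAY)
        if prev_gray is None or np.mean(cv2.absdiff(gray, prev_gray)) > diff_threshold:
            kept.append(frame)   # keep the full-resolution frame
            prev_gray = gray     # compare future frames against this one
    cap.release()
    return kept
```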
Some more questions on how the chat works: in the provided app.py with the browser chat interface, I load one video and ask multiple questions. Each turn takes about the same time for inference. Does that mean the visual features are processed every time and the previous turns' intermediate outputs are not reused? Also, is the text chat history (questions and answers) reused for later queries?
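For reference, this is a minimal sketch of the caching behavior I am asking about, not this repo's actual app.py. The `encode_video` and `generate_answer` callables are hypothetical placeholders for the model's real entry points; the point is only that the heavy video encoding runs once per video and the (question, answer) history is fed back in on later turns.

```python
from typing import Any, Callable, Dict, List, Tuple

def make_chat_session(
    encode_video: Callable[[str], Any],
    generate_answer: Callable[[Any, List[Tuple[str, str]], str], str],
) -> Callable[[str, str], str]:
    """Wrap hypothetical model entry points so each video is encoded once and
    the text chat history is reused as context for later questions."""
    feature_cache: Dict[str, Any] = {}        # video_path -> precomputed visual features
    history: List[Tuple[str, str]] = []       # (question, answer) pairs from earlier turns

    def ask(video_path: str, question: str) -> str:
        if video_path not in feature_cache:
            feature_cache[video_path] = encode_video(video_path)  # heavy step runs only once
        answer = generate_answer(feature_cache[video_path], history, question)
        history.append((question, answer))    # reuse this turn as context next time
        return answer

    return ask
```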