
Question: Video vs Frames? #24

Open · cbasavaraj opened this issue Nov 26, 2024 · 2 comments

Comments

@cbasavaraj

Hi, I'm running the code on my machine and it works fine. I notice that you are sampling 1 frame per second. Is there no smarter way of sampling frames only when needed, for example at a scene change? I have read the paper and know that such reduction is done within the model, but is the initial sampling really just 1 frame per second? For example, I'd imagine something like this naive frame-difference sampler (just a sketch to illustrate what I mean; the threshold is arbitrary):

```python
import cv2
import numpy as np

def sample_on_scene_change(video_path, diff_threshold=30.0):
    """Keep a frame only when it differs enough from the last kept frame.

    `diff_threshold` (mean absolute grayscale difference) is an
    arbitrary knob for this illustration.
    """
    cap = cv2.VideoCapture(video_path)
    kept, last = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if last is None or np.abs(gray - last).mean() > diff_threshold:
            kept.append(frame)  # treat a large jump as a scene change
            last = gray
    cap.release()
    return kept
```
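Something along those lines would decode every frame but only keep the ones that look new, instead of keeping one per second unconditionally.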

@xiaoqian-shen (Collaborator) commented Nov 27, 2024

Yes, we initially sample the video at 1 fps and then decide which frames to drop based on the extracted feature representations.
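Roughly in this spirit (a simplified sketch of the two stages, not our exact implementation; the cosine-similarity threshold and the greedy left-to-right pass are illustrative choices only):

```python
import cv2
import numpy as np

def sample_1fps(video_path):
    """Stage 1: decode one frame per second of video."""
    cap = cv2.VideoCapture(video_path)
    fps = int(round(cap.get(cv2.CAP_PROP_FPS))) or 30  # fallback if metadata is missing
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % fps == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

def keep_distinct(features, sim_threshold=0.95):
    """Stage 2: greedily drop frames whose feature vector is nearly
    identical (cosine similarity) to the previously kept frame.

    `features` is an (N, D) array of per-frame embeddings; returns the
    indices of the frames to keep.
    """
    keep = [0]
    for i in range(1, len(features)):
        a, b = features[keep[-1]], features[i]
        cos = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if cos < sim_threshold:
            keep.append(i)
    return keep
```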

@cbasavaraj (Author)

Thank you!

Some more questions on how the chat works: in the provided app.py with the chat interface in the browser, I load one video and ask multiple questions. Each turn takes about the same time for inference. Does that mean the visual features are recomputed every turn, and the intermediate outputs of previous turns are not reused? Also, is the text chat history (questions and answers) reused for later queries?
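In other words, I would have expected something like this caching pattern (purely hypothetical; `encode_video` and `generate` are stand-ins, not this repo's actual API):

```python
# Hypothetical sketch: encode the video once, then reuse the cached
# features and the text history on later turns.

def encode_video(video_path: str) -> list:
    """Placeholder for the expensive visual encoding pass."""
    return [f"feat({video_path})"]

def generate(visual_feats: list, prompt: str) -> str:
    """Placeholder for the language-model decoding step."""
    return f"answer to: {prompt.splitlines()[-1]}"

_video_cache: dict[str, list] = {}

def chat_turn(video_path: str, question: str, history: list[str]) -> str:
    # Only the first turn pays for visual encoding; later turns hit the cache.
    if video_path not in _video_cache:
        _video_cache[video_path] = encode_video(video_path)
    feats = _video_cache[video_path]
    # Fold prior questions/answers into the prompt so text history is reused.
    prompt = "\n".join(history + [question])
    reply = generate(feats, prompt)
    history += [question, reply]
    return reply
```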
