Is there any plan to release an int4 version? Specifically, I'm interested in the video-understanding part.
Yes! You can check out https://github.com/mobiusml/hqq/blob/master/examples/hf/aria_multimodal.py, an HQQ 4-bit version. This implementation is about 4-6x faster and uses about 3x less VRAM!
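For reference, here is a minimal sketch of loading a model with 4-bit HQQ quantization through transformers' `HqqConfig` (the linked script may use hqq's own API instead; the model id `rhymes-ai/Aria` and the `group_size` below are assumptions, so adjust to your setup):

```python
# Sketch: 4-bit HQQ quantization via transformers' HqqConfig.
# Assumptions: model id "rhymes-ai/Aria", group_size=64.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, HqqConfig

model_id = "rhymes-ai/Aria"  # assumed checkpoint id

# Quantize linear-layer weights to 4 bits on the fly at load time.
quant_config = HqqConfig(nbits=4, group_size=64)

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
    trust_remote_code=True,
)
```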
I was looking into this as I need a local video-vision solution - is it correct to assume the model is 12 shards at ~4 GB per shard?
Also, you mentioned 3x less VRAM - what are the upper limits of video understanding you can achieve on 24 GB or less?
EDIT: I tested locally on a 4090. Single-image inference was fine, but any video OOMed no matter the length or resolution.
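One common mitigation (not confirmed by the maintainers here) is to cap how many frames are sampled from a video before they reach the processor, since each frame adds vision tokens and therefore VRAM. A hedged sketch using OpenCV; the `num_frames` cap is an assumed knob, not part of the model's API:

```python
# Sketch: uniformly sample a bounded number of frames from a video,
# so the vision encoder sees a fixed-size input regardless of video length.
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 8):
    """Return up to `num_frames` RGB frames sampled uniformly across the video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        cap.release()
        return []
    indices = np.linspace(0, total - 1, num=min(num_frames, total), dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            # OpenCV decodes as BGR; most HF processors expect RGB.
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames
```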