
Int4 inference #45

Open
dmenig opened this issue Oct 30, 2024 · 2 comments

dmenig commented Oct 30, 2024

Is there any plan to release an int4 version? Specifically, I'm interested in the video understanding part.

xffxff (Collaborator) commented Nov 12, 2024

Yes! You can check out https://github.com/mobiusml/hqq/blob/master/examples/hf/aria_multimodal.py, an HQQ 4-bit version. This implementation is about 4-6x faster and uses about 3x less VRAM!
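
For reference, here is a minimal sketch of what a 4-bit HQQ load of Aria might look like through transformers' `HqqConfig` integration. The linked script may set things up differently, and the model id and quantization parameters (`nbits`, `group_size`) below are assumptions, not values taken from it:

```python
# Minimal sketch (not the linked script): loading Aria with 4-bit HQQ
# weights via transformers' HqqConfig. Assumes transformers >= 4.41 and
# `pip install hqq`; model id and group_size are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)  # int4 weights, 64-wide groups

model = AutoModelForCausalLM.from_pretrained(
    "rhymes-ai/Aria",               # assumed public checkpoint id
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,         # Aria ships custom modeling code
    quantization_config=quant_config,
)
processor = AutoProcessor.from_pretrained("rhymes-ai/Aria", trust_remote_code=True)
```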


AIWarper commented Dec 5, 2024

> Yes! You can check out https://github.com/mobiusml/hqq/blob/master/examples/hf/aria_multimodal.py, an HQQ 4-bit version. This implementation is about 4-6x faster and uses about 3x less VRAM!

I was looking into this since I need a local video-vision solution. Is it correct to assume the model is 12 shards at ~4 GB per shard?

Also, you mentioned 3x less VRAM: what are the upper limits of video understanding you can achieve on 24 GB or less?

EDIT: I tested locally with a 4090. Single-image inference was fine, but any video OOMed no matter the length or resolution.
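
If the OOM comes from the number of frames fed to the vision tower, one common mitigation is to cap and uniformly sample frames before calling the processor. A minimal sketch, assuming decord as the video reader; `max_frames=8` is an illustrative value, not an Aria-documented limit:

```python
# Sketch: bound VRAM by uniformly sampling at most `max_frames` frames.
# decord and max_frames=8 are assumptions, not from the Aria docs.
import numpy as np
from decord import VideoReader  # pip install decord

def sample_frames(video_path: str, max_frames: int = 8):
    vr = VideoReader(video_path)
    n = len(vr)
    # Evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, n - 1, num=min(max_frames, n)).astype(int)
    # Returns a list of HWC uint8 numpy frames, ready for the processor.
    return list(vr.get_batch(indices).asnumpy())
```

Lowering `max_frames` (or the input resolution) trades temporal coverage for memory, which is usually the first knob to turn on a 24 GB card.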
