Add support for Florence 2? #815
Comments
Hey! 👋 This is something I'm working on! :)
ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft
Can this be slimmed? 🫣
@inisis That's right! Already slimmed :)
@xenova So is onnxslim ready to be merged? ^-^
@inisis Soon! 🚀 I'm still testing across the set of ~1000 Transformers.js models (link) to find issues like inisis/OnnxSlim#10, and it will be merged into the v3 branch soon!
@xenova btw, once all tests have finished, can onnxslim be merged into optimum? 🚀
@inisis I think that's a great idea! Feel free to open a feature request on that repo and I'll voice my support there 😎
@xenova I believe you are a member of huggingface; can you invite me? 😎
@flatsiedatsie I got it working! :) Available in the dev/v3 branch: #545 (comment)
Ah whoops I've updated that in my local branch but forgot to push. I've pushed and you can try again now. |
Ah cool. I had also just fixed it :-D
Great! 🥳
You might need to use one of their pre-selected prompts: https://huggingface.co/microsoft/Florence-2-base-ft/blob/e7a5acc73559546de6e12ec0319cd7cc1fa2437c/processing_florence2.py#L115-L117
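Since Florence-2 only responds to those pre-selected task prompts, it can help to validate input before sending it to the model. Here is a minimal hypothetical sketch; the authoritative prompt list lives in the `processing_florence2.py` file linked above, and the task tokens below are just a commonly used subset (`isSupportedTask` is an illustrative helper name, not part of any library):

```javascript
// Illustrative subset of Florence-2's pre-selected task tokens.
// See processing_florence2.py (linked above) for the full, authoritative list.
const FLORENCE2_TASKS = new Set([
  '<CAPTION>',
  '<DETAILED_CAPTION>',
  '<MORE_DETAILED_CAPTION>',
  '<OD>',
  '<OCR>',
]);

// Check that a prompt starts with a supported task token before handing it
// to the processor; free-form questions are not among the supported tasks.
function isSupportedTask(prompt) {
  for (const task of FLORENCE2_TASKS) {
    if (prompt.startsWith(task)) return true;
  }
  return false;
}
```

For example, `isSupportedTask('<CAPTION>')` passes, while a free-form question like `'What color is the car?'` does not.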
I've also uploaded the larger (800M) models: https://huggingface.co/onnx-community/Florence-2-large-ft or https://huggingface.co/onnx-community/Florence-2-large, which you can try out. If you do, I recommend selecting different quantizations with something like:

```js
const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
  dtype: {
    embed_tokens: 'fp16',
    vision_encoder: 'fp32',
    encoder_model: 'fp16',
    decoder_model_merged: 'q4',
  },
});
```

(you may need to mix and match these values, selecting from "fp32", "fp16", "q8", "q4")
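The "mix and match" advice above can be automated. Below is a hypothetical sketch (the helper name `loadWithDtypeFallback` and the injected `loadFn` parameter are assumptions, not library API) that tries dtype configurations in order of preference and returns the first that loads; `loadFn` stands in for a `from_pretrained`-style loader that throws when a combination is unsupported:

```javascript
// Try dtype configurations in order of preference; return the first that
// loads successfully, along with the configuration that worked.
// `loadFn` is an injected loader, e.g. a wrapper around
// Florence2ForConditionalGeneration.from_pretrained (hypothetical usage).
async function loadWithDtypeFallback(loadFn, modelId, dtypeConfigs) {
  let lastError;
  for (const dtype of dtypeConfigs) {
    try {
      return { model: await loadFn(modelId, { dtype }), dtype };
    } catch (err) {
      lastError = err; // this combination failed; try the next one
    }
  }
  throw lastError ?? new Error('no dtype configurations provided');
}
```

A natural ordering is from most aggressive quantization ("q4") to most conservative ("fp32"), so the smallest working model wins.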
I'm finding that the larger models are hit or miss.
Does this list of captions mean that the model isn't designed for free-form question asking? It sure seems like it.
I think with the WebGPU support this issue can be closed. Awesome stuff, thank you so much for your amazing work as always. I've implemented the basic CPU version in my project, but am keeping Moondream2 as the default for now since users might otherwise get confused at the response quality when they question the image with their custom prompts. But for mass-describing images I would certainly pick Florence 2 now.
I need to export my own custom Florence-2 model. How can I do it?
Hi @xenova, I'm encountering some difficulty exporting to ONNX. Could you kindly share your method for exporting Florence-2 to ONNX? Thank you.
I'm running into the following error when trying the demo: `worker-Bo2tVEHN.js:5 Uncaught (in promise) Error: no available backend found. ERR: [webgpu] TypeError: e.requestAdapterInfo is not a function`
This PR updates the WebGPU examples and replaces their old package @xenova/transformers with @huggingface/transformers. This fixes some breaking changes in the WebGPU spec, such as GPUAdapter.requestAdapterInfo() no longer being supported in the latest Chrome browser (an issue is reported at huggingface#815 (comment)). The folder names for WebGPU examples are also unified to start with "webgpu".
@JohnRSim There is a breaking change in the WebGPU spec and Chrome implementation: GPUAdapter.requestAdapterInfo() is no longer supported. This was fixed in a recent onnxruntime-web release, so we need to upgrade the example to use the latest @huggingface/transformers release, which includes that fix.
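For context, the spec change replaced the `GPUAdapter.requestAdapterInfo()` method with a `GPUAdapter.info` attribute. A minimal hedged sketch of code that tolerates both API shapes (the helper name `getAdapterInfo` is illustrative, not part of onnxruntime-web or Transformers.js):

```javascript
// Return adapter info as a Promise regardless of which WebGPU API shape
// the browser exposes: the legacy requestAdapterInfo() method, or the
// current GPUAdapter.info attribute. Old onnxruntime-web called the
// legacy method unconditionally, hence the TypeError reported above.
function getAdapterInfo(adapter) {
  if (typeof adapter?.requestAdapterInfo === 'function') {
    return adapter.requestAdapterInfo(); // legacy API: returns a Promise
  }
  return Promise.resolve(adapter?.info); // current API: plain attribute
}
```

In practice, though, upgrading to the latest @huggingface/transformers (which bundles the fixed onnxruntime-web) is the real remedy; a shim like this only matters if you call WebGPU directly.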
Feature request
It describes images.
Test on HF:
https://www.reddit.com/r/LocalLLaMA/comments/1djwf4v/try_microsofts_florence2_yourself/
https://huggingface.co/spaces/SixOpen/Florence-2-large-ft
Benchmarks:
https://www.reddit.com/r/LocalLLaMA/comments/1djhqzz/microsoft_florence2_vision_benchmarks/
Motivation
While there is already support for Moondream 2, this model is an order of magnitude smaller, yet performs similarly.
https://www.reddit.com/r/LocalLLaMA/comments/1diz8en/microsoft_releases_florence2_vision_foundation/
This would greatly speed up image description, making it easier to incorporate images in RAG queries.
Your contribution
I could aid in implementing a demo, and would happily integrate it in my soon-to-be-released project.
Models
https://huggingface.co/microsoft/Florence-2-base/
https://huggingface.co/microsoft/Florence-2-large/
https://huggingface.co/microsoft/Florence-2-base-ft/
https://huggingface.co/microsoft/Florence-2-large-ft/