
Add support for Florence 2? #815

Closed
flatsiedatsie opened this issue Jun 20, 2024 · 22 comments
Labels
enhancement New feature or request

Comments

@flatsiedatsie
Contributor

flatsiedatsie commented Jun 20, 2024

Feature request

Florence-2 is a new vision foundation model from Microsoft that describes images.

Test on HF:
https://www.reddit.com/r/LocalLLaMA/comments/1djwf4v/try_microsofts_florence2_yourself/
https://huggingface.co/spaces/SixOpen/Florence-2-large-ft

Benchmarks:
https://www.reddit.com/r/LocalLLaMA/comments/1djhqzz/microsoft_florence2_vision_benchmarks/

Motivation

While there is already support for Moondream 2, this model is an order of magnitude smaller, yet performs similarly.

Wait, is a 200M model beating an 80B?

Looks like it.

https://www.reddit.com/r/LocalLLaMA/comments/1diz8en/microsoft_releases_florence2_vision_foundation/

This would greatly speed-up image description, making it easier to incorporate images in RAG queries.

Your contribution

I could aid in implementing a demo, and would happily integrate it into my soon-to-be-released project.


Models

https://huggingface.co/microsoft/Florence-2-base/
https://huggingface.co/microsoft/Florence-2-large/
https://huggingface.co/microsoft/Florence-2-base-ft/
https://huggingface.co/microsoft/Florence-2-large-ft/

@flatsiedatsie flatsiedatsie added the enhancement New feature or request label Jun 20, 2024
@xenova
Collaborator

xenova commented Jun 20, 2024

Hey! 👋 This is something I'm working on! :)

@xenova
Collaborator

xenova commented Jun 20, 2024

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft
Integrating into transformers.js now

@inisis
Contributor

inisis commented Jun 21, 2024

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft

Integrating into transformers.js now

Can this be slimmed? 🫣

I think it's already a slimmed one.

@xenova
Collaborator

xenova commented Jun 21, 2024

@inisis that's right! Already slimmed :)

@inisis
Contributor

inisis commented Jun 21, 2024

@xenova so is onnxslim ready to be merged? ^-^

@xenova
Collaborator

xenova commented Jun 21, 2024

@inisis Soon! 🚀 I'm still testing across the set of ~1000 Transformers.js models (link) to find issues like inisis/OnnxSlim#10, and it will be merged into the v3 branch soon!

@inisis
Contributor

inisis commented Jun 21, 2024

@xenova btw, once all tests are finished, can onnxslim be merged into optimum? 🚀

@xenova
Collaborator

xenova commented Jun 21, 2024

@inisis I think that's a great idea! Feel free to open a feature request on that repo and I'll voice my support there 😎

@inisis
Contributor

inisis commented Jun 21, 2024

@xenova I believe that you are a member of Hugging Face; could you add me? 😎

@xenova
Collaborator

xenova commented Jun 22, 2024

@flatsiedatsie I got it working! :) Available in dev/v3 branch: #545 (comment)

@flatsiedatsie
Contributor Author

I've just tried implementing it.

I'm seeing an error, but will keep trying.

image_to_text_worker.js:715 IMAGE TO TEXT WORKER: caught error calling model.generate:  TypeError: Do not know how to serialize a BigInt
    at JSON.stringify (<anonymous>)
    at Function.getGeneratedNgrams (logits_process.js:370:1)
    at Function.calcBannedNgramTokens (logits_process.js:387:1)
    at Function._call (logits_process.js:401:1)
    at closure (generic.js:20:1)
    at Function._call (logits_process.js:89:1)
    at closure (generic.js:20:1)
    at Function.generate (models.js:1466:1)
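For context, the failure is reproducible without any model code: JSON.stringify has no BigInt support, and the n-gram keys here are built from BigInt token ids. A minimal reproduction, plus one possible workaround using a replacer (a sketch for illustration, not necessarily the fix that landed in the library):

```javascript
// JSON.stringify throws a TypeError on BigInt values, which is what the
// n-gram keys contain when token ids are BigInts.
let threw = false;
try {
  JSON.stringify([1n, 2n]);
} catch (e) {
  threw = e instanceof TypeError; // "Do not know how to serialize a BigInt"
}

// One workaround: pass a replacer that converts BigInts to strings first.
const key = JSON.stringify([1n, 2n], (_, v) =>
  typeof v === 'bigint' ? v.toString() : v,
);
// key is '["1","2"]'
```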

*some time later*

I tried running your code example in a clean, simple test case to rule out issues with my integration, but unfortunately the same error was raised:

Screenshot 2024-06-22 at 17 09 31

@xenova
Collaborator

xenova commented Jun 22, 2024

Ah whoops, I had updated that in my local branch but forgot to push. I've pushed now, so you can try again.

@flatsiedatsie
Contributor Author

Ah cool. I had also just fixed it :-D

const generatedNgram = new Map();
for (const ngram of ngrams) {
    const prevNgram = ngram.slice(0, ngram.length - 1);
    // JSON.stringify throws on BigInt token ids ("Do not know how to
    // serialize a BigInt"), so build the key from a plain string join,
    // which still groups n-grams that share a prefix.
    const prevNgramKey = prevNgram.join(',');
    const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
    prevNgramValue.push(ngram[ngram.length - 1]);
    generatedNgram.set(prevNgramKey, prevNgramValue);
}
return generatedNgram;
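For what it's worth, the key needs to keep the grouping semantics of the original JSON.stringify key: a plain counter key would give every n-gram its own bucket and effectively disable the no-repeat-ngram check. A quick sanity check of a join()-based variant (a standalone sketch, not the library code itself):

```javascript
// Group n-grams by their prefix using a join()-based string key,
// avoiding JSON.stringify's BigInt limitation.
function groupNgrams(ngrams) {
  const generatedNgram = new Map();
  for (const ngram of ngrams) {
    const prevNgram = ngram.slice(0, ngram.length - 1);
    const prevNgramKey = prevNgram.join(','); // e.g. "1,2" for [1n, 2n]
    const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
    prevNgramValue.push(ngram[ngram.length - 1]);
    generatedNgram.set(prevNgramKey, prevNgramValue);
  }
  return generatedNgram;
}

// Two n-grams sharing the prefix [1n, 2n] end up in the same bucket.
const grouped = groupNgrams([[1n, 2n, 3n], [1n, 2n, 4n]]);
// grouped.get('1,2') → [3n, 4n]
```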

@flatsiedatsie
Contributor Author

flatsiedatsie commented Jun 22, 2024

Wow, it's definitely much faster. Very nice!

The descriptions aren't as useful, though. I'm going to keep playing around with that.

Screenshot 2024-06-22 at 18 20 37


Odd that this prompt results in less detail :-D
Screenshot 2024-06-22 at 18 27 27


Moondream for comparison:
Screenshot 2024-06-22 at 18 51 49

@xenova
Collaborator

xenova commented Jun 22, 2024

Wow, it's definitely much faster. Very nice!

Great! 🥳

The descriptions aren't as useful though? But I'm going to keep playing around with that.

You might need to use one of their pre-selected prompts: https://huggingface.co/microsoft/Florence-2-base-ft/blob/e7a5acc73559546de6e12ec0319cd7cc1fa2437c/processing_florence2.py#L115-L117

  • caption: 'What does the image describe?'
  • detailed: 'Describe in detail what is shown in the image.'
  • more detailed: 'Describe with a paragraph what is shown in the image.'
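The fixed prompt set above can be mirrored in a small lookup, which also makes the model's limitation concrete: anything outside these tasks isn't a prompt the fine-tuned model was trained on. A minimal sketch (the task tags follow the upstream processing_florence2.py; `constructPrompt` is a hypothetical helper for illustration, not a Transformers.js API):

```javascript
// Florence-2 task tags mapped to their fixed prompts, per Microsoft's
// processing_florence2.py. constructPrompt is illustrative only.
const TASK_PROMPTS = new Map([
  ['<CAPTION>', 'What does the image describe?'],
  ['<DETAILED_CAPTION>', 'Describe in detail what is shown in the image.'],
  ['<MORE_DETAILED_CAPTION>', 'Describe with a paragraph what is shown in the image.'],
]);

function constructPrompt(task) {
  const prompt = TASK_PROMPTS.get(task);
  if (prompt === undefined) {
    throw new Error(`Unknown Florence-2 task tag: ${task}`);
  }
  return prompt;
}
```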

I've also uploaded the larger (800M) models: https://huggingface.co/onnx-community/Florence-2-large-ft or https://huggingface.co/onnx-community/Florence-2-large, which you can try out. If you do, I recommend selecting different quantizations with something like:

const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
    dtype: {
        embed_tokens: 'fp16',
        vision_encoder: 'fp32',
        encoder_model: 'fp16',
        decoder_model_merged: 'q4',
    },
});

(you may need to mix and match these values; selecting from "fp32", "fp16", "q8", "q4")
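As a guard against typos while mixing and matching, the allowed values can be checked up front. `validateDtypeConfig` below is a hypothetical helper for illustration, not part of Transformers.js:

```javascript
// Valid per-module quantization levels, per the comment above.
const VALID_DTYPES = new Set(['fp32', 'fp16', 'q8', 'q4']);

// Hypothetical helper: throw early on an unsupported dtype string
// instead of failing later inside from_pretrained.
function validateDtypeConfig(dtypeConfig) {
  for (const [module, dtype] of Object.entries(dtypeConfig)) {
    if (!VALID_DTYPES.has(dtype)) {
      throw new Error(`Unsupported dtype "${dtype}" for "${module}"`);
    }
  }
  return dtypeConfig;
}
```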

@flatsiedatsie
Contributor Author

I'll try that, thank you!

Could it be that with the new v3 the MusicGen streamer progress callback no longer works properly? I haven't tested it separately from my code, though, so it could just be an issue on my end.

Screenshot 2024-06-22 at 21 00 40

I'm also seeing an error with nanoLlava. It's just a number:
Screenshot 2024-06-22 at 21 05 03

@flatsiedatsie
Contributor Author

I'm finding that the larger models are hit or miss.

good:
Screenshot 2024-06-26 at 19 22 30

bad:
Screenshot 2024-06-26 at 19 45 10

- caption: 'What does the image describe?'
- detailed: 'Describe in detail what is shown in the image.'
- more detailed: 'Describe with a paragraph what is shown in the image.'

Does this list of prompts mean that the model isn't designed for free-form question answering?

It sure seems like it:

good:
Screenshot 2024-06-26 at 19 55 29

bad:
Screenshot 2024-06-26 at 19 57 16

@flatsiedatsie
Contributor Author

flatsiedatsie commented Jun 28, 2024

I think with the WebGPU support this issue can be closed. Awesome stuff, thank you so much for your amazing work, as always. I've implemented the basic CPU version in my project, but am keeping Moondream 2 as the default for now, since users might otherwise be confused by the response quality when they question the image with their own custom prompts.

But for mass-describing images I would certainly pick Florence 2 now.

@Vasanthengineer4949

I need to export my own custom Florence-2 model. How can I do it?

@dragen1860

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft Integrating into transformers.js now

Hi @xenova, I'm encountering some difficulty exporting to ONNX. Could you kindly share your method for exporting Florence-2 to ONNX? Thank you.

@JohnRSim

JohnRSim commented Nov 2, 2024

I'm running into the following error when trying the demo:

worker-Bo2tVEHN.js:5 Uncaught (in promise) Error: no available backend found. ERR: [webgpu] TypeError: e.requestAdapterInfo is not a function
at gs (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:5:1599)
at async H0.create (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:5:17973)
at async rm (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:2353:12683)
at async Zr (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:2373:27270)
at async Promise.all (index 0)
at async Zm.from_pretrained (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:2373:34092)
at async Promise.all (index 0)
at async N3 (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:2373:114960)

gyagp pushed a commit to gyagp/transformers.js that referenced this issue Nov 3, 2024
This PR updates WebGPU examples and replaces their old package
@xenova/transformers with @huggingface/transformers. This fixes
some breaking changes in the WebGPU spec, such as
GPUAdapter.requestAdapterInfo() no longer being supported in the latest
Chrome browser (an issue was reported at huggingface#815 (comment)).
The folder names for WebGPU examples are also unified to start with
"webgpu".
@gyagp

gyagp commented Nov 3, 2024

@JohnRSim There is a breaking change in the WebGPU spec and Chrome's implementation: GPUAdapter.requestAdapterInfo() is no longer supported. This was fixed in a recent onnxruntime-web release, so the example needs to be upgraded to the latest @huggingface/transformers release, which includes that fix.
I just uploaded a PR to fix this, and the related HF space needs to be updated accordingly.
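Until the upgraded packages are deployed everywhere, code that needs adapter info can feature-detect the removed method and fall back to the `GPUAdapter.info` attribute that replaced it in the spec. A sketch (the adapter objects in the test are mocks for illustration):

```javascript
// Prefer requestAdapterInfo() when it still exists (older WebGPU
// implementations), otherwise use the GPUAdapter.info attribute that
// replaced it in the current spec.
async function getAdapterInfo(adapter) {
  if (typeof adapter.requestAdapterInfo === 'function') {
    return await adapter.requestAdapterInfo(); // legacy path
  }
  return adapter.info; // current spec
}
```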
