
Add support for Florence 2? #815

Closed
flatsiedatsie opened this issue Jun 20, 2024 · 22 comments
Labels
enhancement New feature or request

Comments

@flatsiedatsie
Contributor

flatsiedatsie commented Jun 20, 2024

Feature request

Florence-2 is a new vision foundation model from Microsoft that describes images.

Test on HF:
https://www.reddit.com/r/LocalLLaMA/comments/1djwf4v/try_microsofts_florence2_yourself/
https://huggingface.co/spaces/SixOpen/Florence-2-large-ft

Benchmarks:
https://www.reddit.com/r/LocalLLaMA/comments/1djhqzz/microsoft_florence2_vision_benchmarks/

Motivation

While there is already support for Moondream 2, this model is an order of magnitude smaller, yet performs similarly.

Wait, is a 200M model beating an 80B?

Looks like it.

https://www.reddit.com/r/LocalLLaMA/comments/1diz8en/microsoft_releases_florence2_vision_foundation/

This would greatly speed-up image description, making it easier to incorporate images in RAG queries.

Your contribution

I could aid in implementing a demo, and would happily integrate it into my soon-to-be-released project.


Models

https://huggingface.co/microsoft/Florence-2-base/
https://huggingface.co/microsoft/Florence-2-large/
https://huggingface.co/microsoft/Florence-2-base-ft/
https://huggingface.co/microsoft/Florence-2-large-ft/

@flatsiedatsie flatsiedatsie added the enhancement New feature or request label Jun 20, 2024
@xenova
Collaborator

xenova commented Jun 20, 2024

Hey! 👋 This is something I'm working on! :)

@xenova
Collaborator

xenova commented Jun 20, 2024

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft
Integrating into transformers.js now

@inisis
Contributor

inisis commented Jun 21, 2024

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft

Integrating into transformers.js now

Can this be slimmed? 🫣

I think it's already a slimmed one.

@xenova
Collaborator

xenova commented Jun 21, 2024

@inisis that's right! Already slimmed :)

@inisis
Contributor

inisis commented Jun 21, 2024

@xenova so is onnxslim ready to be merged? ^-^

@xenova
Collaborator

xenova commented Jun 21, 2024

@inisis Soon! 🚀 I'm still testing across the set of ~1000 Transformers.js models (link) to find issues like inisis/OnnxSlim#10, and it will be merged into the v3 branch soon!

@inisis
Contributor

inisis commented Jun 21, 2024

@xenova btw, once all tests are finished, can onnxslim be merged into optimum? 🚀

@xenova
Collaborator

xenova commented Jun 21, 2024

@inisis I think that's a great idea! Feel free to open a feature request on that repo and I'll voice my support there 😎

@inisis
Contributor

inisis commented Jun 21, 2024

@xenova I believe that you are a member of Hugging Face; could you add me? 😎

@xenova
Collaborator

xenova commented Jun 22, 2024

@flatsiedatsie I got it working! :) Available in dev/v3 branch: #545 (comment)

@flatsiedatsie
Contributor Author

I've just tried implementing it.

I'm seeing an error, but will keep trying.

image_to_text_worker.js:715 IMAGE TO TEXT WORKER: caught error calling model.generate:  TypeError: Do not know how to serialize a BigInt
    at JSON.stringify (<anonymous>)
    at Function.getGeneratedNgrams (logits_process.js:370:1)
    at Function.calcBannedNgramTokens (logits_process.js:387:1)
    at Function._call (logits_process.js:401:1)
    at closure (generic.js:20:1)
    at Function._call (logits_process.js:89:1)
    at closure (generic.js:20:1)
    at Function.generate (models.js:1466:1)
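For context, the failure is reproducible without any model code: JSON.stringify has no BigInt support, and the n-gram keys here are built from BigInt token ids. A minimal reproduction, plus one possible workaround using a replacer (a sketch for illustration, not necessarily the fix that landed in the library):

```javascript
// JSON.stringify throws a TypeError on BigInt values, which is what the
// n-gram keys contain when token ids are BigInts.
let threw = false;
try {
  JSON.stringify([1n, 2n]);
} catch (e) {
  threw = e instanceof TypeError; // "Do not know how to serialize a BigInt"
}

// One workaround: pass a replacer that converts BigInts to strings first.
const key = JSON.stringify([1n, 2n], (_, v) =>
  typeof v === 'bigint' ? v.toString() : v,
);
// key is '["1","2"]'
```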

*some time later*

I tried running your code example in a clean, simple test case to rule out issues with my integration, but unfortunately the same error was raised:

Screenshot 2024-06-22 at 17 09 31

@xenova
Collaborator

xenova commented Jun 22, 2024

Ah whoops, I had updated that in my local branch but forgot to push. I've pushed now, so you can try again.

@flatsiedatsie
Contributor Author

Ah cool. I had also just fixed it :-D

const generatedNgram = new Map();
for (const ngram of ngrams) {
    const prevNgram = ngram.slice(0, ngram.length - 1);
    // JSON.stringify throws on BigInt token ids ("Do not know how to
    // serialize a BigInt"), so build the key from a plain string join,
    // which still groups n-grams that share a prefix.
    const prevNgramKey = prevNgram.join(',');
    const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
    prevNgramValue.push(ngram[ngram.length - 1]);
    generatedNgram.set(prevNgramKey, prevNgramValue);
}
return generatedNgram;
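For what it's worth, the key needs to keep the grouping semantics of the original JSON.stringify key: a plain counter key would give every n-gram its own bucket and effectively disable the no-repeat-ngram check. A quick sanity check of a join()-based variant (a standalone sketch, not the library code itself):

```javascript
// Group n-grams by their prefix using a join()-based string key,
// avoiding JSON.stringify's BigInt limitation.
function groupNgrams(ngrams) {
  const generatedNgram = new Map();
  for (const ngram of ngrams) {
    const prevNgram = ngram.slice(0, ngram.length - 1);
    const prevNgramKey = prevNgram.join(','); // e.g. "1,2" for [1n, 2n]
    const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
    prevNgramValue.push(ngram[ngram.length - 1]);
    generatedNgram.set(prevNgramKey, prevNgramValue);
  }
  return generatedNgram;
}

// Two n-grams sharing the prefix [1n, 2n] end up in the same bucket.
const grouped = groupNgrams([[1n, 2n, 3n], [1n, 2n, 4n]]);
// grouped.get('1,2') → [3n, 4n]
```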

@flatsiedatsie
Contributor Author

flatsiedatsie commented Jun 22, 2024

Wow, it's definitely much faster. Very nice!

The descriptions aren't as useful, though. I'm going to keep playing around with that.

Screenshot 2024-06-22 at 18 20 37


Odd that this prompt results in less detail :-D
Screenshot 2024-06-22 at 18 27 27


Moondream for comparison:
Screenshot 2024-06-22 at 18 51 49

@xenova
Collaborator

xenova commented Jun 22, 2024

Wow, it's definitely much faster. Very nice!

Great! 🥳

The descriptions aren't as useful though? But I'm going to keep playing around with that.

You might need to use one of their pre-selected prompts: https://huggingface.co/microsoft/Florence-2-base-ft/blob/e7a5acc73559546de6e12ec0319cd7cc1fa2437c/processing_florence2.py#L115-L117

  • caption: 'What does the image describe?'
  • detailed: 'Describe in detail what is shown in the image.'
  • more detailed: 'Describe with a paragraph what is shown in the image.'
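The fixed prompt set above can be mirrored in a small lookup, which also makes the model's limitation concrete: anything outside these tasks isn't a prompt the fine-tuned model was trained on. A minimal sketch (the task tags follow the upstream processing_florence2.py; `constructPrompt` is a hypothetical helper for illustration, not a Transformers.js API):

```javascript
// Florence-2 task tags mapped to their fixed prompts, per Microsoft's
// processing_florence2.py. constructPrompt is illustrative only.
const TASK_PROMPTS = new Map([
  ['<CAPTION>', 'What does the image describe?'],
  ['<DETAILED_CAPTION>', 'Describe in detail what is shown in the image.'],
  ['<MORE_DETAILED_CAPTION>', 'Describe with a paragraph what is shown in the image.'],
]);

function constructPrompt(task) {
  const prompt = TASK_PROMPTS.get(task);
  if (prompt === undefined) {
    throw new Error(`Unknown Florence-2 task tag: ${task}`);
  }
  return prompt;
}
```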

I've also uploaded the larger (800M) models: https://huggingface.co/onnx-community/Florence-2-large-ft or https://huggingface.co/onnx-community/Florence-2-large, which you can try out. If you do, I recommend selecting different quantizations with something like:

const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
    dtype: {
        embed_tokens: 'fp16',
        vision_encoder: 'fp32',
        encoder_model: 'fp16',
        decoder_model_merged: 'q4',
    },
});

(you may need to mix and match these values; selecting from "fp32", "fp16", "q8", "q4")
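As a guard against typos while mixing and matching, the allowed values can be checked up front. `validateDtypeConfig` below is a hypothetical helper for illustration, not part of Transformers.js:

```javascript
// Valid per-module quantization levels, per the comment above.
const VALID_DTYPES = new Set(['fp32', 'fp16', 'q8', 'q4']);

// Hypothetical helper: throw early on an unsupported dtype string
// instead of failing later inside from_pretrained.
function validateDtypeConfig(dtypeConfig) {
  for (const [module, dtype] of Object.entries(dtypeConfig)) {
    if (!VALID_DTYPES.has(dtype)) {
      throw new Error(`Unsupported dtype "${dtype}" for "${module}"`);
    }
  }
  return dtypeConfig;
}
```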

@flatsiedatsie
Contributor Author

I'll try that, thank you!

Could it be that with the new v3 the MusicGen streamer progress callback no longer works properly? I haven't tested it separately from my code, though, so it could just be an issue on my end.

Screenshot 2024-06-22 at 21 00 40

I'm also seeing an error with nanoLlava. It's just a number:
Screenshot 2024-06-22 at 21 05 03

@flatsiedatsie
Contributor Author

I'm finding that the larger models are hit or miss.

good:
Screenshot 2024-06-26 at 19 22 30

bad:
Screenshot 2024-06-26 at 19 45 10

- caption: 'What does the image describe?'
- detailed: 'Describe in detail what is shown in the image.'
- more detailed: 'Describe with a paragraph what is shown in the image.'

Does this list of prompts mean that the model isn't designed for free-form question answering?

It sure seems like it:

good:
Screenshot 2024-06-26 at 19 55 29

bad:
Screenshot 2024-06-26 at 19 57 16

@flatsiedatsie
Contributor Author

flatsiedatsie commented Jun 28, 2024

I think with the WebGPU support this issue can be closed. Awesome stuff, thank you so much for your amazing work, as always. I've implemented the basic CPU version in my project, but am keeping Moondream 2 as the default for now, since users might otherwise be confused by the response quality when they question the image with their own custom prompts.

But for mass-describing images I would certainly pick Florence 2 now.

@Vasanthengineer4949

I need to export my own custom Florence-2 model. How can I do it?

@dragen1860

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft Integrating into transformers.js now

Hi @xenova, I'm encountering some difficulty exporting to ONNX. Could you kindly share your method for exporting Florence-2 to ONNX? Thank you.

@JohnRSim

JohnRSim commented Nov 2, 2024

I'm running into the following error when trying the demo:

worker-Bo2tVEHN.js:5 Uncaught (in promise) Error: no available backend found. ERR: [webgpu] TypeError: e.requestAdapterInfo is not a function
at gs (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:5:1599)
at async H0.create (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:5:17973)
at async rm (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:2353:12683)
at async Zr (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:2373:27270)
at async Promise.all (index 0)
at async Zm.from_pretrained (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:2373:34092)
at async Promise.all (index 0)
at async N3 (https://xenova-florence2-webgpu.static.hf.space/assets/worker-Bo2tVEHN.js:2373:114960)

gyagp pushed a commit to gyagp/transformers.js that referenced this issue Nov 3, 2024
This PR updates WebGPU examples and replaces their old package
@xenova/transformers with @huggingface/transformers. This fixes
some breaking changes in the WebGPU spec, such as
GPUAdapter.requestAdapterInfo() no longer being supported in the latest
Chrome browser (an issue was reported at huggingface#815 (comment)).
The folder names for WebGPU examples are also unified to start with
"webgpu".
@gyagp

gyagp commented Nov 3, 2024

@JohnRSim There is a breaking change in the WebGPU spec and Chrome's implementation: GPUAdapter.requestAdapterInfo() is no longer supported. This was fixed in a recent onnxruntime-web release, so the example needs to be upgraded to the latest @huggingface/transformers release, which includes that fix.
I just uploaded a PR to fix this, and the related HF space needs to be updated accordingly.
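Until the upgraded packages are deployed everywhere, code that needs adapter info can feature-detect the removed method and fall back to the `GPUAdapter.info` attribute that replaced it in the spec. A sketch (the adapter objects in the test are mocks for illustration):

```javascript
// Prefer requestAdapterInfo() when it still exists (older WebGPU
// implementations), otherwise use the GPUAdapter.info attribute that
// replaced it in the current spec.
async function getAdapterInfo(adapter) {
  if (typeof adapter.requestAdapterInfo === 'function') {
    return await adapter.requestAdapterInfo(); // legacy path
  }
  return adapter.info; // current spec
}
```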
