MultiModal.HuggingFaceMultiModal: fix errors and README, add stream_complete (#16376)

fix imports
g-hano authored Oct 8, 2024
1 parent 0b19dea commit d5b7511
Showing 1 changed file with 36 additions and 5 deletions.
@@ -35,7 +35,7 @@ Here's a basic example of how to use the Hugging Face multimodal integration:

```python
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
-from llama_index.schema import ImageDocument
+from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")
@@ -50,14 +50,45 @@ response = model.complete(prompt, image_documents=[image_document])
print(response.text)
```

### Streaming

```python
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

# Initialize the model
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="downloaded_image.jpg")
prompt = "Describe this image in detail."

import nest_asyncio
import asyncio

nest_asyncio.apply()


async def stream_output():
    for chunk in model.stream_complete(
        prompt, image_documents=[image_document]
    ):
        print(chunk.delta, end="", flush=True)
        await asyncio.sleep(0)


asyncio.run(stream_output())
```
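
The example above iterates over `stream_complete` with a plain `for` loop, so the stream can apparently also be consumed synchronously. The sketch below prints each chunk as it arrives and keeps the full text. `FakeChunk`, `fake_stream`, and `collect_stream` are hypothetical names introduced here so the block runs without downloading a model; the only thing assumed about real chunks is the `.delta` attribute used above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed response chunk."""

    delta: str


def fake_stream() -> Iterator[FakeChunk]:
    """Hypothetical generator that mimics a token stream."""
    for piece in ["A cat ", "on ", "a mat."]:
        yield FakeChunk(delta=piece)


def collect_stream(chunks: Iterable) -> str:
    """Print each delta as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk.delta, end="", flush=True)
        parts.append(chunk.delta)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

With a loaded model, the same helper would presumably be called as `collect_stream(model.stream_complete(prompt, image_documents=[image_document]))`.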

You can also refer to this [Colab notebook](examples/huggingface_multimodal.ipynb).

## Supported Models

-1. Qwen2VisionMultiModal
-2. Florence2MultiModal
-3. Phi35VisionMultiModal
-4. PaliGemmaMultiModal
+1. Qwen2 Vision
+2. Florence2
+3. Phi3.5 Vision
+4. PaliGemma
+5. Mllama

Each model has its own capabilities; choose one based on your specific use case.
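
One way to make that choice explicit is a small lookup from the family names above to Hugging Face checkpoint ids. This is an illustrative sketch only: `CHECKPOINTS` and `pick_checkpoint` are hypothetical names, only the Qwen2 id appears in this README, and the other ids are public Hugging Face checkpoints that are assumed, not confirmed, to work with `from_model_name`.

```python
# Hypothetical lookup table. Only the Qwen2 id is taken from this README;
# the remaining ids are assumptions and may not match what the integration accepts.
CHECKPOINTS = {
    "qwen2-vision": "Qwen/Qwen2-VL-2B-Instruct",
    "florence2": "microsoft/Florence-2-base",  # assumed
    "phi3.5-vision": "microsoft/Phi-3.5-vision-instruct",  # assumed
    "paligemma": "google/paligemma-3b-pt-224",  # assumed
    "mllama": "meta-llama/Llama-3.2-11B-Vision-Instruct",  # assumed
}


def pick_checkpoint(family: str) -> str:
    """Return the checkpoint id for a family name, ignoring case and spacing."""
    key = family.strip().lower().replace(" ", "-")
    if key not in CHECKPOINTS:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(
            f"Unknown model family {family!r}; expected one of: {known}"
        )
    return CHECKPOINTS[key]


print(pick_checkpoint("Qwen2 Vision"))
```

The returned id could then be passed to the constructor shown earlier, for example `HuggingFaceMultiModal.from_model_name(pick_checkpoint("Qwen2 Vision"))`.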
