Hello!
I am currently working on adapting my pre-trained Llama model for text embedding tasks using the LLM2Vec methodology. My objective is to configure the model to generate text embeddings directly from image inputs. I have been utilizing the Unsloth fine-tuning framework, as demonstrated in this Colab notebook.
Current Progress:
Model Output: The model successfully generates descriptive text for a given image input.
Desired Outcome: Instead of generating descriptive text, I would like the model to produce an embedding vector (or pooled token representation) directly from the image input, in the same space it uses for text; a rough sketch of what I have in mind follows below.
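For context, here is roughly the kind of extraction I am imagining: a single forward pass over the image input, then mean-pooling the last hidden states (the LLM2Vec-style pooling) instead of calling generate(). This is only a sketch of my intent, not a working solution; the model id, prompt, and pooling choice are assumptions for illustration, and I use plain transformers here rather than Unsloth to keep it short:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Assumed model id for illustration; in practice I load the model through Unsloth.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # placeholder image path
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Instead of model.generate(...), take the decoder hidden states directly.
    outputs = model(**inputs, output_hidden_states=True)

last_hidden = outputs.hidden_states[-1]            # (1, seq_len, hidden_size)
mask = inputs["attention_mask"].unsqueeze(-1).to(last_hidden.dtype)
embedding = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)   # mean pooling
print(embedding.shape)                             # (1, hidden_size)
```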
Challenges Encountered:
Integration of LLM2Vec: I am unsure how to apply the LLM2Vec recipe (bidirectional attention, MNTP, and contrastive training) so that the model produces embeddings from image inputs; my current text-only starting point is sketched after this list.
Unsloth Framework Adaptation: Need guidance on modifying the Unsloth fine-tuning process to accommodate this functionality.
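For reference, this is the text-only LLM2Vec usage I am starting from, as I understand it from the llm2vec library's README (the model names and arguments below are from that README and may not be exact); the open question is how to get the same kind of encode() behaviour when the input is an image rather than a sentence:

```python
import torch
from llm2vec import LLM2Vec

# Text-only LLM2Vec usage (checkpoint names from the McGill-NLP releases,
# quoted from memory). This is the behaviour I want to reproduce for images.
l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

text_embeddings = l2v.encode(["a photo of a dog playing in the park"])
print(text_embeddings.shape)  # (1, hidden_size)
```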
Request for Assistance:
I would greatly appreciate guidance on the following:
Model Configuration: Steps to adjust the Llama model architecture to generate text embeddings directly from image inputs using LLM2Vec.
Fine-Tuning Process: Recommendations on adapting the Unsloth fine-tuning process to support this objective; my current guess at what the training objective might look like is sketched after this list.
Implementation Examples: Any available examples or references that demonstrate similar adaptations.
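To make the fine-tuning question more concrete: my current guess is that the supervised/SimCSE stage of LLM2Vec would become a contrastive objective over pooled image embeddings and pooled caption embeddings, roughly like the plain-PyTorch sketch below. This is entirely my own assumption, not something Unsloth or LLM2Vec provides out of the box:

```python
import torch
import torch.nn.functional as F

def info_nce(image_emb: torch.Tensor, text_emb: torch.Tensor,
             temperature: float = 0.05) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired (image, caption) embeddings.

    Both tensors are (batch, hidden_size); matching rows are positives and
    all other rows in the batch serve as in-batch negatives.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature          # (batch, batch) cosine sims
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
```

Whether a loss like this can be dropped into an Unsloth training loop (and what it would replace in the standard vision fine-tuning notebook) is exactly what I am unsure about.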
Your expertise and support in this matter would be invaluable to the progression of my project.
Thank you for your assistance.
Best regards!