This project was developed as part of Smart India Hackathon-24. It is a multimodal chatbot that handles complex image-based queries, combining image segmentation, inpainting, and generation with conversational intelligence.
The system uses LLaVA, SAM2, and GLIGEN for image segmentation, inpainting, and generation, while LSTM networks help the chatbot keep track of conversational context. Keyword extraction with RAKE and YAKE further refines its responses to user queries.
- Multimodal interaction combining text and image understanding.
- Image segmentation, inpainting, and generation using state-of-the-art models.
- Context-aware conversation using LSTM-based memory management.
- Keyword extraction with RAKE and YAKE for enhanced response accuracy.
- Designed to handle complex queries, delivering precise and engaging responses.
- LLaVA: A Large Language and Vision Assistant for multimodal understanding.
- SAM2: A powerful image segmentation model.
- GLIGEN: Framework for image inpainting and generation.
- RAKE and YAKE: Tools for keyword extraction to enhance chatbot performance.
- LSTM Networks: Used for managing contextual memory in conversations.
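
To illustrate the RAKE-style keyword extraction mentioned above, here is a minimal from-scratch sketch of the scoring idea: split the text into candidate phrases at stopwords and punctuation, score each word by degree divided by frequency, and sum the word scores per phrase. This is not the project's code and not the API of any RAKE library. The stopword list and the names `candidate_phrases` and `rake_keywords` are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real RAKE setups use a much larger one).
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "in", "on", "to",
    "for", "with", "is", "are", "this", "that", "from", "it", "by",
}


def candidate_phrases(text):
    """Split text into candidate phrases, breaking at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_keywords(text, top_n=3):
    """Return the top_n (phrase, score) pairs using RAKE-style degree/frequency scoring."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # A phrase's score is the sum of its word scores; repeated phrases collapse to one entry.
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    query = "Remove the red car from the image and fill the background with trees."
    print(rake_keywords(query, top_n=3))
```

For the sample query, the two-word phrase "red car" ranks first because multi-word phrases accumulate a higher degree. In a chatbot, phrases like this could be used to decide which object an image-editing request refers to, though how the project actually uses them is not stated here.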
This work was done in preparation for the Smart India Hackathon, India.