Narendhiranv04/Multimodal-Conversational-Image-Recognition-Chatbot

Multimodal Conversational Image Recognition Chatbot [PixelBot]

This project was developed as part of Smart India Hackathon-24. It is a multimodal chatbot that answers complex image-based queries, combining image segmentation, inpainting, and generation with conversational context tracking.

The system uses LLaVA, SAM2, and GLIGEN for image segmentation, inpainting, and generation, while LSTM networks maintain the chatbot’s conversational context across turns. Keyword extraction with RAKE and YAKE further refines its responses to user queries.

Features

  1. Multimodal interaction combining text and image understanding.
  2. Image segmentation with SAM2, and inpainting and generation with GLIGEN.
  3. Context-aware conversation using LSTM-based memory management.
  4. Keyword extraction with RAKE and YAKE for enhanced response accuracy.
  5. Handling of complex queries with precise responses.
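
The README does not describe how the LSTM is wired into the chatbot. As a minimal sketch of the underlying idea only, the block below implements one step of a scalar LSTM cell in plain Python, showing how the cell state carries information from earlier turns forward. The function names, the gate names used as dictionary keys, and the weights are all hypothetical and are not taken from this repository.

```python
import math


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One step of a scalar LSTM cell (illustrative, not the project's model).

    weights maps a gate name to (w_x, w_h, bias); the names are hypothetical.
    """
    def pre_activation(gate):
        w_x, w_h, bias = weights[gate]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre_activation("input"))        # how much new information to write
    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    o = sigmoid(pre_activation("output"))       # how much of the cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value to write

    c = f * c_prev + i * g   # updated cell state (the long-term "memory")
    h = o * math.tanh(c)     # updated hidden state (the per-turn output)
    return h, c


def run_sequence(inputs, weights):
    """Feed a sequence of scalar inputs through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, weights)
    return h, c
```

In a real system the inputs would be embedding vectors for each conversation turn and the weights would be learned matrices; a library implementation such as an LSTM layer from a deep learning framework would replace this hand-written cell.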

Components Implemented

  • LLaVA: A Large Language and Vision Assistant for multimodal understanding.

  • SAM2: Segment Anything Model 2, used for image segmentation.

  • GLIGEN: A grounded text-to-image generation framework, used here for image inpainting and generation.

  • RAKE and YAKE: Tools for keyword extraction to enhance chatbot performance.

  • LSTM Networks: Used for managing contextual memory in conversations.
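
To illustrate the keyword extraction component, here is a minimal sketch of RAKE-style scoring in plain Python. It is a simplified re-implementation for explanation, not the RAKE or YAKE library code the project may use: the stopword list is a tiny hypothetical sample, and the function names are assumptions. Candidate phrases are runs of non-stopwords; each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real RAKE uses a much larger one).
STOPWORDS = {
    "a", "an", "and", "the", "in", "of", "on", "to", "is",
    "with", "this", "that", "from", "for", "can", "you", "me", "please",
}


def candidate_phrases(text):
    """Split text into runs of non-stopwords, also breaking at punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs sorted by descending score, then by phrase."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # number of occurrences of each word
    degree = defaultdict(int)  # summed length of the phrases each word appears in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For a query such as "Remove the red car and segment the background", the multi-word phrase "red car" scores highest, which is the kind of signal a chatbot could use to decide which object the user is asking about.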

Submission

This work was prepared for the Smart India Hackathon, India.

Presentation
