GitHub - Narendhiranv04/Multimodal-Conversational-Image-Recognition-Chatbot

Multimodal Conversational Image Recognition Chatbot [PixelBot]

This project was developed for Smart India Hackathon-24. PixelBot is a multimodal chatbot that answers complex image-based queries. It combines image segmentation, inpainting, and generation with a conversational interface, so that image tasks can be requested and followed up within a dialogue.

The system uses LLaVA for joint image and text understanding, SAM2 for segmentation, and GLIGEN for inpainting and generation, while LSTM networks track conversational context. Keyword extraction with RAKE and YAKE identifies the key terms in each query, which are used to make the chatbot's responses more relevant to what the user asked.
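The README does not show the project's keyword-extraction code. As a rough illustration of how RAKE (Rapid Automatic Keyword Extraction) scores phrases, here is a minimal from-scratch sketch: text is split into candidate phrases at punctuation and stopwords, each word gets a score of degree / frequency, and a phrase scores the sum of its word scores. The short stopword list and the function names `candidate_phrases` and `rake_keywords` are hypothetical; in practice you would more likely use an existing package such as `rake-nltk` or `yake`.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small stopword list; real RAKE setups use a much larger one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
    "me", "my", "please", "can", "you", "i",
}


def candidate_phrases(text: str) -> list[list[str]]:
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases: list[list[str]] = []
    # Punctuation breaks the text into fragments; stopwords break fragments into phrases.
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current: list[str] = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_keywords(text: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Rank candidate phrases by the sum of RAKE word scores (degree / frequency)."""
    phrases = candidate_phrases(text)
    freq: dict[str, int] = defaultdict(int)
    degree: dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences inside the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # Duplicate phrases collapse into one dictionary entry.
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

For the query "Segment the red car in this image and remove the red car from the background.", `rake_keywords` ranks "red car" first with a score of 4.0: "red" and "car" each appear twice, always inside a two-word phrase, so each word scores 4 / 2 = 2.0.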

Features

Multimodal interaction combining text and image understanding.
Image segmentation with SAM2, and image inpainting and generation with GLIGEN.
Context-aware conversation using LSTM-based memory management.
Keyword extraction with RAKE and YAKE for enhanced response accuracy.
Support for complex, multi-turn image queries with precise, context-aware responses.
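The README does not describe how a query is routed to the right model. One plausible shape, sketched here under explicit assumptions, is a keyword-based dispatcher that picks a task and checks that an image is present when the task needs one. The trigger words, the `ImageRequest` type, and the handler table are all hypothetical placeholders; the lambdas only return strings where real code would call SAM2, GLIGEN, or LLaVA.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ImageRequest:
    """A user turn: free text plus an optional uploaded image (hypothetical type)."""
    text: str
    image_path: Optional[str] = None


# Hypothetical trigger words per task, checked in insertion order.
TASK_TRIGGERS: dict[str, tuple[str, ...]] = {
    "segment": ("segment", "mask", "outline", "select"),
    "inpaint": ("remove", "erase", "replace", "inpaint"),
    "generate": ("generate", "create", "draw", "imagine"),
}

# Placeholder handlers standing in for the real model calls.
HANDLERS: dict[str, Callable[[ImageRequest], str]] = {
    "segment": lambda r: f"[SAM2] segmenting {r.image_path}",
    "inpaint": lambda r: f"[GLIGEN] inpainting {r.image_path}",
    "generate": lambda r: f"[GLIGEN] generating from: {r.text}",
    "chat": lambda r: f"[LLaVA] answering: {r.text}",
}

# Tasks that edit an existing image and therefore require an upload.
NEEDS_IMAGE = {"segment", "inpaint"}


def classify_task(text: str) -> str:
    """Return the first task whose trigger word appears; fall back to chat."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for task, triggers in TASK_TRIGGERS.items():
        if words.intersection(triggers):
            return task
    return "chat"


def handle(request: ImageRequest) -> str:
    """Route a request to the matching (placeholder) model handler."""
    task = classify_task(request.text)
    if task in NEEDS_IMAGE and request.image_path is None:
        return "Please upload an image first."
    return HANDLERS[task](request)
```

For example, `handle(ImageRequest("Segment the car", "street.jpg"))` returns `"[SAM2] segmenting street.jpg"`, while the same text without an image asks the user to upload one. A keyword table is easy to debug but brittle; a real system could instead ask LLaVA to classify the intent or reuse the extracted keywords.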

Components Implemented

LLaVA: A Large Language and Vision Assistant for multimodal understanding.
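SAM2 (Segment Anything Model 2): A promptable segmentation model for images and video.
GLIGEN (Grounded-Language-to-Image Generation): A grounded text-to-image framework, used here for image inpainting and generation.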
RAKE and YAKE: Tools for keyword extraction to enhance chatbot performance.
LSTM Networks: Used for managing contextual memory in conversations.
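The README does not show how the LSTM memory is wired into the chatbot. As a hedged illustration of the underlying mechanism, the sketch below implements a single LSTM cell in plain Python and runs it over a sequence of per-turn vectors, so the final hidden state summarizes the whole conversation. The class `TinyLSTMCell`, its random untrained weights, and the toy turn vectors are hypothetical; a real system would use a trained layer such as PyTorch's `torch.nn.LSTM` over learned turn embeddings.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TinyLSTMCell:
    """A single LSTM cell in plain Python (illustrative, untrained weights)."""

    GATES = ("i", "f", "g", "o")  # input, forget, candidate, output

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size  # every gate sees [x_t, h_{t-1}]
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in self.GATES
        }
        self.b = {gate: [0.0] * hidden_size for gate in self.GATES}

    def _affine(self, gate: str, z: list[float]) -> list[float]:
        """Compute W_gate @ z + b_gate."""
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x: list[float], h: list[float],
             c: list[float]) -> tuple[list[float], list[float]]:
        z = list(x) + list(h)  # concatenate current input and previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        # Cell state: keep part of the old memory, add part of the new candidate.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def encode_conversation(cell: TinyLSTMCell,
                        turns: list[list[float]]) -> tuple[list[float], list[float]]:
    """Feed one vector per turn through the cell, carrying (h, c) across turns."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in turns:
        h, c = cell.step(x, h, c)
    return h, c
```

Two conversations that share the same final turn but differ in an earlier turn end with different hidden states, because the earlier turn is carried forward through the cell state `c`. That carried state is what lets an LSTM act as conversational memory.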

Submission

This work was prepared as a submission to the Smart India Hackathon.

Presentation

Repository Contents

.devcontainer/
.github/
.vscode/
GLIGEN/
LLaVA/
SAM2 @ c2ec8e1 (Git submodule)
demo_resources/images/
lama/
sam2/
.dockerignore
.editorconfig
.gitattributes
.gitignore
.gitmodules
LICENSE
README.md
app.py
lama_predict.py
lama_server.py
llava_interactive.py
ngrok.yml
pyproject.toml
requirements.txt
run_demo.sh
setup.sh
tox.ini
