This project implements an image captioning model using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. The model generates descriptive captions for images, trained on a dataset of 8,000 images.
- Image Feature Extraction: Utilizes CNN (VGG16) for extracting features from images.
- Caption Generation: Employs LSTM to generate textual descriptions based on the extracted image features.
- Dataset: Trained on a diverse dataset of 8,000 images.