This project sets up an ML pipeline that:
- Takes a video file as input.
- Uses the `av` Python library to extract video frames and their timestamps.
- Utilizes a pre-trained/custom-trained YOLOv8 model to detect Pepsi and CocaCola logos in the video.
- Outputs a JSON file with timestamps for each detected logo.
The output JSON file will have the following format:
```json
{
  "Pepsi_pts": [10.1, 10.2, 10.3, ...],
  "CocaCola_pts": [20.3, 31.8, 40.12, ...]
}
```
- Python 3.x
- Google Colab or a local machine with GPU support (optional but recommended for faster execution)
The following Python packages are required:
- `av` for video frame extraction
- `torch` and `torchvision` for the YOLOv8 model
- `opencv-python` for additional image processing
- `ultralytics` for the pre-trained YOLOv8 model
Clone the repository:

```bash
git clone https://github.com/your-username/Pepsi-Coke-LogoDetection.git
cd Pepsi-Coke-LogoDetection
```

Create and activate a virtual environment:

```bash
pip install virtualenv
virtualenv venv

# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
```

Install the required packages:

```bash
pip install av torch torchvision opencv-python ultralytics
```

Run the detection script on a video:

```bash
python detect_logos.py --video path/to/your/video.mp4
```
We used a pre-trained YOLOv8 model to detect Pepsi and Coca-Cola logos. This approach runs object detection directly on frames extracted from the input video, with no additional training required.
The pre-trained model has been trained on diverse datasets, which helps in achieving good detection accuracy even for logos in various contexts and backgrounds.
By using a pre-trained model, we can focus our efforts on other critical aspects of the project, such as frame extraction, timestamping, and integration of the different components into a seamless pipeline.
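Below is a minimal sketch of what a script like `detect_logos.py` could look like under this approach, not the repo's actual implementation. It assumes a weights file (here called `logo_yolov8.pt`, a hypothetical name) whose classes include `Pepsi` and `CocaCola`; the `av` frame loop and the output keys match the pipeline described above.

```python
import json

import av
from ultralytics import YOLO

# Hypothetical weights file; any YOLOv8 checkpoint trained on the two
# logo classes would work here.
model = YOLO("logo_yolov8.pt")
timestamps = {"Pepsi_pts": [], "CocaCola_pts": []}

container = av.open("path/to/your/video.mp4")
stream = container.streams.video[0]
for frame in container.decode(stream):
    # Presentation timestamp of this frame, in seconds.
    pts_seconds = float(frame.pts * stream.time_base)
    # Convert the frame to a BGR numpy array that YOLO accepts directly.
    img = frame.to_ndarray(format="bgr24")
    result = model(img, verbose=False)[0]
    for cls_id in result.boxes.cls.tolist():
        label = result.names[int(cls_id)]
        if label == "Pepsi":
            timestamps["Pepsi_pts"].append(round(pts_seconds, 2))
        elif label == "CocaCola":
            timestamps["CocaCola_pts"].append(round(pts_seconds, 2))

with open("output.json", "w") as f:
    json.dump(timestamps, f, indent=2)
```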
Gather a dataset of images containing Pepsi and Coca-Cola logos. This dataset should be labeled with bounding boxes around each logo.
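For YOLO-family training, labels are conventionally stored as one `.txt` file per image, with one line per bounding box in the form `<class_id> <x_center> <y_center> <width> <height>`, all coordinates normalized to [0, 1]. A hypothetical label file for an image containing both logos, assuming class `0` is Pepsi and class `1` is CocaCola, might look like:

```text
0 0.512 0.431 0.210 0.180
1 0.300 0.655 0.150 0.120
```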
Use the YOLOv8 architecture to train a model on your labeled dataset. This involves several iterations of training and validation to optimize the model's ability to detect logos accurately in various video frames.
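A rough sketch of what this training step could look like with the `ultralytics` API; the dataset config `logos.yaml` (paths to the train/val images plus the two class names) is an assumed file, not something this repo ships:

```python
from ultralytics import YOLO

# logos.yaml (assumed) would contain roughly:
#   path: datasets/logos
#   train: images/train
#   val: images/val
#   names:
#     0: Pepsi
#     1: CocaCola

# Start from pre-trained weights and fine-tune on the labeled logo dataset.
model = YOLO("yolov8n.pt")
model.train(data="logos.yaml", epochs=100, imgsz=640)

# Evaluate on the validation split (mAP, precision, recall).
metrics = model.val()

# ultralytics writes the best checkpoint to runs/detect/train/weights/best.pt
# by default.
```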
Once trained, the custom model can be used for inference. This involves deploying the model to process each frame of the video, detect the logos, and generate timestamps as per your project requirements.
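The inference loop is the same as in the pre-trained sketch above; only the weights change (the path shown is the ultralytics default output location):

```python
from ultralytics import YOLO

# Swap in the fine-tuned checkpoint; the rest of the frame loop is unchanged.
model = YOLO("runs/detect/train/weights/best.pt")
```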
A custom-trained model offers several advantages:

- **Higher accuracy:** A custom-trained model can be fine-tuned on data specific to your application, leading to higher accuracy than generic, pre-trained models that are not optimized for your exact use case.
- **Control over training data:** You choose the training data, so you can focus on the characteristics and variations that matter for your application. This helps the model handle differences in lighting, angles, and backgrounds, which are crucial for logo detection in videos.
- **Better generalization:** By training on data that closely matches real-world scenarios, the model generalizes better to unseen data, performing well on new videos or environments that were not part of the training set.
- **Iterative improvement:** Custom models can be refined based on performance feedback. As you gather more data and insight into the model's behavior, you can retrain it to achieve better results.
- **Data privacy:** Training can be done internally on proprietary or sensitive data, ensuring privacy and compliance with regulations. This is crucial for applications where data confidentiality is paramount.
- **Domain knowledge:** Training your own model lets you incorporate domain-specific knowledge and insights into the model architecture and training process, leading to more effective solutions for your specific problem.
- **Learning value:** Building and training a custom model provides valuable hands-on experience with machine learning and deep learning techniques, and can serve as a basis for further research and development in related areas.