The objective of this repository is to implement OpenAI's CLIP paper, *Learning Transferable Visual Models From Natural Language Supervision*, from scratch using PyTorch.
Read Paper: https://arxiv.org/pdf/2103.00020.pdf
- A model designed for learning joint representations of images and text.
- Leverages a shared embedding space, where images and their corresponding textual descriptions are mapped to similar points.
- Uses a contrastive learning objective to train the model: it maximizes the similarity between positive pairs (correct image-text pairs) and minimizes the similarity between negative pairs (incorrect pairs). A minimal sketch of this objective follows below.
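A minimal sketch of this symmetric contrastive objective in PyTorch; the `temperature` value of 0.07 and the use of plain cross-entropy over cosine similarities are assumptions here, not necessarily the exact settings of this repository:

```python
import torch
import torch.nn.functional as F

def clip_loss(image_embeddings, text_embeddings, temperature=0.07):
    # temperature=0.07 is an assumed value, not taken from this repository
    # L2-normalize so that dot products become cosine similarities
    image_embeddings = F.normalize(image_embeddings, dim=-1)
    text_embeddings = F.normalize(text_embeddings, dim=-1)

    # (batch_size, batch_size) similarity matrix; the diagonal holds the positive pairs
    logits = image_embeddings @ text_embeddings.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: classify the matching text for each image and vice versa
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.T, targets)
    return (loss_i2t + loss_t2i) / 2
```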
- The distilbert-base-uncased model is used for encoding the texts.
- The resulting text encoder embedding will be of shape `(batch_size, text_embedding)` -> `(32, 768)`
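A minimal sketch of such a text encoder, assuming the Hugging Face `transformers` library and `[CLS]`-token pooling (the pooling strategy is an assumption):

```python
import torch
from transformers import DistilBertModel, DistilBertTokenizer

class TextEncoder(torch.nn.Module):
    def __init__(self, model_name="distilbert-base-uncased"):
        super().__init__()
        self.model = DistilBertModel.from_pretrained(model_name)

    def forward(self, input_ids, attention_mask):
        output = self.model(input_ids=input_ids, attention_mask=attention_mask)
        # Use the hidden state of the [CLS] token as the sentence embedding
        return output.last_hidden_state[:, 0, :]  # (batch_size, 768)

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
batch = tokenizer(["people sitting near the beach"] * 32,
                  padding=True, return_tensors="pt")
encoder = TextEncoder()
print(encoder(batch["input_ids"], batch["attention_mask"]).shape)  # torch.Size([32, 768])
```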
- The pretrained resnet50 model is used for encoding the images.
- The resulting image encoder embedding will be of shape `(batch_size, image_embedding)` -> `(32, 2048)`
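A sketch of the image encoder, assuming `torchvision`'s pretrained ResNet-50 with the final classification layer removed (the repository may load the backbone differently, e.g. via `timm`):

```python
import torch
from torchvision import models

class ImageEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the final fully connected layer; keep the 2048-dim pooled features
        self.backbone = torch.nn.Sequential(*list(resnet.children())[:-1])

    def forward(self, x):
        features = self.backbone(x)            # (batch_size, 2048, 1, 1)
        return features.flatten(start_dim=1)   # (batch_size, 2048)

encoder = ImageEncoder()
images = torch.randn(32, 3, 224, 224)  # a dummy batch of 224x224 RGB images
print(encoder(images).shape)  # torch.Size([32, 2048])
```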
The Projection Head plays a crucial role in shaping the representations learned by the model.
- Responsible for reducing the dimensionality of the high-dimensional embeddings produced by the image encoder and text encoder
- By projecting the embeddings into a lower-dimensional space, the model can focus on the features most relevant to the contrastive learning task
- Enhances the discriminative power of the learned representations, helping the model distinguish between positive and negative pairs more effectively during the contrastive learning process. One possible implementation is sketched below.
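One way to realize such a projection head is the sketch below; the 256-dim projection size, GELU activation, dropout rate, and residual connection are assumptions, not necessarily the exact configuration of this repository:

```python
import torch.nn as nn

class ProjectionHead(nn.Module):
    # projection_dim=256 and dropout=0.1 are assumed values
    def __init__(self, embedding_dim, projection_dim=256, dropout=0.1):
        super().__init__()
        self.projection = nn.Linear(embedding_dim, projection_dim)
        self.gelu = nn.GELU()
        self.fc = nn.Linear(projection_dim, projection_dim)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(projection_dim)

    def forward(self, x):
        projected = self.projection(x)
        x = self.gelu(projected)
        x = self.fc(x)
        x = self.dropout(x)
        x = x + projected  # residual connection around the non-linear block
        return self.layer_norm(x)

# Separate heads map 768-dim text and 2048-dim image embeddings into one shared space
text_projection = ProjectionHead(embedding_dim=768)
image_projection = ProjectionHead(embedding_dim=2048)
```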
Try CLIP Demo in HuggingFace Spaces: https://huggingface.co/spaces/bala1802/clip_demo
- Prompt: "people sitting near the beach"
- Prompt: "people walking inside the forest"
- Prompt: "playing soccer"