A PyTorch implementation of an intermodal triplet network that learns a joint embedding space for text and images. One application is cross-modal retrieval: given an image, retrieve the most relevant words, and vice versa.
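The joint space can be learned with a triplet loss that pulls an image embedding toward a relevant word and pushes it away from an irrelevant one. Below is a minimal sketch of that setup; the `ImageEncoder`/`TextEncoder` modules, feature dimensions, and margin value are illustrative assumptions, not the repository's exact architecture.

```python
# Minimal sketch of a bimodal triplet setup (assumed architecture,
# not the exact one used in this repo).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Projects precomputed image features into the joint space (hypothetical)."""
    def __init__(self, in_dim=2048, embed_dim=300):
        super().__init__()
        self.fc = nn.Linear(in_dim, embed_dim)

    def forward(self, x):
        return F.normalize(self.fc(x), dim=-1)  # L2-normalized embeddings

class TextEncoder(nn.Module):
    """Projects word vectors into the joint space (hypothetical)."""
    def __init__(self, in_dim=300, embed_dim=300):
        super().__init__()
        self.fc = nn.Linear(in_dim, embed_dim)

    def forward(self, x):
        return F.normalize(self.fc(x), dim=-1)

image_enc, text_enc = ImageEncoder(), TextEncoder()
triplet_loss = nn.TripletMarginLoss(margin=0.2)  # margin is an assumption

# Anchor image, a relevant tag (positive), and an irrelevant tag (negative).
img_feat = torch.randn(32, 2048)
pos_word = torch.randn(32, 300)
neg_word = torch.randn(32, 300)

loss = triplet_loss(image_enc(img_feat), text_enc(pos_word), text_enc(neg_word))
loss.backward()
```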
This particular implementation was trained on the NUS-WIDE dataset, whose images are annotated with ground-truth labels drawn from 81 concepts, along with noisy user-provided tags.
For each image query (shown at the bottom of each list), the 10 nearest words are retrieved using FAISS.
For each text query, the 3 nearest images are retrieved using FAISS, as sketched below.
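Both directions reduce to nearest-neighbor search in the shared space. Here is a minimal sketch of that retrieval step using FAISS; the embedding dimension and the random arrays are placeholders standing in for the trained encoders' outputs.

```python
# Minimal sketch of cross-modal retrieval with FAISS (placeholder data).
import numpy as np
import faiss

d = 300                                                # assumed embedding dim
word_embs = np.random.rand(81, d).astype('float32')    # one vector per concept
img_embs = np.random.rand(100, d).astype('float32')    # one vector per image

# Image -> words: index the word embeddings, query with an image embedding.
word_index = faiss.IndexFlatL2(d)
word_index.add(word_embs)
_, nearest_words = word_index.search(img_embs[:1], k=10)  # 10 nearest words

# Word -> images: index the image embeddings, query with a word embedding.
img_index = faiss.IndexFlatL2(d)
img_index.add(img_embs)
_, nearest_imgs = img_index.search(word_embs[:1], k=3)    # 3 nearest images
```

An exact flat L2 index is sufficient at this scale; approximate indexes (e.g. `IndexIVFFlat`) only become worthwhile for much larger collections.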