
Amumo (Analyze Multi-Modal Models)

Understanding and Comparing Multi-Modal Models

Amumo is a visual analytics prototype that allows users to explore latent spaces of inter-modal data pairs (in particular pairs of image and text embeddings as they can be retrieved by CLIP-like models). It is implemented as a collection of ipywidgets and comes with a set of pre-defined datasets and bi-modal contrastive models like CLIP, CyCLIP, and CLOOB to keep the overhead for users as low as possible. Additionally, users can define their own datasets and models to explore. Check out the "getting started" notebook for first steps.

Examples

You can use Amumo to interactively explore bi-modal datasets...

... or compare various bi-modal models.

Installation

Set up a conda environment with ipywidgets:

conda create -n myenv python=3.9
conda activate myenv
pip install ipykernel
pip install ipywidgets

Local installation (run from the root of a cloned copy of the repository):

pip install -e .

Package installation:

pip install "amumo @ git+https://github.com/ginihumer/Amumo.git"

Or you can create the conda environment from the .yml file:

conda env create -f environment.yml

To install the additional requirements for the interactive VISxAI article, use the visxai extra:

pip install "amumo[visxai] @ git+https://github.com/ginihumer/Amumo.git"
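
After any of the installation routes above, a quick check from the new environment can confirm that the packages are visible to the interpreter. This is a minimal sketch: the helper name `missing_packages` is hypothetical, and the import name `amumo` is assumed to match the distribution name.

```python
import importlib.util


def missing_packages(names=("ipywidgets", "amumo")):
    """Return the top-level package names that cannot be found.

    The default names are assumptions: "amumo" is assumed to be the
    import name of the installed distribution.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not importable in this environment:", ", ".join(missing))
    else:
        print("All packages found.")
```

If a package is reported as missing, the most common cause is that the notebook kernel runs in a different environment than the one where pip was executed.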

Understanding and Comparing Multi-Modal Models

Exploring the Latent Space of CLIP-like Models (CLIP, CyCLIP, CLOOB) Using Inter-Modal Pairs (Featuring Amumo, Your Friendly Neighborhood Mummy)

Abstract

Contrastive Language-Image Pre-training (CLIP) and variations of this approach, like CyCLIP or CLOOB, are trained on image-text pairs with a contrastive objective. The goal of contrastive loss objectives is to minimize latent-space distances of data points that have the same underlying meaning. We refer to the particular cases of contrastive learning that CLIP-like models perform as multi-modal contrastive learning because they use two (or more) modes of data (e.g., images and texts), where each mode uses its own encoder to generate a latent embedding space. More specifically, the objective that CLIP is optimized for minimizes the distances between image-text embeddings of pairs that have the same semantic meaning while maximizing the distances to all other combinations of text and image embeddings. We would expect that such a shared latent space places similar concepts of images and texts close to each other. However, the reality is a bit more complicated...
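
To make the objective concrete, the following is a minimal sketch of a CLIP-style symmetric contrastive loss in plain Python. It is an illustration under stated assumptions, not the training code of CLIP, CyCLIP, or CLOOB: the function names are hypothetical, the temperature of 0.07 is an assumed fixed value (real models typically learn it), and the embeddings are toy 2-D vectors.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over a batch of paired embeddings.

    Pair i consists of image_embs[i] and text_embs[i]; every other
    image-text combination in the batch is treated as a negative.
    The temperature is an assumed constant for this sketch.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: row i should put its mass on column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: column j should put its mass on row j.
    columns = [list(col) for col in zip(*logits)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matching_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print("matching pairs:", clip_style_loss(images, matching_texts))
    print("swapped pairs: ", clip_style_loss(images, swapped_texts))
```

The loss is small when each image is most similar to its own text and large when the pairing is swapped, which is the behavior the paragraph above describes.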

Resources

Check out the interactive article submitted to the 6th Workshop on Visualization for AI Explainability (VISxAI 2023).

Check out the computational notebook to reproduce the results shown in the article or to use as a starting point for future investigations.

Check out the computational notebook for exporting the data used in the interactive article.

How to cite?

You may cite Amumo using the following BibTeX entry:

@software{humer2023amumo,
  author = {Humer, Christina and Streit, Marc and Strobelt, Hendrik},
  title = {{Amumo (Analyze Multi-Modal Models)}},
  url = {https://github.com/ginihumer/Amumo},
  year = {2023}
}

Troubleshooting

Special thanks to Hussein Aly for adding this section.

SSL Certificate Verification Error

If you encounter an SSL certificate verification error, such as 'CERTIFICATE_VERIFY_FAILED', while installing or running Amumo, it might be due to your system's SSL configuration. Here are a few steps you can take to address this issue:

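  • Try updating your SSL certificate bundle:
    pip install --upgrade certifi
  • Try disabling SSL verification temporarily for your Python script. This removes protection against tampered downloads, so use it only as a short-term workaround on a trusted network: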
    import ssl
    ssl._create_default_https_context = ssl._create_unverified_context
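
If verification has to be disabled at all, limiting the override to the calls that need it keeps the rest of the process verified. The following is a minimal sketch, assuming CPython's private `ssl._create_default_https_context` hook (the same one used in the snippet above); the helper name `unverified_https` is hypothetical.

```python
import contextlib
import ssl


@contextlib.contextmanager
def unverified_https():
    """Temporarily disable HTTPS certificate verification.

    Relies on a private CPython attribute, so treat it as a workaround
    rather than a supported API. The original context factory is
    restored even if the body raises.
    """
    original = ssl._create_default_https_context
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        ssl._create_default_https_context = original


if __name__ == "__main__":
    # Report where Python looks for CA certificates; useful for diagnosis.
    print(ssl.get_default_verify_paths())
    with unverified_https():
        # Place the failing download or dataset-loading call here.
        pass
```

Wrapping only the failing download in `with unverified_https():` means certificate checks are active again as soon as the block exits.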

Pycocotools Installation Error

If you encounter installation issues specifically related to pycocotools, especially when working with VS Code, follow these steps to resolve them:

  1. Prerequisites: Ensure that you have the necessary prerequisites installed on your system:

    • On Windows, install the Visual C++ Build Tools (part of the Build Tools for Visual Studio; Visual Studio Code alone does not include a C++ compiler). You can download them from Visual Studio Downloads.
    • For Linux systems, make sure you have gcc and make installed. You can install them using your package manager (e.g., sudo apt install build-essential for Ubuntu).
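  2. Alternative Build: If the default pycocotools build fails, consider installing Cython first and then installing pycocotools from the philferriere/cocoapi fork. Note that pip still builds this package from source, so the prerequisites from step 1 are required: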

    pip install cython
    pip install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
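
Before retrying the build, it can help to confirm that the build tools from step 1 are actually on the PATH of the environment running pip. This is a minimal sketch using only the standard library; the helper name `find_build_tools` and the default tool list are assumptions for illustration.

```python
import shutil


def find_build_tools(tools=("gcc", "make", "cl")):
    """Map each tool name to its resolved path, or None if not on PATH.

    "cl" is the MSVC compiler; it is usually only on PATH inside a
    Visual Studio developer command prompt.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

On Linux, `gcc` and `make` should resolve to a path; on Windows, a missing `cl` outside a developer prompt does not necessarily mean the Build Tools are absent.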
