Diffusion Models as Data Mining Tools
Ioannis Siglidis, Aleksander Hołyński, Alexei A. Efros, Mathieu Aubry, Shiry Ginosar
Official PyTorch implementation of Diffusion Models as Data Mining Tools, accepted at ECCV 2024.
Our approach allows you to take a large labelled input dataset and mine the patches that are important for each label. It involves three steps:
- First, you finetune Stable Diffusion v1.5 on your custom dataset with its standard loss $L_t(x, \epsilon, c)$, using prompts of the form $\text{"An image of Y"}$ (where Y is your label).
- For the sample of your input data you want to analyze, you then compute the typicality $\mathbf{T}(x \mid c) = \mathbb{E}_{\epsilon,t}[L_t(x, \epsilon, \varnothing) - L_t(x, \epsilon, c)]$ of every image (see the sketch below).
- You extract the top-1000 patches according to $\mathbf{T}(x \mid c)$ and cluster them using DIFT features (ranking clusters according to the median typicality of their elements).
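For intuition, here is a minimal sketch of how image-level typicality could be estimated with a finetuned `diffusers` pipeline. It is not the repository code; the checkpoint path and the number of noise/timestep samples are assumptions.

```python
# Sketch of image-level typicality T(x|c) = E_{eps,t}[L_t(x, eps, null) - L_t(x, eps, c)].
# Assumes a Stable Diffusion v1.5 checkpoint finetuned on your dataset (path is hypothetical).
import torch
from diffusers import StableDiffusionPipeline

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained("path/to/finetuned-sd15").to(device)  # hypothetical path
vae, unet, tokenizer, text_encoder = pipe.vae, pipe.unet, pipe.tokenizer, pipe.text_encoder
scheduler = pipe.scheduler

@torch.no_grad()
def encode_prompt(prompt):
    tokens = tokenizer(prompt, padding="max_length", max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt").input_ids.to(device)
    return text_encoder(tokens)[0]

@torch.no_grad()
def typicality(image, label, n_samples=32):
    """image: (1, 3, H, W) tensor in [-1, 1]; label: the string Y used in the prompt."""
    latents = vae.encode(image.to(device)).latent_dist.mean * vae.config.scaling_factor
    cond = encode_prompt(f"An image of {label}")
    null = encode_prompt("")
    total = 0.0
    for _ in range(n_samples):
        # Sample a random timestep and noise, as in the standard training loss.
        t = torch.randint(0, scheduler.config.num_train_timesteps, (1,), device=device)
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t)
        loss_null = torch.mean((unet(noisy, t, encoder_hidden_states=null).sample - noise) ** 2)
        loss_cond = torch.mean((unet(noisy, t, encoder_hidden_states=cond).sample - noise) ** 2)
        total += (loss_null - loss_cond).item()
    return total / n_samples
```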
Our codebase is mainly built on the diffusers implementation of LDMs.
conda env create -f environment.yaml
conda activate diff-mining
We apply our method to five different types of datasets: cars (CarDB), faces (FTT), street-view images (G^3), scenes (Places, high-res), and X-rays (ChestX-ray):
- CarDB: a properly extracted version can be found here and can be downloaded with:
python scripts/download-cardb.py
- FTT: you can request access to the dataset on the original project page.
- G^3: unfortunately proprietary, but information about the PanoramaIDs can be found in the original repo.
- Places: you can request access (trivial to get) from the original project page.
We share our models on Hugging Face, which you can access through the handles:
or download them locally using:
python scripts/download-models.py
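For illustration, a downloaded checkpoint can be loaded like any other `diffusers` pipeline; the local folder name below is an assumption, not necessarily the layout produced by the script.

```python
# Sketch: load a finetuned checkpoint with diffusers (the local path is hypothetical).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "models/places",              # hypothetical local folder created by scripts/download-models.py
    torch_dtype=torch.float16,
).to("cuda")

# Prompts follow the finetuning template "An image of Y"; "kitchen" is an example label.
image = pipe("An image of kitchen").images[0]
image.save("sample.png")
```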
A full walkthrough of the pipeline can be found in the scripts `scripts/training.sh` and `scripts/typicality.sh`.
- Code for finetuning models can be found under `diffmining/finetuning/`.
- Code for computing typicality can be found in `diffmining/typicality/compute.py`.
- Code for averaging typicality across patches, computing DIFT features, and clustering can be found in `diffmining/typicality/cluster.py` (a rough sketch of the patch-scoring step follows below).
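As a rough sketch of the patch-scoring step (not the repository code): given a per-pixel typicality map, patch scores can be obtained by average pooling and the highest-scoring patches kept. The patch size and stride below are assumptions.

```python
# Sketch: score patches by their average typicality and keep the top-1000.
# typicality_map: (H, W) per-pixel typicality of one image; patch size/stride are assumptions.
import torch
import torch.nn.functional as F

def top_patches(typicality_map, patch=64, stride=32, k=1000):
    # Average typicality inside every sliding window of size `patch`.
    scores = F.avg_pool2d(typicality_map[None, None], kernel_size=patch, stride=stride)[0, 0]
    flat = scores.flatten()
    k = min(k, flat.numel())
    values, idx = flat.topk(k)
    ys, xs = idx // scores.shape[1], idx % scores.shape[1]
    # Return (score, top-left y, top-left x) for each of the k best patches.
    return [(v.item(), int(y) * stride, int(x) * stride) for v, y, x in zip(values, ys, xs)]
```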
We test our typicality measure in two further applications, which we discuss in detail in our paper.
Using our diffusion model, we can translate each image, e.g. in the case of geography, from one country to another. We use PnP, the only method we found to be relatively robust at keeping consistency between translated objects (i.e., windows remain windows). You can launch this translation by running:
source scripts/parallel.sh translate
Afterwards you need to compute typicality for all elements:
source scripts/parallel.sh compute
and then cluster them using:
source scripts/parallel.sh cluster
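For reference, here is a minimal sketch of the clustering stage, simplified from what `diffmining/typicality/cluster.py` does: patch descriptors are clustered with k-means and clusters are ranked by the median typicality of their members. Generic feature vectors stand in for DIFT features here, and the number of clusters is an assumption.

```python
# Sketch: cluster patch features and rank clusters by median typicality.
# `features` stands in for DIFT descriptors of the top patches; n_clusters is an assumption.
import numpy as np
from sklearn.cluster import KMeans

def rank_clusters(features, typicality, n_clusters=32, seed=0):
    """features: (N, D) patch descriptors; typicality: (N,) patch typicality scores."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(features)
    medians = [np.median(typicality[labels == c]) for c in range(n_clusters)]
    order = np.argsort(medians)[::-1]  # most typical clusters first
    return [(int(c), float(medians[c]), np.where(labels == c)[0]) for c in order]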
As typicality is connected to a binary classifier between the conditional and the null conditioning, it can be used to "spatialize" information related to the condition on the input image. We test this on X-ray images and show how typicality improves after finetuning. To reproduce our results and evaluations, run:
source scripts/xray.sh
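To illustrate the "spatialization" idea, here is a hedged sketch (not the repository code): keeping the spatial dimensions of the diffusion loss instead of averaging them yields a per-location typicality map that can be upsampled to the image size. It reuses the `vae`, `unet`, `scheduler`, `encode_prompt`, and `device` set up in the typicality sketch above; the number of samples is an assumption.

```python
# Sketch: per-location typicality map obtained by keeping the spatial dims of the loss.
# Assumes `vae`, `unet`, `scheduler`, `encode_prompt`, and `device` from the typicality sketch.
import torch
import torch.nn.functional as F

@torch.no_grad()
def typicality_map(image, label, n_samples=16):
    latents = vae.encode(image.to(device)).latent_dist.mean * vae.config.scaling_factor
    cond, null = encode_prompt(f"An image of {label}"), encode_prompt("")
    acc = torch.zeros_like(latents[:, 0])  # (1, h, w) accumulator at latent resolution
    for _ in range(n_samples):
        t = torch.randint(0, scheduler.config.num_train_timesteps, (1,), device=device)
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t)
        err_null = (unet(noisy, t, encoder_hidden_states=null).sample - noise) ** 2
        err_cond = (unet(noisy, t, encoder_hidden_states=cond).sample - noise) ** 2
        acc += (err_null - err_cond).mean(dim=1)  # average over latent channels
    acc /= n_samples
    # Upsample the latent-resolution map to the input image resolution.
    return F.interpolate(acc[None], size=image.shape[-2:], mode="bilinear")[0, 0]
```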
We provide a minimal, optimized implementation of the algorithm of "What Makes Paris Look like Paris?" under `doersch/`.
Running the code should only require:
python doersch.py --which geo --category 'Italy'
yet you will probably have to adjust it to your dataset of choice.
@inproceedings{diff-mining,
  title     = {Diffusion Models as Data Mining Tools},
  author    = {Siglidis, Ioannis and Holynski, Aleksander and Efros, Alexei A. and Aubry, Mathieu and Ginosar, Shiry},
  booktitle = {ECCV},
  year      = {2024},
}
This work was partly supported by the European Research Council (ERC project DISCOVER, number 101076028) and leveraged the HPC resources of IDRIS under the allocations AD011012905R1 and AD0110129052 made by GENCI. We would like to thank Grace Luo for data, code, and discussion; Loic Landreu and David Picard for insights on geographical representations and diffusion; Karl Doersch for project advice and implementation insights; and Sophia Koepke for feedback on our manuscript.