Sehyung Kim*, Chanhyeong Yang*, Jihwan Park, Taehoon Song, Hyunwoo J. Kim†.
AAAI 2025
This is the official implementation of the AAAI 2025 paper "Super-class guided Transformer for Zero-Shot Attribute Classification".
git clone https://github.com/mlvlab/SugaFormer.git
cd SugaFormer
conda create -n sugaformer python==3.9
conda activate sugaformer
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
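As an optional sanity check (not part of the official setup), you can verify that the CUDA-enabled PyTorch build installed correctly:

```bash
# Confirm the installed PyTorch version and that a CUDA device is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Expected output is something like: 1.13.1+cu117 True
```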
To run experiments for VAW, you need both the images from the Visual Genome dataset and the annotation files. Follow the steps below:
- Download the Visual Genome images from the link.
- Download the annotation files for VAW experiments from the link.
After downloading the Visual Genome images and annotation files, organize them into the following directory structure:
data/
└── vaw/
    ├── images/
    │   ├── VG_100K/
    │   └── VG_100K_2/
    │
    └── annotations/
        ├── train.json
        ├── test.json
        └── ...
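The commands below are a minimal sketch of how the downloaded archives might be unpacked into that layout. The archive names (`images.zip`, `images2.zip`, `vaw_annotations.zip`) are placeholders and may differ from the files you actually download:

```bash
# Create the expected directory layout (paths taken from the tree above).
mkdir -p data/vaw/images data/vaw/annotations

# Hypothetical archive names -- replace with the files you downloaded.
unzip images.zip  -d data/vaw/images/              # should produce VG_100K/
unzip images2.zip -d data/vaw/images/              # should produce VG_100K_2/
unzip vaw_annotations.zip -d data/vaw/annotations/ # train.json, test.json, ...
```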
Train the model in the fully-supervised setting:
./configs/vaw/train_fs.sh
Train the model in the zero-shot setting:
./configs/vaw/train_zs.sh
Evaluate the model in the fully-supervised setting:
./configs/vaw/eval_fs.sh
Evaluate the model in the zero-shot setting:
./configs/vaw/eval_zs.sh
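For example, a zero-shot training run could be launched as below. Restricting visible GPUs with `CUDA_VISIBLE_DEVICES` is the standard CUDA/PyTorch mechanism and is shown only as a suggestion for selecting a device, not something the scripts require; the `chmod` line is only needed if the scripts are not already executable:

```bash
# Make the script executable if needed, then launch zero-shot training on GPU 0.
chmod +x ./configs/vaw/train_zs.sh
CUDA_VISIBLE_DEVICES=0 ./configs/vaw/train_zs.sh
```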
This repository is built upon the following works:
- DETR (Facebook Research): the codebase we build upon and the foundation of our base model.
- LAVIS (Salesforce): pre-trained vision-language models (BLIP-2) that we use for feature extraction and knowledge transfer.
If you have any questions, please create an issue in this repository or contact us at [email protected].
If you find our work interesting, please consider giving this repository a ⭐ and citing our paper.
@article{kim2025super,
  title={Super-class guided Transformer for Zero-Shot Attribute Classification},
  author={Kim, Sehyung and Yang, Chanhyeong and Park, Jihwan and Song, Taehoon and Kim, Hyunwoo J},
  journal={arXiv preprint arXiv:2501.05728},
  year={2025}
}