We present Language-binding Object Graph Network (LOGNet), the first neural reasoning method with dynamic relational structures across both the visual and textual domains, applied to visual question answering. LOGNet offers an effective way to automatically identify word-object affiliations and form object relations in the context of a given question.
Illustration of a LOG unit:
Check out our paper for details.
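To make the idea concrete, below is a minimal, illustrative PyTorch sketch of the word-object binding step in the spirit of a LOG unit. It is not the reference implementation; the module name, projection layers, and feature dimensions are assumptions made for illustration only.

```python
# Illustrative sketch of word-object binding (NOT the reference LOG unit implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordObjectBinding(nn.Module):
    """Compute soft affiliations between question words and detected visual objects."""
    def __init__(self, word_dim=300, obj_dim=2048, hidden_dim=512):
        super().__init__()
        self.word_proj = nn.Linear(word_dim, hidden_dim)
        self.obj_proj = nn.Linear(obj_dim, hidden_dim)

    def forward(self, words, objects):
        # words:   (batch, num_words, word_dim)
        # objects: (batch, num_objects, obj_dim)
        w = self.word_proj(words)                       # (batch, num_words, hidden)
        o = self.obj_proj(objects)                      # (batch, num_objects, hidden)
        scores = torch.bmm(w, o.transpose(1, 2))        # word-object affinity scores
        affil = F.softmax(scores, dim=-1)               # soft word-object affiliations
        bound = torch.bmm(affil, o)                     # language-bound object summary per word
        return affil, bound

# Usage with dummy inputs: 12 question words, 36 object regions.
binder = WordObjectBinding()
affil, bound = binder(torch.randn(2, 12, 300), torch.randn(2, 36, 2048))
```

In the full model, such affiliations are further used to form object relations conditioned on the question, as described in the paper.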
- Clone the repository:

  ```bash
  git clone https://github.com/thaolmk54/LOGNet-VQA.git
  ```
- Download the CLEVR, CLEVR-Human, and GQA datasets, then edit the corresponding paths in the repo to point to where the data is stored.
- Install dependencies (a quick sanity check is shown after this list):

  ```bash
  conda create -n lognet_vqa python=3.6
  conda activate lognet_vqa
  pip install -r requirements.txt
  ```
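After installing, a quick sanity check (assuming PyTorch is listed in `requirements.txt`, which the training scripts require) confirms the environment is usable and whether a GPU is visible:

```python
# Quick environment check: prints the installed PyTorch version and GPU availability.
import torch
print(torch.__version__, torch.cuda.is_available())
```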
We adapt the well-known Faster R-CNN repository implemented in PyTorch, train it on Visual Genome, and use the resulting pretrained Faster R-CNN model for visual feature extraction. The adapted code and the detection model pretrained on Visual Genome are available upon request.
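If you only need a rough stand-in while waiting for the Visual Genome model, the sketch below shows one way to pool region features with torchvision's COCO-pretrained Faster R-CNN. This is an assumption-based illustration, not the pipeline used for the reported results (it uses COCO rather than Visual Genome weights and skips the detector's internal image normalization).

```python
# Illustrative sketch only: region-feature extraction with a COCO-pretrained detector
# as a stand-in for the Visual Genome model used in the paper.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.ops import roi_align

model = fasterrcnn_resnet50_fpn(pretrained=True).eval()  # newer torchvision: weights="DEFAULT"

image = torch.rand(3, 480, 640)                    # dummy RGB image in [0, 1]
with torch.no_grad():
    detections = model([image])[0]                 # dict with "boxes", "labels", "scores"
    boxes = detections["boxes"][:36]               # keep the top detections, e.g. 36 regions

    # Pool fixed-size features for each box from the highest-resolution FPN level.
    # Note: this feeds the raw image to the backbone, skipping the detector's internal
    # normalization, which is acceptable for a sketch only.
    feats = model.backbone(image.unsqueeze(0))["0"]          # stride-4 feature map
    regions = roi_align(feats, [boxes], output_size=7,
                        spatial_scale=feats.shape[-1] / image.shape[-1])
    regions = regions.flatten(1)                   # (num_regions, channels * 7 * 7)
```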
Download the pretrained GloVe 300d word vectors to `preprocess_glove/`, then unzip and process them into a pickle file:

```bash
python txt2pickle.py
```

Finally, save the output file `glove.840.300d.pkl` somewhere for later use.
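For reference, the conversion amounts to building a word-to-vector dictionary and pickling it. The sketch below is a rough equivalent under the assumption that the input is the standard `glove.840B.300d.txt` release file; the actual `txt2pickle.py` in `preprocess_glove/` is authoritative.

```python
# Rough sketch of the GloVe text-to-pickle conversion; assumes the standard 840B release file.
import pickle
import numpy as np

glove = {}
with open("glove.840B.300d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        # Some 840B tokens contain spaces, so take the last 300 fields as the vector.
        word = " ".join(parts[:-300])
        glove[word] = np.asarray(parts[-300:], dtype=np.float32)

with open("glove.840.300d.pkl", "wb") as f:
    pickle.dump(glove, f)
```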
Experiments on CLEVR are done within `exp_clevr/`. Please edit the absolute paths in the configuration file at `exp_clevr/configs/clevr.yml` before running the commands.

You can download our pre-extracted features for the CLEVR dataset here and save them in `exp_clevr/data/`.
Extract PyTorch `.pt` files by running the following commands:

```bash
python preprocess/preprocess_questions.py --mode train --cfg configs/clevr.yml
python preprocess/preprocess_questions.py --mode val --cfg configs/clevr.yml
python preprocess/preprocess_questions.py --mode test --cfg configs/clevr.yml
```
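Conceptually, this preprocessing step turns raw questions into padded index tensors saved as `.pt` files. The sketch below is a simplified, assumption-based illustration of that idea; the exact tokenization, vocabulary handling, and output format are defined in `preprocess/preprocess_questions.py`.

```python
# Simplified sketch of question encoding; the real logic lives in preprocess_questions.py.
import torch

def encode_questions(questions, vocab, max_len=30, unk="<UNK>"):
    """Map each question to a fixed-length list of vocabulary indices."""
    encoded = []
    for q in questions:
        tokens = q.lower().rstrip("?").split()
        idxs = [vocab.get(t, vocab[unk]) for t in tokens][:max_len]
        idxs += [0] * (max_len - len(idxs))          # pad with the <PAD> index
        encoded.append(idxs)
    return torch.tensor(encoded, dtype=torch.long)

vocab = {"<PAD>": 0, "<UNK>": 1, "what": 2, "color": 3, "is": 4, "the": 5, "cube": 6}
data = encode_questions(["What color is the cube?"], vocab)
torch.save({"questions": data}, "train_questions.pt")   # file name is illustrative
```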
Choose a suitable config file in `configs/` depending on whether you wish to train the network with the full dataset, 20% of the training data, or 10% of the training data. For example, to train with all training samples, run the following command:

```bash
python train.py --cfg configs/clevr.yml
```
To evaluate the trained model, run the following:

```bash
python validate.py --cfg configs/clevr.yml --mode val
```
Note: A pretrained model trained on 10% of the training data is available here. Save the file in `results/expClevr10%LOGNet/ckpt/` for evaluation.
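If you want to inspect the downloaded checkpoint before running `validate.py`, a minimal sketch is below, assuming a standard PyTorch checkpoint layout; the file name and dictionary keys are placeholders, not confirmed by the repo.

```python
# Minimal, hedged sketch for inspecting a downloaded checkpoint; names are placeholders.
import torch

ckpt = torch.load("results/expClevr10%LOGNet/ckpt/model.ckpt", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))          # e.g. model weights, optimizer state, epoch counter
# A typical loading pattern would then be:
#   model.load_state_dict(ckpt["state_dict"])   # key name is an assumption
```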
Experiments on CLEVR-Human are done within `exp_clevr_human/`. Please edit the absolute paths in the configuration file at `exp_clevr_human/configs/clevr_human.yml` before running the commands.
Experiments on CLEVR-Human use the same visual features as the CLEVR dataset.
Extract PyTorch `.pt` files by running the following commands:

```bash
python preprocess/preprocess_questions.py --mode train --cfg configs/clevr_human.yml
python preprocess/preprocess_questions.py --mode val --cfg configs/clevr_human.yml
python preprocess/preprocess_questions.py --mode test --cfg configs/clevr_human.yml
```
Then train the model:

```bash
python train.py --cfg configs/clevr_human.yml
```
To evaluate the trained model, run the following:

```bash
python validate.py --cfg configs/clevr_human.yml --mode val
```
Experiments on the GQA dataset are done within `exp_gqa/`. Please edit the absolute paths in the configuration file at `exp_gqa/configs/gqa.yml` before running the commands.
Download the object features and spatial features for the GQA dataset here and save them in `exp_gqa/data/`. We adapt the following script to merge the h5 chunk files together:

```bash
python preprocess/merge.py --name objects
python preprocess/merge.py --name spatial
```

This should return two output files for each feature type: `gqa_objects.h5`/`gqa_spatial.h5` and `gqa_objects_merged_info.json`/`gqa_spatial_merged_info.json`.
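For intuition, merging simply concatenates the per-chunk HDF5 datasets into one file, plus an info JSON recording where each image's features live. The sketch below is an assumption-based illustration using `h5py`; the chunk file names are guesses, the info JSON is omitted, and `preprocess/merge.py` remains the script to actually use.

```python
# Hedged sketch of concatenating GQA h5 feature chunks; chunk names are assumptions.
import glob
import h5py

chunk_files = sorted(glob.glob("data/gqa_objects_*.h5"))   # assumed naming pattern
with h5py.File("data/gqa_objects.h5", "w") as out:
    merged = {}
    for path in chunk_files:
        with h5py.File(path, "r") as chunk:
            for name, dset in chunk.items():
                if not isinstance(dset, h5py.Dataset):
                    continue
                arr = dset[...]                            # a real script would stream chunks
                if name not in merged:
                    merged[name] = out.create_dataset(
                        name, data=arr, maxshape=(None,) + arr.shape[1:])
                else:
                    d = merged[name]
                    d.resize(d.shape[0] + arr.shape[0], axis=0)
                    d[-arr.shape[0]:] = arr
# preprocess/merge.py also writes gqa_objects_merged_info.json, omitted in this sketch.
```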
Extract PyTorch `.pt` files by running the following commands:

```bash
python preprocess/preprocess_questions.py --mode train --cfg configs/gqa.yml
python preprocess/preprocess_questions.py --mode val --cfg configs/gqa.yml
python preprocess/preprocess_questions.py --mode test --cfg configs/gqa.yml
```
Choose a suitable config file in `configs/` depending on whether you wish to train the network with the full dataset or 20% of the training data. For example, to train with all training samples, run the following command:

```bash
python train.py --cfg configs/gqa.yml
```
To evaluate the trained model, run the following:

```bash
python validate.py --cfg configs/gqa.yml --mode val
```
If you make use of this repository for your research, please star this repo and cite the following paper:
```bibtex
@inproceedings{ijcai2020-114,
  title     = {Dynamic Language Binding in Relational Visual Reasoning},
  author    = {Minh Le, Thao and Le, Vuong and Venkatesh, Svetha and Tran, Truyen},
  booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
               Artificial Intelligence, {IJCAI-20}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Christian Bessiere},
  pages     = {818--824},
  year      = {2020},
  month     = {7},
  note      = {Main track},
  doi       = {10.24963/ijcai.2020/114},
  url       = {https://doi.org/10.24963/ijcai.2020/114},
}
```