GitHub - UMass-Foundation-Model/VisualCoT: Codebase for AAAI 2024 conference paper Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning

Code for the paper "Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning".

Overall framework

Preprocess datasets

Download the COCO 2014 and 2017 datasets.
Download the OK-VQA and AOK-VQA datasets, following the PICa format.
Run the preprocess script (preprocess/preprocess_aokvqa.sh for AOK-VQA and preprocess/preprocess_okvqa.sh for OK-VQA).
Build the training object similarity file (object_similarity/object_similarity_aokvqa.sh for AOK-VQA and object_similarity/object_similarity_okvqa.sh for OK-VQA).
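The two script-based steps above can be chained from Python. This is a minimal sketch, not part of the repository: the script paths are the ones listed in this README, while the helper names `build_commands` and `run_pipeline`, the use of `bash` with no arguments, and running from the repository root are assumptions. Any dataset paths inside the scripts are assumed to be edited beforehand.

```python
import subprocess

# Script paths as listed in this README, relative to the repository root.
PIPELINE = {
    "aokvqa": [
        "preprocess/preprocess_aokvqa.sh",
        "object_similarity/object_similarity_aokvqa.sh",
    ],
    "okvqa": [
        "preprocess/preprocess_okvqa.sh",
        "object_similarity/object_similarity_okvqa.sh",
    ],
}


def build_commands(dataset):
    """Return the shell commands for one dataset, in the order given above."""
    return [["bash", script] for script in PIPELINE[dataset]]


def run_pipeline(dataset, repo_root=".", dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    commands = build_commands(dataset)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Assumption: the scripts take no arguments and run from the repo root.
            subprocess.run(cmd, cwd=repo_root, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline("aokvqa")  # dry run: only prints the two commands
```

Keeping `dry_run=True` as the default lets you confirm the order of the scripts before anything is executed.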

Prepare Scene graph and captions

Before running experiments, VisualCoT also needs scene graphs and captions: three files for each input image, one under each of input_text/scene_graph_text/scene_graph_coco17, input_text/scene_graph_text/scene_graph_coco17_attr, and input_text/scene_graph_text/scene_graph_coco17_caption. We provide an example for image No. 57 under each directory. Please follow the format of these examples and generate scene graphs for all other images.
If you do not want to run a scene graph model yourself, we provide the scene graphs and captions we generated (they need additional processing to match the format of the three examples above):
- Dense captions
- Attributes for COCO17 test dataset
- Relations and objects for COCO17 test dataset
- Attributes for COCO17
- Relations and objects for COCO17
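Since every image needs a file in each of the three directories, a quick coverage check can catch gaps before an experiment starts. The sketch below is hypothetical and not part of the repository. It assumes that the three files for one image share the same file name stem (the part before the extension); check this against the provided image No. 57 examples before relying on it.

```python
from pathlib import Path

# Directory names as listed in this README, relative to the repository root.
SCENE_GRAPH_DIRS = [
    "input_text/scene_graph_text/scene_graph_coco17",
    "input_text/scene_graph_text/scene_graph_coco17_attr",
    "input_text/scene_graph_text/scene_graph_coco17_caption",
]


def incomplete_images(repo_root="."):
    """Map each file stem to the directories in which it is missing.

    Assumption: the files belonging to one image use the same stem in all
    three directories. Stems present everywhere are not reported.
    """
    root = Path(repo_root)
    stems_per_dir = []
    for name in SCENE_GRAPH_DIRS:
        folder = root / name
        if folder.is_dir():
            stems = {p.stem for p in folder.iterdir() if p.is_file()}
        else:
            stems = set()  # a missing directory counts as empty
        stems_per_dir.append(stems)

    missing = {}
    for stem in sorted(set().union(*stems_per_dir)):
        absent = [d for d, stems in zip(SCENE_GRAPH_DIRS, stems_per_dir)
                  if stem not in stems]
        if absent:
            missing[stem] = absent
    return missing


if __name__ == "__main__":
    for stem, dirs in incomplete_images().items():
        print(f"{stem}: missing in {', '.join(dirs)}")
```

An empty result means every stem found in any of the directories is present in all three.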

Run experiments

run_aokvqa.sh for AOK-VQA
run_okvqa.sh for OK-VQA

Main Results

Backbone	OK-VQA test (DA)	AOK-VQA val (DA)	AOK-VQA test (DA)
OPT-66B	44.6	46.4	46.0
Llama-2-70B	54.9	50.5	54.4
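To compare the two backbones, the difference per benchmark can be computed directly from the table. This is a small illustrative sketch: the numbers are copied from the table above, and the names `RESULTS` and `gains` are ours.

```python
# Scores copied from the Main Results table above.
RESULTS = {
    "OPT-66B": {
        "OK-VQA test (DA)": 44.6,
        "AOK-VQA val (DA)": 46.4,
        "AOK-VQA test (DA)": 46.0,
    },
    "Llama-2-70B": {
        "OK-VQA test (DA)": 54.9,
        "AOK-VQA val (DA)": 50.5,
        "AOK-VQA test (DA)": 54.4,
    },
}


def gains(baseline, other):
    """Per-benchmark difference (other minus baseline), rounded to one decimal."""
    return {
        split: round(RESULTS[other][split] - RESULTS[baseline][split], 1)
        for split in RESULTS[baseline]
    }


if __name__ == "__main__":
    for split, delta in gains("OPT-66B", "Llama-2-70B").items():
        print(f"{split}: +{delta}")
```

With these numbers, Llama-2-70B scores 10.3, 4.1, and 8.4 points higher than OPT-66B on the three columns, respectively.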

Cite

arXiv version

@article{chen2023see,
  title={Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning},
  author={Chen, Zhenfang and Zhou, Qinhong and Shen, Yikang and Hong, Yining and Sun, Zhiqing and Gutfreund, Dan and Gan, Chuang},
  journal={arXiv preprint arXiv:2301.05226},
  year={2023}
}

