
Codebase for AAAI 2024 conference paper Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning


UMass-Foundation-Model/VisualCoT


Code for paper Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning

Overall framework

[Figure: overall framework of VisualCoT]

Preprocess datasets

  • Download the COCO 2014 and 2017 datasets
  • Download the OK-VQA and AOK-VQA datasets, following the PICa format
  • Run the preprocessing script (preprocess/preprocess_aokvqa.sh for AOK-VQA and preprocess/preprocess_okvqa.sh for OK-VQA)
  • Build the training object similarity file (object_similarity/object_similarity_aokvqa.sh for AOK-VQA and object_similarity/object_similarity_okvqa.sh for OK-VQA)
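
The two script steps above can be chained with a small driver. This is a minimal sketch, not part of the repository: it assumes the scripts are run with bash from the repository root and take no arguments, which this README does not state, so inspect each script (and any paths set inside it) first. The names `STEPS`, `build_commands`, and `run_steps` are hypothetical.

```python
import subprocess
from pathlib import Path

# Script paths as listed in this README. Whether they expect arguments or
# edited paths is an assumption to verify against the scripts themselves.
STEPS = {
    "aokvqa": [
        "preprocess/preprocess_aokvqa.sh",
        "object_similarity/object_similarity_aokvqa.sh",
    ],
    "okvqa": [
        "preprocess/preprocess_okvqa.sh",
        "object_similarity/object_similarity_okvqa.sh",
    ],
}


def build_commands(dataset):
    """Return the commands for one dataset, in the order the README lists them."""
    if dataset not in STEPS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [["bash", script] for script in STEPS[dataset]]


def run_steps(dataset, repo_root=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(dataset):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(cmd, cwd=Path(repo_root), check=True)


if __name__ == "__main__":
    run_steps("aokvqa", dry_run=True)
```

With `dry_run=True` the driver only prints the commands, which is a cheap way to confirm the order before launching the preprocessing.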

Prepare scene graphs and captions

  • Before running experiments, VisualCoT also needs scene graphs and captions: three files for each input image, one under each of input_text/scene_graph_text/scene_graph_coco17, input_text/scene_graph_text/scene_graph_coco17_attr, and input_text/scene_graph_text/scene_graph_coco17_caption. We provide an example for image No. 57 under each directory. Please follow the format of the examples to produce scene graphs for all other images.
  • If you do not want to run a scene graph model yourself, we provide the scene graphs and captions we generated. They need additional processing to match the format of the three examples above.
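
Since every image needs a file in each of the three directories, it can help to check coverage before launching experiments. The sketch below is not part of the repository. It assumes the three files for one image share the same file name stem across the directories, which should be verified against the provided example for image No. 57. `file_stems` and `missing_by_dir` are hypothetical helper names.

```python
from pathlib import Path

# Directory names as given in this README.
SCENE_GRAPH_DIRS = [
    "input_text/scene_graph_text/scene_graph_coco17",
    "input_text/scene_graph_text/scene_graph_coco17_attr",
    "input_text/scene_graph_text/scene_graph_coco17_caption",
]


def file_stems(directory):
    """Return the set of file name stems in a directory (empty if it is absent)."""
    d = Path(directory)
    if not d.is_dir():
        return set()
    return {p.stem for p in d.iterdir() if p.is_file()}


def missing_by_dir(root=".", dirs=SCENE_GRAPH_DIRS):
    """Map each directory to the stems found elsewhere but missing from it.

    Assumption: files belonging to the same image share a stem across dirs.
    """
    stems = {d: file_stems(Path(root) / d) for d in dirs}
    all_stems = set().union(*stems.values())
    return {d: sorted(all_stems - s) for d, s in stems.items()}


if __name__ == "__main__":
    for directory, missing in missing_by_dir().items():
        print(f"{directory}: {len(missing)} missing")
```

An empty list for every directory means each image seen in any directory has a counterpart in the other two. It does not check that the file contents follow the example format.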

Run experiments

  • run_aokvqa.sh for AOK-VQA
  • run_okvqa.sh for OK-VQA

Main Results

Backbone       OK-VQA test (DA)   AOK-VQA val (DA)   AOK-VQA test (DA)
OPT-66B        44.6               46.4               46.0
Llama-2-70B    54.9               50.5               54.4

Cite

arXiv version

@article{chen2023see,
  title={Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning},
  author={Chen, Zhenfang and Zhou, Qinhong and Shen, Yikang and Hong, Yining and Sun, Zhiqing and Gutfreund, Dan and Gan, Chuang},
  journal={arXiv preprint arXiv:2301.05226},
  year={2023}
}
