Yiming Zuo* · Karhan Kayan* · Maggie Wang · Kevin Jeon · Jia Deng · Thomas L. Griffiths
(*equal contribution)
If you use our benchmark or data in your work, please cite our 3DV paper:
```bibtex
@article{zuo2025towards,
  title={Towards Foundation Models for 3D Vision: How Close Are We?},
  author={Zuo, Yiming and Kayan, Karhan and Wang, Maggie and Jeon, Kevin and Deng, Jia and Griffiths, Thomas L},
  journal={International Conference on 3D Vision (3DV)},
  year={2025}
}
```
Download the pre-generated image and question pairs from this Google Drive link. We also provide instructions on how to generate the benchmark from the original datasets here.
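If you prefer to script the download, below is a minimal sketch using the `gdown` package. The folder URL and output directory are placeholders, not part of our release; substitute the actual Google Drive link above.

```python
# Minimal download sketch (assumes `pip install gdown`).
# The URL below is a placeholder; replace it with the Google Drive folder linked above.
import gdown

gdown.download_folder(
    url="https://drive.google.com/drive/folders/<FOLDER_ID>",  # placeholder
    output="benchmark_data",  # local directory for the image/question pairs
    quiet=False,
)
```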
To evaluate LLMs on the relative depth task, go to the `LLM_evaluations/relative_depth` folder and run `python generate_gpt4v_response.py` (replace the placeholder with your own OpenAI API key). This saves the GPT responses in JSON format. We also provide the raw responses we collected from GPT-4V, GPT-4o, and Gemini in this Google Drive link. Then run `python evaluate_gpt4v_response.py`, which computes the accuracy.
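As a rough illustration of what the evaluation step does, the sketch below scores saved responses against ground-truth labels. The JSON filename and the `answer` / `gt_answer` fields are assumptions made for this example; the actual schema is defined by `generate_gpt4v_response.py` and `evaluate_gpt4v_response.py`.

```python
# Illustrative accuracy computation over saved model responses.
# NOTE: the filename and field names are assumptions for this sketch;
# consult evaluate_gpt4v_response.py for the real schema.
import json

with open("gpt4v_responses.json") as f:  # hypothetical filename
    responses = json.load(f)

correct = 0
for item in responses:
    # Assume each item stores the model's parsed answer and the ground truth.
    if item["answer"].strip().lower() == item["gt_answer"].strip().lower():
        correct += 1

accuracy = correct / len(responses)
print(f"Accuracy: {accuracy:.3f}")
```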
To score the human study results for the relative depth task, go to the `Human_Study/relative_depth` folder and run `python evaluate_mturk.py`.
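Human responses from MTurk are typically aggregated per question before accuracy is computed. The sketch below shows one common way to do this (majority vote); the CSV filename and column names are assumptions for illustration, not the script's actual interface.

```python
# Illustrative majority-vote aggregation of MTurk annotations.
# NOTE: the CSV filename and column names are hypothetical; see
# evaluate_mturk.py for how the released results are actually processed.
import csv
from collections import Counter, defaultdict

votes = defaultdict(list)
gt = {}
with open("mturk_results.csv") as f:  # hypothetical filename
    for row in csv.DictReader(f):
        votes[row["question_id"]].append(row["worker_answer"])
        gt[row["question_id"]] = row["gt_answer"]

correct = sum(
    Counter(answers).most_common(1)[0][0] == gt[qid]
    for qid, answers in votes.items()
)
print(f"Human accuracy (majority vote): {correct / len(votes):.3f}")
```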
To evaluate LLMs on the CLEVR VQA task, go to the `LLM_evaluations/clevr_vqa` folder and run `python generate_gpt_response.py` (replace the placeholder with your own OpenAI API key). This saves the GPT responses in JSON format. We also provide the raw responses we collected from GPT-4V, GPT-4o, Gemini, and the specialized model MDETR in this Google Drive link. Then run `python compute_acc.py`, which computes the accuracy.
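Comparing a free-form model answer against a CLEVR ground-truth answer usually requires some light normalization before exact matching. The helper below is a hypothetical illustration of that idea, not the logic used in `compute_acc.py`.

```python
# Hypothetical normalization of a free-form answer before exact matching.
# compute_acc.py defines the actual matching rules used in the benchmark.
import re

WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}

def normalize(answer: str) -> str:
    """Lowercase, strip punctuation, and map number words to digits."""
    answer = re.sub(r"[^a-z0-9 ]", "", answer.strip().lower())
    return WORD_TO_DIGIT.get(answer, answer)

assert normalize("Three.") == "3"
assert normalize(" yes ") == "yes"
```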
To score the human study results for the CLEVR VQA task, go to the `Human_Study/clevr_vqa` folder and run `python evaluate_mturk.py`.
To evaluate LLMs on the relative camera pose task, go to the `LLM_evaluations/relative_camera_pose` folder and run `python generate_gpt4v_response.py` (replace the placeholder with your own OpenAI API key). This saves the GPT responses in JSON format. We also provide the raw responses we collected from GPT-4V, GPT-4o, and Gemini in this Google Drive link. Then run `python evaluate_gpt4v_response.py`, which computes the accuracy.
To score the human study results for the relative camera pose task, go to the `Human_Study/relative_camera_pose` folder and run `python evaluate_mturk.py`.