Towards Foundation Models for 3D Vision: How Close Are We?

Yiming Zuo* · Karhan Kayan* · Maggie Wang · Kevin Jeon · Jia Deng · Thomas L. Griffiths

(*equal contribution)

Princeton University

Citation

If you use our benchmark or data in your work, please cite our 3DV paper:

@article{zuo2025towards,
  title={Towards Foundation Models for 3D Vision: How Close Are We?},
  author={Zuo, Yiming and Kayan, Karhan and Wang, Maggie and Jeon, Kevin and Deng, Jia and Griffiths, Thomas L},
  journal={International Conference on 3D Vision (3DV)},
  year={2025}
}

Download the Benchmark

Download the pre-generated image and question pairs from this Google Drive link. We also provide instructions on how to generate the benchmark from the original datasets here.
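
If you prefer the command line, something like the following should fetch the files (a minimal sketch using the gdown package; the folder ID is a placeholder for the ID in the Google Drive link above, and you may need gdown.download instead if the link points to a single archive):

```python
# Sketch: fetch the benchmark from Google Drive with gdown (pip install gdown).
# BENCHMARK_FOLDER_ID is a placeholder -- copy the ID from the Google Drive
# link above; it is not part of this repository.
import gdown

BENCHMARK_FOLDER_ID = "<google-drive-folder-id>"  # placeholder
gdown.download_folder(id=BENCHMARK_FOLDER_ID, output="uniqa3d_benchmark", quiet=False)
```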

Relative Depth Estimation

Evaluate VLM accuracy

Go to the LLM_evaluations/relative_depth folder and run python generate_gpt4v_response.py (fill in your own OpenAI API key). This saves the GPT responses in JSON format. We also provide the raw responses we collected from GPT-4V, GPT-4o, and Gemini in this Google Drive link.
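
For reference, the script essentially sends each image/question pair to the OpenAI API and stores the reply; a minimal sketch of that kind of query is below (the prompt, file names, and model name are illustrative, not the exact ones used by generate_gpt4v_response.py):

```python
# Sketch: query GPT-4V on one benchmark image/question pair and save the reply
# as JSON. Paths, prompt, and model name are illustrative; see
# generate_gpt4v_response.py for the prompts actually used in the paper.
import base64
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # replace with your own key

def encode_image(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_b64 = encode_image("example_pair.png")  # hypothetical benchmark image
question = "Which marked point is closer to the camera, A or B?"  # illustrative prompt

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=100,
)

with open("gpt4v_response.json", "w") as f:
    json.dump({"question": question,
               "answer": response.choices[0].message.content}, f, indent=2)
```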

Then run python evaluate_gpt4v_response.py, which computes the accuracy.
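
The accuracy computation boils down to comparing each saved answer against the ground truth; a rough sketch, assuming a JSON list of records with hypothetical "answer" and "gt" fields (the actual schema is defined by the scripts above):

```python
# Sketch: compute accuracy from saved responses. Assumes a JSON list of
# {"gt": ..., "answer": ...} records; the real output format of
# generate_gpt4v_response.py may differ.
import json

with open("gpt4v_response.json") as f:
    records = json.load(f)

correct = sum(1 for r in records
              if r["answer"].strip().upper().startswith(r["gt"].upper()))
print(f"Accuracy: {correct / len(records):.3f}")
```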

Human response raw data

Go to the Human_Study/relative_depth folder and run python evaluate_mturk.py.
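
This script aggregates the raw Amazon Mechanical Turk responses into a human accuracy number; a sketch of that kind of aggregation, with a hypothetical CSV layout and column names (the real format is defined in evaluate_mturk.py):

```python
# Sketch: per-question majority vote over MTurk workers, then accuracy.
# The CSV path and column names ("question_id", "worker_answer", "gt") are
# hypothetical; evaluate_mturk.py defines the actual format.
import pandas as pd

df = pd.read_csv("mturk_responses.csv")
majority = df.groupby("question_id")["worker_answer"].agg(lambda s: s.mode().iloc[0])
gt = df.groupby("question_id")["gt"].first()
print(f"Human accuracy (majority vote): {(majority == gt).mean():.3f}")
```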

Spatial VQA

Evaluate VLM accuracy

Go to the LLM_evaluations/clevr_vqa folder and run python generate_gpt_response.py (fill in your own OpenAI API key). This saves the GPT responses in JSON format. We also provide the raw responses we collected from GPT-4V, GPT-4o, Gemini, and the specialized model MDETR in this Google Drive link.

Then run python compute_acc.py, which computes the accuracy.

Human response raw data

Go to the Human_Study/clevr_vqa folder and run python evaluate_mturk.py.

Relative Camera Pose Estimation

Evaluate VLM accuracy

Go to the LLM_evaluations/relative_camera_pose folder and run python generate_gpt4v_response.py (fill in your own OpenAI API key). This saves the GPT responses in JSON format. We also provide the raw responses we collected from GPT-4V, GPT-4o, and Gemini in this Google Drive link.
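
Unlike the other tasks, each camera-pose query involves two views of the scene, so both frames go into a single request; a sketch of such a two-image query (image paths, prompt, and model name are illustrative, not the script's actual ones):

```python
# Sketch: a two-image GPT-4V query for relative camera pose. Image paths and
# the prompt are illustrative; generate_gpt4v_response.py contains the prompts
# actually used for the benchmark.
import base64
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # replace with your own key

def to_data_url(path):
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode("utf-8")

prompt = "Relative to the first image, did the camera move left or right in the second image?"
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": to_data_url("view_1.png")}},
            {"type": "image_url", "image_url": {"url": to_data_url("view_2.png")}},
        ],
    }],
    max_tokens=100,
)
print(response.choices[0].message.content)
```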

Then run python evaluate_gpt4v_response.py, which computes the accuracy.

Human response raw data

Go to the Human_Study/relative_camera_pose folder and run python evaluate_mturk.py.
