- Create the conda environment
conda env create --name 3dsyn --file=environments.yml
- Install pointnet2
cd vision/pointnet2
python3 setup.py install
- Follow Vil3dref to download the ScanNet data and place it under `data/scanfamily/scan_data`. The folder should look like:
./data/scanfamily/scan_data/
├── instance_id_to_gmm_color
├── instance_id_to_loc
├── instance_id_to_name
└── pcd_with_global_alignment
- Download the ScanRefer + ReferIt3D, ScanQA, and SQA3D annotations and put them under `data/scanfamily/annotations`. The folder should look like:
data/scanfamily/annotations/
├── meta_data
│   ├── cat2glove42b.json
│   ├── scannetv2-labels.combined.tsv
│   ├── scannetv2_raw_categories.json
│   ├── scanrefer_corpus.pth
│   └── scanrefer_vocab.pth
├── qa
│   ├── ScanQA_v1.0_test_w_obj.json
│   ├── ScanQA_v1.0_test_wo_obj.json
│   ├── ScanQA_v1.0_train.json
│   └── ScanQA_v1.0_val.json
├── refer
│   ├── nr3d.jsonl
│   ├── scanrefer.jsonl
│   ├── sr3d+.jsonl
│   └── sr3d.jsonl
├── splits
│   ├── scannetv2_test.txt
│   ├── scannetv2_train.txt
│   └── scannetv2_val.txt
└── sqa_task
    ├── answer_dict.json
    └── balanced
        ├── v1_balanced_questions_test_scannetv2.json
        ├── v1_balanced_questions_train_scannetv2.json
        ├── v1_balanced_questions_val_scannetv2.json
        ├── v1_balanced_sqa_annotations_test_scannetv2.json
        ├── v1_balanced_sqa_annotations_train_scannetv2.json
        └── v1_balanced_sqa_annotations_val_scannetv2.json
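Before training, it can help to confirm that the two data folders match the listings above. The script below is a minimal sketch and not part of this repository: the `EXPECTED` table and the `find_missing` helper are hypothetical names, and the entries are copied from the directory trees above. It assumes it is run from the project root.

```python
from pathlib import Path

# Expected entries, copied from the two directory listings in this README.
# EXPECTED and find_missing are illustrative names, not part of the repository.
EXPECTED = {
    "data/scanfamily/scan_data": [
        "instance_id_to_gmm_color",
        "instance_id_to_loc",
        "instance_id_to_name",
        "pcd_with_global_alignment",
    ],
    "data/scanfamily/annotations": [
        "meta_data/cat2glove42b.json",
        "meta_data/scannetv2-labels.combined.tsv",
        "meta_data/scannetv2_raw_categories.json",
        "meta_data/scanrefer_corpus.pth",
        "meta_data/scanrefer_vocab.pth",
        "qa/ScanQA_v1.0_test_w_obj.json",
        "qa/ScanQA_v1.0_test_wo_obj.json",
        "qa/ScanQA_v1.0_train.json",
        "qa/ScanQA_v1.0_val.json",
        "refer/nr3d.jsonl",
        "refer/scanrefer.jsonl",
        "refer/sr3d+.jsonl",
        "refer/sr3d.jsonl",
        "splits/scannetv2_test.txt",
        "splits/scannetv2_train.txt",
        "splits/scannetv2_val.txt",
        "sqa_task/answer_dict.json",
        "sqa_task/balanced/v1_balanced_questions_test_scannetv2.json",
        "sqa_task/balanced/v1_balanced_questions_train_scannetv2.json",
        "sqa_task/balanced/v1_balanced_questions_val_scannetv2.json",
        "sqa_task/balanced/v1_balanced_sqa_annotations_test_scannetv2.json",
        "sqa_task/balanced/v1_balanced_sqa_annotations_train_scannetv2.json",
        "sqa_task/balanced/v1_balanced_sqa_annotations_val_scannetv2.json",
    ],
}


def find_missing(project_root, expected=EXPECTED):
    """Return expected paths (relative to project_root) that do not exist."""
    root = Path(project_root)
    missing = []
    for base, entries in expected.items():
        for entry in entries:
            rel = Path(base) / entry
            if not (root / rel).exists():
                missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing entries:")
        for path in missing:
            print("  " + path)
    else:
        print("Data layout looks complete.")
```

The check only tests for existence; it does not validate file contents or the per-scene files inside the `scan_data` subfolders.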
To pretrain the model, use the following command:
python3 run.py --config config/pretrain/pretrained.yml
To fine-tune the model, use the following command:
python3 run.py --config config/finetune/{task}_config.yml
Download all checkpoints and put them under `project/pretrain_weights`.
Checkpoint | Link | Note |
---|---|---|
Pretrain | link | |
ScanRefer | link | Fine-tuned ScanRefer from pre-trained checkpoint. |
ScanQA | link | Fine-tuned ScanQA from pre-trained checkpoint. |
Sr3D | link | Fine-tuned Sr3D from pre-trained checkpoint. |
Nr3D | link | Fine-tuned Nr3D from pre-trained checkpoint. |
Scan2Cap | link | Fine-tuned Scan2Cap from pre-trained checkpoint. |
To run the model, use the following command, where `{task}` is one of scanrefer, scanqa, sr3d, nr3d, or scan2cap:
python3 run.py --config config/eval/{task}_config.yml
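To evaluate several tasks in a row, the `{task}` placeholder can be filled in programmatically. The following is a minimal sketch, not a script shipped with this repository: `TASKS`, `config_path`, and `build_command` are hypothetical names, and it assumes the fine-tuning configs accept the same task names as the evaluation configs.

```python
# Task names as listed for evaluation in this README.
# Assumption: the fine-tuning configs use the same task names.
TASKS = ("scanrefer", "scanqa", "sr3d", "nr3d", "scan2cap")
STAGES = ("finetune", "eval")


def config_path(stage, task):
    """Build the config path following the config/{stage}/{task}_config.yml pattern."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"config/{stage}/{task}_config.yml"


def build_command(stage, task):
    """Return the run.py invocation as an argument list."""
    return ["python3", "run.py", "--config", config_path(stage, task)]


if __name__ == "__main__":
    # Print one evaluation command per task; nothing is executed here.
    for task in TASKS:
        print(" ".join(build_command("eval", task)))
```

Each returned list can be passed to `subprocess.run(..., check=True)` from the project root to launch the run.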
We would like to thank the authors of Vil3dref, 3D-Vista, and 3D-VLP for their open-source releases.
@inproceedings{3DSyn,
title = {3D Vision and Language Pretraining with Large-Scale Synthetic Data},
author = {Dejie Yang and Zhu Xu and Wentao Mo and Qingchao Chen and Siyuan Huang and Yang Liu},
booktitle = {Proceedings of the Thirty-Third International Joint Conference on
Artificial Intelligence, {IJCAI-24}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
year = {2024},
}