3D Vision and Language Pretraining with Large-Scale Synthetic Data

Install

Create the conda environment

conda env create --name 3dsyn --file=environment.yml

Install pointnet2

cd vision/pointnet2
python3 setup.py install

Prepare dataset

Follow Vil3dref to download the ScanNet data and place it under data/scanfamily/scan_data. The folder should look like this:

./data/scanfamily/scan_data/
├── instance_id_to_gmm_color
├── instance_id_to_loc
├── instance_id_to_name
└── pcd_with_global_alignment
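Before launching a long job, it can help to confirm that the four subfolders listed above are in place. The following is a minimal sketch, not part of the repository: the function name `missing_scan_data_dirs` is hypothetical, and the default path assumes the script is run from the repository root.

```python
from pathlib import Path

# Subfolder names copied from the directory tree above.
EXPECTED_SCAN_DATA_DIRS = (
    "instance_id_to_gmm_color",
    "instance_id_to_loc",
    "instance_id_to_name",
    "pcd_with_global_alignment",
)


def missing_scan_data_dirs(root="data/scanfamily/scan_data"):
    """Return the expected subfolders that are absent under `root`.

    Hypothetical helper for illustration; it only checks that each
    directory exists, not that its contents are complete.
    """
    root = Path(root)
    return [name for name in EXPECTED_SCAN_DATA_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_scan_data_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("scan_data layout looks complete.")
```

An empty result means only that the folders exist; it says nothing about whether the download itself finished.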

Download the ScanRefer + ReferIt3D, ScanQA, and SQA3D annotations and put them under data/scanfamily/annotations:

data/scanfamily/annotations/
├── meta_data
│   ├── cat2glove42b.json
│   ├── scannetv2-labels.combined.tsv
│   ├── scannetv2_raw_categories.json
│   ├── scanrefer_corpus.pth
│   └── scanrefer_vocab.pth
├── qa
│   ├── ScanQA_v1.0_test_w_obj.json
│   ├── ScanQA_v1.0_test_wo_obj.json
│   ├── ScanQA_v1.0_train.json
│   └── ScanQA_v1.0_val.json
├── refer
│   ├── nr3d.jsonl
│   ├── scanrefer.jsonl
│   ├── sr3d+.jsonl
│   └── sr3d.jsonl
├── splits
│   ├── scannetv2_test.txt
│   ├── scannetv2_train.txt
│   └── scannetv2_val.txt
└── sqa_task
    ├── answer_dict.json
    └── balanced
        ├── v1_balanced_questions_test_scannetv2.json
        ├── v1_balanced_questions_train_scannetv2.json
        ├── v1_balanced_questions_val_scannetv2.json
        ├── v1_balanced_sqa_annotations_test_scannetv2.json
        ├── v1_balanced_sqa_annotations_train_scannetv2.json
        └── v1_balanced_sqa_annotations_val_scannetv2.json
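A truncated download is easy to miss, so a quick pass over the .jsonl files under refer/ can catch corrupted lines early. The sketch below assumes only that these files are in JSON Lines format (one JSON value per line), as the extension suggests. It makes no assumption about the fields inside each record, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Count non-empty lines in a JSON Lines file, parsing each to catch corruption."""
    count = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}: invalid JSON on line {line_number}") from err
            count += 1
    return count


def summarize_refer_dir(refer_dir="data/scanfamily/annotations/refer"):
    """Map each .jsonl file name under `refer_dir` to its record count."""
    return {p.name: count_jsonl_records(p) for p in sorted(Path(refer_dir).glob("*.jsonl"))}


if __name__ == "__main__":
    for name, n in summarize_refer_dir().items():
        print(f"{name}: {n} records")
```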

Pretrain

To pretrain the model, use the following command:

python3 run.py --config config/pretrain/pretrained.yml

Fine-tuning

To fine-tune the model, use the following command, replacing {task} with the name of the downstream task:

python3 run.py --config config/finetune/{task}_config.yml

Evaluation

Download all checkpoints and put them under project/pretrain_weights

Checkpoint	Link	Note
Pretrain	link
ScanRefer	link	Fine-tuned on ScanRefer from the pre-trained checkpoint.
ScanQA	link	Fine-tuned on ScanQA from the pre-trained checkpoint.
Sr3D	link	Fine-tuned on Sr3D from the pre-trained checkpoint.
Nr3D	link	Fine-tuned on Nr3D from the pre-trained checkpoint.
Scan2Cap	link	Fine-tuned on Scan2Cap from the pre-trained checkpoint.

Run

To run the model, use the following command, where {task} is one of scanrefer, scanqa, sr3d, nr3d, or scan2cap:

python3 run.py --config config/eval/{task}_config.yml
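To evaluate every task in one go, the command above can be wrapped in a small driver. This is a sketch rather than a script shipped with the repository: `eval_command` and `run_all` are hypothetical names, the config path follows the pattern shown above, and the script is assumed to run from the repository root. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Task names as listed above.
TASKS = ("scanrefer", "scanqa", "sr3d", "nr3d", "scan2cap")


def eval_command(task, python=sys.executable):
    """Build the argument list for evaluating one task."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {', '.join(TASKS)}")
    return [python, "run.py", "--config", f"config/eval/{task}_config.yml"]


def run_all(tasks=TASKS, dry_run=True):
    """Print each evaluation command and, unless dry_run is set, execute it."""
    for task in tasks:
        cmd = eval_command(task)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing task instead of continuing.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Pass `dry_run=False` to actually launch the runs. The tasks execute one after another, so a failure in an early task prevents the later ones from starting.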

Acknowledgement

We would like to thank the authors of Vil3dref, 3D-Vista, and 3D-VLP for their open-source releases.

Citation:

@inproceedings{3DSyn,
  title     = {3D Vision and Language Pretraining with Large-Scale Synthetic Data},
  author    = {Dejie Yang and Zhu Xu and Wentao Mo and Qingchao Chen and Siyuan Huang and Yang Liu},
  booktitle = {Proceedings of the Thirty-Third International Joint Conference on
               Artificial Intelligence, {IJCAI-24}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  year      = {2024},
}
