This project is the final assignment for the Vision and Language course; the target task is Visual Question Answering (VQA). Pre-training uses the COCO Caption and Visual Genome data, and the VQA dataset is VQA 2.0.
The project uses the Pixel-BERT model and an improved variant in which the ResNet50 backbone is replaced by a ResNet50 with GC (Global Context) attention.
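
The GC-attention backbone is only named above; as a point of reference, below is a minimal PyTorch sketch of a GCNet-style Global Context block of the kind added to ResNet50 stages. The class name, reduction ratio, and placement are assumptions for illustration, not this project's actual implementation.

```python
import torch
import torch.nn as nn

class GCBlock(nn.Module):
    """Illustrative GCNet-style Global Context block (not the project's code):
    global attention pooling -> bottleneck transform -> additive fusion."""
    def __init__(self, channels: int, ratio: float = 1 / 16):
        super().__init__()
        hidden = max(1, int(channels * ratio))
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)  # per-position attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.size()
        # global attention pooling: softmax over all H*W positions
        mask = self.context_mask(x).view(b, 1, h * w)
        mask = torch.softmax(mask, dim=2).unsqueeze(-1)                  # (B, 1, HW, 1)
        context = torch.matmul(x.view(b, c, h * w).unsqueeze(1), mask)   # (B, 1, C, 1)
        context = context.view(b, c, 1, 1)
        # channel-wise transform, then broadcast-add back onto the feature map
        return x + self.transform(context)

# example: a ResNet50 stage-4 feature map has 2048 channels
feat = torch.randn(2, 2048, 12, 20)
print(GCBlock(2048)(feat).shape)  # torch.Size([2, 2048, 12, 20])
```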
The project mainly requires the following dependencies; installation details are given in the Quick Start steps below.
- torch-1.10.0+cu113
- apex
- detectron2-0.1.1
- horovod-0.19.4
- Download the pre-processed data and the BERT weights directly (click to download) and extract them into a folder at the same level as this project. The archive is split into parts, so merge it before extracting, for example:

```bash
# merge the split parts back into a single archive
zip -s 0 subdata.zip --out data.zip
# extract data.zip
unzip data.zip
```
After extraction, the directory layout should look like this:
```
.
├── pixelbert
└── vision_and_language_data
    └── data
        ├── pretrained
        ├── txt_db
        └── vis_db
```
- Create a new conda environment

```bash
conda create -n pixelbert python=3.9 &&\
conda activate pixelbert
```
- Install PyTorch with CUDA 11.3

```bash
# install torch + cu113
pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html
```
- Install the remaining dependencies

```bash
pip install -r requirements.txt
```
- Install apex for mixed-precision training

```bash
git clone https://github.com/NVIDIA/apex.git &&\
cd apex &&\
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . &&\
cd .. &&\
rm -rf apex
```
- Install detectron2

```bash
pip install 'git+https://github.com/facebookresearch/fvcore' &&\
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git@ffff8ac'
```
- Install horovod for multi-GPU distributed training (an optional import check is sketched after this step)

```bash
# install horovod
pip install --no-cache-dir horovod==0.19.4
```
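- (Optional) Verify that the key packages import and that CUDA is visible. This snippet is only an illustrative check and is not part of the repository:

```python
# hypothetical sanity check, not shipped with the project
import torch
import apex                   # NVIDIA mixed-precision extensions
import horovod                # distributed training backend
import detectron2             # backbone / detection utilities

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("horovod", horovod.__version__, "| detectron2", detectron2.__version__)
```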
- Pre-train pixelbert-resnet50 on the full data

```bash
horovodrun -np {number of gpus} python run_pretrain_resnet50_fulldata.py \
    --config src/configs/pretrain_image_text_base_resnet50_mlm_itm.json \
    --output_dir {path to save logs and checkpoints}
```

Visualize the training process with TensorBoard:

```bash
tensorboard --logdir={path to save logs} --host={host}
```

Example:

```bash
# pretraining resnet50 vqa full data
horovodrun -np 1 python run_pretrain_resnet50_fulldata.py \
    --config src/configs/pretrain_image_text_base_resnet50_mlm_itm.json \
    --output_dir ../vision_and_language_data/resnet50_pretrain_output

tensorboard --logdir=../vision_and_language_data/resnet50_pretrain_output/log/ --host=162.105.94.222
```
- After full-data pre-training, fine-tune pixelbert-resnet50 on VQA

```bash
horovodrun -np {number of gpus} python run_vqa_resnet50.py \
    --config src/configs/vqa_base_resnet50.json \
    --output_dir {path to save logs and checkpoints}
```

Visualize the training process with TensorBoard:

```bash
tensorboard --logdir={path to save logs} --host={host}
```

Example:

```bash
# resnet50-vqa
horovodrun -np 1 python run_vqa_resnet50.py \
    --config src/configs/vqa_base_resnet50.json \
    --output_dir ../vision_and_language_data/resnet50_vqa_result

tensorboard --logdir=../vision_and_language_data/resnet50_vqa_result/log/ --host=162.105.94.222
```
- Pre-train pixelbert-resnet50withgcb on the full data

```bash
horovodrun -np {number of gpus} python run_pretrain_resnet50with_gcb.py \
    --config src/configs/pretrain_image_text_base_resnet50withgcb_mlm_itm.json \
    --output_dir {path to save logs and checkpoints}
```

Visualize the training process with TensorBoard:

```bash
tensorboard --logdir={path to save logs} --host={host}
```

Example:

```bash
# pretraining resnet50withgcb vqa full data
horovodrun -np 1 python run_pretrain_resnet50with_gcb.py \
    --config src/configs/pretrain_image_text_base_resnet50withgcb_mlm_itm.json \
    --output_dir ../vision_and_language_data/resnet50withgcb_pretrain_output

tensorboard --logdir=../vision_and_language_data/resnet50withgcb_pretrain_output/log/ --host=162.105.94.222
```
- After full-data pre-training, fine-tune pixelbert-resnet50withgcb on VQA

```bash
horovodrun -np {number of gpus} python run_vqa_resnet50with_gcb.py \
    --config src/configs/vqa_base_resnet50_with_gcb.json \
    --output_dir {path to save logs and checkpoints}
```

Visualize the training process with TensorBoard:

```bash
tensorboard --logdir={path to save logs} --host={host}
```

Example:

```bash
# resnet50withgcb vqa
horovodrun -np 1 python run_vqa_resnet50with_gcb.py \
    --config src/configs/vqa_base_resnet50_with_gcb.json \
    --output_dir ../vision_and_language_data/resnet50_with_gcb_vqa_result/

tensorboard --logdir=../vision_and_language_data/resnet50_with_gcb_vqa_result/log/ --host=162.105.94.222
```
- Pre-train pixelbert-resnet50withgcb using only the COCO data

```bash
horovodrun -np {number of gpus} python run_pretrain_resnet50with_gcb.py \
    --config src/configs/pretrain_image_text_base_resnet50withgcb_mlm_itm_coco_cap.json \
    --output_dir {path to save logs and checkpoints}
```

Visualize the training process with TensorBoard:

```bash
tensorboard --logdir={path to save logs} --host={host}
```

Example:

```bash
# pretraining resnet50withgcb vqa less data
horovodrun -np 1 python run_pretrain_resnet50with_gcb.py \
    --config src/configs/pretrain_image_text_base_resnet50withgcb_mlm_itm_coco_cap.json \
    --output_dir ../vision_and_language_data/resnet50withgcb_lessdata_pretrain_lessdata_output

tensorboard --logdir=../vision_and_language_data/resnet50withgcb_lessdata_pretrain_lessdata_output/log/ --host=162.105.94.222
```
- Fine-tune on VQA after pre-training pixelbert-resnet50withgcb with only the COCO data

```bash
horovodrun -np {number of gpus} python run_vqa_resnet50with_gcb.py \
    --config src/configs/vqa_base_resnet50_with_gcb_lessdata.json \
    --output_dir {path to save logs and checkpoints}
```

Visualize the training process with TensorBoard:

```bash
tensorboard --logdir={path to save logs} --host={host}
```

Example:

```bash
# resnet50withgcb vqa lessdata
horovodrun -np 1 python run_vqa_resnet50with_gcb.py \
    --config src/configs/vqa_base_resnet50_with_gcb_lessdata.json \
    --output_dir ../vision_and_language_data/resnet50_with_gcb_vqa_lessdata_result/

tensorboard --logdir=../vision_and_language_data/resnet50_with_gcb_vqa_lessdata_result/log/ --host=162.105.94.222
```
- Pre-train pixelbert-resnet50 using only the COCO data

```bash
horovodrun -np {number of gpus} python run_pretrain_resnet50_lessdata.py \
    --config src/configs/pretrain_image_text_base_resnet50_mlm_itm_coco_cap.json \
    --output_dir {path to save logs and checkpoints}
```

Visualize the training process with TensorBoard:

```bash
tensorboard --logdir={path to save logs} --host={host}
```

Example:

```bash
# pretraining resnet50 vqa less data
horovodrun -np 1 python run_pretrain_resnet50_lessdata.py \
    --config src/configs/pretrain_image_text_base_resnet50_mlm_itm_coco_cap.json \
    --output_dir ../vision_and_language_data/resnet50_pretrain_lessdata_output

tensorboard --logdir=../vision_and_language_data/resnet50_pretrain_lessdata_output/log/ --host=162.105.94.222
```
- Fine-tune on VQA after pre-training pixelbert-resnet50 with only the COCO data

```bash
horovodrun -np {number of gpus} python run_vqa_resnet50_lessdata.py \
    --config src/configs/vqa_base_resnet50_lessdata.json \
    --output_dir {path to save logs and checkpoints}
```

Visualize the training process with TensorBoard:

```bash
tensorboard --logdir={path to save logs} --host={host}
```

Example:

```bash
# resnet50 vqa lessdata
horovodrun -np 1 python run_vqa_resnet50_lessdata.py \
    --config src/configs/vqa_base_resnet50_lessdata.json \
    --output_dir ../vision_and_language_data/resnet50_vqa_lessdata_result/

tensorboard --logdir=../vision_and_language_data/resnet50_vqa_lessdata_result/log/ --host=162.105.94.222
```
- Run prediction on the VQA validation split with pixelbert-resnet50 pre-trained on the full data

```bash
horovodrun -np 1 python run_vqa_resnet50.py \
    --do_inference 1 \
    --output_dir {path where logs and checkpoints were saved during training} \
    --inference_split val \
    --inference_model_step {checkpoint_saved} \
    --inference_txt_db {text_data} \
    --inference_img_db {img_dir} \
    --inference_batch_size {batch_size}
```

Example:

```bash
# inference resnet50-vqa
horovodrun -np 1 python run_vqa_resnet50.py \
    --do_inference 1 \
    --output_dir ../vision_and_language_data/resnet50_vqa_result \
    --inference_split val \
    --inference_model_step 22400 \
    --inference_txt_db ../vision_and_language_data/data/txt_db/vqa/vqa_k_test.jsonl \
    --inference_img_db ../vision_and_language_data/data/vis_db/coco_train2014_val2014 \
    --inference_batch_size 32
```
- Run prediction on the VQA validation split with pixelbert-resnet50 pre-trained only on the COCO data

```bash
horovodrun -np 1 python run_vqa_resnet50_lessdata.py \
    --do_inference 1 \
    --output_dir {path where logs and checkpoints were saved during training} \
    --inference_split val \
    --inference_model_step {checkpoint_saved} \
    --inference_txt_db {text_data} \
    --inference_img_db {img_dir} \
    --inference_batch_size {batch_size}
```

Example:

```bash
# inference resnet50 vqa lessdata
horovodrun -np 1 python run_vqa_resnet50_lessdata.py \
    --do_inference 1 \
    --output_dir ../vision_and_language_data/resnet50_vqa_lessdata_result \
    --inference_split val \
    --inference_model_step 89900 \
    --inference_txt_db ../vision_and_language_data/data/txt_db/vqa/vqa_k_test.jsonl \
    --inference_img_db ../vision_and_language_data/data/vis_db/coco_train2014_val2014 \
    --inference_batch_size 2
```
- Run prediction on the VQA validation split with pixelbert-resnet50withgcb pre-trained on the full data

```bash
horovodrun -np 1 python run_vqa_resnet50with_gcb.py \
    --do_inference 1 \
    --output_dir {path where logs and checkpoints were saved during training} \
    --inference_split val \
    --inference_model_step {checkpoint_saved} \
    --inference_txt_db {text_data} \
    --inference_img_db {img_dir} \
    --inference_batch_size {batch_size}
```

Example:

```bash
# inference resnet50withgcb vqa
horovodrun -np 1 python run_vqa_resnet50with_gcb.py \
    --do_inference 1 \
    --output_dir ../vision_and_language_data/resnet50_with_gcb_vqa_result \
    --inference_split val \
    --inference_model_step 26400 \
    --inference_txt_db ../vision_and_language_data/data/txt_db/vqa/vqa_k_test.jsonl \
    --inference_img_db ../vision_and_language_data/data/vis_db/coco_train2014_val2014 \
    --inference_batch_size 2
```
- Run prediction on the VQA validation split with pixelbert-resnet50withgcb pre-trained only on the COCO data

```bash
horovodrun -np 1 python run_vqa_resnet50with_gcb.py \
    --do_inference 1 \
    --output_dir {path where logs and checkpoints were saved during training} \
    --inference_split val \
    --inference_model_step {checkpoint_saved} \
    --inference_txt_db {text_data} \
    --inference_img_db {img_dir} \
    --inference_batch_size {batch_size}
```

Example:

```bash
# inference resnet50withgcb vqa lessdata
horovodrun -np 1 python run_vqa_resnet50with_gcb.py \
    --do_inference 1 \
    --output_dir ../vision_and_language_data/resnet50_with_gcb_vqa_lessdata_result \
    --inference_split val \
    --inference_model_step 19200 \
    --inference_txt_db ../vision_and_language_data/data/txt_db/vqa/vqa_k_test.jsonl \
    --inference_img_db ../vision_and_language_data/data/vis_db/coco_train2014_val2014 \
    --inference_batch_size 2
```
Model | Pre-training data | Overall Acc |
---|---|---|
Pixel-BERT (ResNet50), reproduced in this project | COCO + VG | 66.36 |
Pixel-BERT (ResNet50 with GCB), this project's improvement | COCO + VG (not pre-trained to convergence) | 59.10 |
Pixel-BERT (ResNet50), reproduced in this project | COCO | 54.44 |
Pixel-BERT (ResNet50 with GCB), this project's improvement | COCO | 56.57 |
The pre-trained models can be downloaded (click to download); the file structure is:
```
.
├── data
│   ├── pretrained
│   │   └── bert-base-uncased
│   ├── txt_db
│   │   ├── pretrain
│   │   └── vqa
│   └── vis_db
│       ├── coco_test2015
│       ├── coco_train2014_val2014
│       └── vg
├── resnet50_pretrain_lessdata_output
│   ├── ckpt
│   └── log
├── resnet50_pretrain_output
│   └── ckpt
├── resnet50_vqa_lessdata_result
│   ├── ckpt
│   ├── log
│   └── results_valstep_89900
├── resnet50_vqa_result
│   ├── ckpt
│   ├── log
│   └── results_valstep_22400
├── resnet50withgcb_lessdata_pretrain_lessdata_output
│   ├── ckpt
│   └── log
├── resnet50withgcb_pretrain_output
│   ├── ckpt
│   └── log
├── resnet50_with_gcb_vqa_lessdata_result
│   ├── ckpt
│   ├── log
│   └── results_valstep_19200
└── resnet50_with_gcb_vqa_result
    ├── ckpt
    ├── log
    └── results_valstep_26400
```