This is an unofficial replication of "Pix2seq: A Language Modeling Framework for Object Detection", built on mmdetection and released with a pretrained model.
This project is released under the Apache 2.0 license.
Please refer to get_started.md for installation.
To train, run the following command (about 10 days on 8 V100 32GB GPUs):

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=5003 \
        tools/train.py configs/pix2seq/pix2seq_r50_8x4_50e_coco.py --work-dir pix2seq-output --gpus 8 --launcher pytorch

Alternatively, download the pretrained pix2seq weights.
Evaluate with a single GPU:

    python tools/test.py configs/pix2seq/pix2seq_r50_8x4_300_coco.py \
        weights/checkpoints.pth --work-dir pix2seq-output --eval bbox --show-dir pix2seq-vis

Evaluate with 8 GPUs:

    python -m torch.distributed.launch --nproc_per_node=8 --master_port=5003 \
        tools/test.py configs/pix2seq/pix2seq_r50_8x4_300_coco.py weights/checkpoints.pth \
        --work-dir pix2seq-output --eval bbox --launcher pytorch

Method | Backbone | Epochs | Batch Size | AP | AP50 | AP75 | Weights |
---|---|---|---|---|---|---|---|
Ours | R50 | 300 | 32 | 36.4 | 52.8 | 38.5 | model |
Paper | R50 | 300 | 128 | 43.0 | 61.0 | 45.6 | - |
- random shuffle targets
- training from scratch
- drop class token
- stochastic depth
- large scale jittering
- support for custom dataset
- two independent augmentations for each image
- FrozenBatchNorm2d in backbones
- auto-augment
- nucleus sampling
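
Of the items above, nucleus (top-p) sampling is the easiest to illustrate in isolation. The following is a minimal, framework-free sketch of the general technique, not code taken from this repository: the helper name `nucleus_sample`, its signature, and the `top_p` values used are assumptions made purely for illustration.

```python
import math
import random


def nucleus_sample(logits, top_p, rng=random):
    """Hypothetical helper: sample a token index from the smallest set of
    tokens whose cumulative probability reaches top_p."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")

    # Softmax, subtracting the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Keep the shortest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw from the nucleus; choices() treats the weights as relative,
    # so no explicit renormalisation is needed.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Illustrative logits only; a real decoder would supply one logit
    # per vocabulary token at each decoding step.
    print(nucleus_sample([2.0, 1.0, 0.5, -1.0], top_p=0.8))
```

A smaller `top_p` restricts sampling to the most likely tokens, while `top_p=1.0` falls back to sampling from the full distribution.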