MMDetection provides hundreds of existing and existing detection models in Model Zoo), and supports multiple standard datasets, including Pascal VOC, COCO, CityScapes, LVIS, etc. This note will show how to perform common tasks on these existing models and standard datasets, including:
- Use existing models to inference on given images.
- Test existing models on standard datasets.
- Train predefined models on standard datasets.
By inference, we mean using trained models to detect objects on images. In MMDetection, a model is defined by a configuration file and existing model parameters are save in a checkpoint file.
To start with, we recommend Faster RCNN with this configuration file and this checkpoint file. It is recommended to download the checkpoint file to checkpoints
directory.
MMDetection provide high-level Python APIs for inference on images. Here is an example of building the model and inference on given images or videos.
from mmdet.apis import init_detector, inference_detector
import mmcv
# Specify the path to model config and checkpoint file
config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
# build the model from a config file and a checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# test a single image and show the results
img = 'test.jpg' # or img = mmcv.imread(img), which will only load it once
result = inference_detector(model, img)
# visualize the results in a new window
model.show_result(img, result)
# or save the visualization results to image files
model.show_result(img, result, out_file='result.jpg')
# test a video and show the results
video = mmcv.VideoReader('video.mp4')
for frame in video:
result = inference_detector(model, frame)
model.show_result(frame, result, wait_time=1)
A notebook demo can be found in demo/inference_demo.ipynb.
Note: inference_detector
only supports single-image inference for now.
For Python 3.7+, MMDetection also supports async interfaces. By utilizing CUDA streams, it allows not to block CPU on GPU bound inference code and enables better CPU/GPU utilization for single-threaded application. Inference can be done concurrently either between different input data samples or between different models of some inference pipeline.
See tests/async_benchmark.py
to compare the speed of synchronous and asynchronous interfaces.
import asyncio
import torch
from mmdet.apis import init_detector, async_inference_detector
from mmdet.utils.contextmanagers import concurrent
async def main():
config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
device = 'cuda:0'
model = init_detector(config_file, checkpoint=checkpoint_file, device=device)
# queue is used for concurrent inference of multiple images
streamqueue = asyncio.Queue()
# queue size defines concurrency level
streamqueue_size = 3
for _ in range(streamqueue_size):
streamqueue.put_nowait(torch.cuda.Stream(device=device))
# test a single image and show the results
img = 'test.jpg' # or img = mmcv.imread(img), which will only load it once
async with concurrent(streamqueue):
result = await async_inference_detector(model, img)
# visualize the results in a new window
model.show_result(img, result)
# or save the visualization results to image files
model.show_result(img, result, out_file='result.jpg')
asyncio.run(main())
We also provide three demo scripts, implemented with high-level APIs and supporting functionality codes. Source codes are available here.
This script performs inference on a single image.
python demo/image_demo.py \
${IMAGE_FILE} \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--score-thr ${SCORE_THR}]
Examples:
python demo/image_demo.py demo/demo.jpg \
configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
--device cpu
This is a live demo from a webcam.
python demo/webcam_demo.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--camera-id ${CAMERA-ID}] \
[--score-thr ${SCORE_THR}]
Examples:
python demo/webcam_demo.py \
configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
This script performs inference on a video.
python demo/video_demo.py \
${VIDEO_FILE} \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--score-thr ${SCORE_THR}] \
[--out ${OUT_FILE}] \
[--show] \
[--wait-time ${WAIT_TIME}]
Examples:
python demo/video_demo.py demo/demo.mp4 \
configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
--out result.mp4
To evaluate a model's accuracy, one usually tests the model on some standard datasets. MMDetection supports multiple public datasets including COCO, Pascal VOC, CityScapes, and more. This section will show how to test existing models on supported datasets.
Public datasets like Pascal VOC or mirror and COCO are available from official websites or mirrors. Note: In the detection task, Pascal VOC 2012 is an extension of Pascal VOC 2007 without overlap, and we usually use them together.
It is recommended to download and extract the dataset somewhere outside the project directory and symlink the dataset root to $MMDETECTION/data
as below.
If your folder structure is different, you may need to change the corresponding paths in config files.
We provide a script to download datasets such as COCO , you can run python tools/misc/download_dataset.py --dataset-name coco2017
to download COCO dataset.
For more usage please refer to dataset-download
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── coco
│ │ ├── annotations
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
│ ├── cityscapes
│ │ ├── annotations
│ │ ├── leftImg8bit
│ │ │ ├── train
│ │ │ ├── val
│ │ ├── gtFine
│ │ │ ├── train
│ │ │ ├── val
│ ├── VOCdevkit
│ │ ├── VOC2007
│ │ ├── VOC2012
Some models require additional COCO-stuff datasets, such as HTC, DetectoRS and SCNet, you can download and unzip then move to the coco folder. The directory should be like this.
mmdetection
├── data
│ ├── coco
│ │ ├── annotations
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
│ │ ├── stuffthingmaps
Panoptic segmentation models like PanopticFPN require additional COCO Panoptic datasets, you can download and unzip then move to the coco annotation folder. The directory should be like this.
mmdetection
├── data
│ ├── coco
│ │ ├── annotations
│ │ │ ├── panoptic_train2017.json
│ │ │ ├── panoptic_train2017
│ │ │ ├── panoptic_val2017.json
│ │ │ ├── panoptic_val2017
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
The cityscapes annotations need to be converted into the coco format using tools/dataset_converters/cityscapes.py
:
pip install cityscapesscripts
python tools/dataset_converters/cityscapes.py \
./data/cityscapes \
--nproc 8 \
--out-dir ./data/cityscapes/annotations
TODO: CHANGE TO THE NEW PATH
We provide testing scripts for evaluating an existing model on the whole dataset (COCO, PASCAL VOC, Cityscapes, etc.). The following testing environments are supported:
- single GPU
- CPU
- single node multiple GPUs
- multiple nodes
Choose the proper script to perform testing depending on the testing environment.
# single-gpu testing
python tools/test.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--out ${RESULT_FILE}] \
[--eval ${EVAL_METRICS}] \
[--show]
# CPU: disable GPUs and run single-gpu testing script
export CUDA_VISIBLE_DEVICES=-1
python tools/test.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--out ${RESULT_FILE}] \
[--eval ${EVAL_METRICS}] \
[--show]
# multi-gpu testing
bash tools/dist_test.sh \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
${GPU_NUM} \
[--out ${RESULT_FILE}] \
[--eval ${EVAL_METRICS}]
tools/dist_test.sh
also supports multi-node testing, but relies on PyTorch's launch utility.
Optional arguments:
RESULT_FILE
: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.EVAL_METRICS
: Items to be evaluated on the results. Allowed values depend on the dataset, e.g.,proposal_fast
,proposal
,bbox
,segm
are available for COCO,mAP
,recall
for PASCAL VOC. Cityscapes could be evaluated bycityscapes
as well as all COCO metrics.--show
: If specified, detection results will be plotted on the images and shown in a new window. It is only applicable to single GPU testing and used for debugging and visualization. Please make sure that GUI is available in your environment. Otherwise, you may encounter an error likecannot connect to X server
.--show-dir
: If specified, detection results will be plotted on the images and saved to the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option.--show-score-thr
: If specified, detections with scores below this threshold will be removed.--cfg-options
: if specified, the key-value pair optional cfg will be merged into config file--eval-options
: if specified, the key-value pair optional eval cfg will be kwargs for dataset.evaluate() function, it's only for evaluation
Assuming that you have already downloaded the checkpoints to the directory checkpoints/
.
-
Test Faster R-CNN and visualize the results. Press any key for the next image. Config and checkpoint files are available here.
python tools/test.py \ configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \ checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \ --show
-
Test Faster R-CNN and save the painted images for future visualization. Config and checkpoint files are available here.
python tools/test.py \ configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \ checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \ --show-dir faster_rcnn_r50_fpn_1x_results
-
Test Faster R-CNN on PASCAL VOC (without saving the test results) and evaluate the mAP. Config and checkpoint files are available here.
python tools/test.py \ configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc.py \ checkpoints/faster_rcnn_r50_fpn_1x_voc0712_20200624-c9895d40.pth \ --eval mAP
-
Test Mask R-CNN with 8 GPUs, and evaluate the bbox and mask AP. Config and checkpoint files are available here.
./tools/dist_test.sh \ configs/mask_rcnn_r50_fpn_1x_coco.py \ checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \ 8 \ --out results.pkl \ --eval bbox segm
-
Test Mask R-CNN with 8 GPUs, and evaluate the classwise bbox and mask AP. Config and checkpoint files are available here.
./tools/dist_test.sh \ configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py \ checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \ 8 \ --out results.pkl \ --eval bbox segm \ --options "classwise=True"
-
Test Mask R-CNN on COCO test-dev with 8 GPUs, and generate JSON files for submitting to the official evaluation server. Config and checkpoint files are available here.
./tools/dist_test.sh \ configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py \ checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \ 8 \ --format-only \ --options "jsonfile_prefix=./mask_rcnn_test-dev_results"
This command generates two JSON files
mask_rcnn_test-dev_results.bbox.json
andmask_rcnn_test-dev_results.segm.json
. -
Test Mask R-CNN on Cityscapes test with 8 GPUs, and generate txt and png files for submitting to the official evaluation server. Config and checkpoint files are available here.
./tools/dist_test.sh \ configs/cityscapes/mask_rcnn_r50_fpn_1x_cityscapes.py \ checkpoints/mask_rcnn_r50_fpn_1x_cityscapes_20200227-afe51d5a.pth \ 8 \ --format-only \ --options "txtfile_prefix=./mask_rcnn_cityscapes_test_results"
The generated png and txt would be under
./mask_rcnn_cityscapes_test_results
directory.
MMDetection supports to test models without ground-truth annotations using CocoDataset
. If your dataset format is not in COCO format, please convert them to COCO format. For example, if your dataset format is VOC, you can directly convert it to COCO format by the script in tools. If your dataset format is Cityscapes, you can directly convert it to COCO format by the script in tools. The rest of the formats can be converted using this script.
python tools/dataset_converters/images2coco.py \
${IMG_PATH} \
${CLASSES} \
${OUT} \
[--exclude-extensions]
arguments:
IMG_PATH
: The root path of images.CLASSES
: The text file with a list of categories.OUT
: The output annotation json file name. The save dir is in the same directory asIMG_PATH
.exclude-extensions
: The suffix of images to be excluded, such as 'png' and 'bmp'.
After the conversion is complete, you can use the following command to test
# single-gpu testing
python tools/test.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
--format-only \
--options ${JSONFILE_PREFIX} \
[--show]
# CPU: disable GPUs and run single-gpu testing script
export CUDA_VISIBLE_DEVICES=-1
python tools/test.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--out ${RESULT_FILE}] \
[--eval ${EVAL_METRICS}] \
[--show]
# multi-gpu testing
bash tools/dist_test.sh \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
${GPU_NUM} \
--format-only \
--options ${JSONFILE_PREFIX} \
[--show]
Assuming that the checkpoints in the model zoo have been downloaded to the directory checkpoints/
, we can test Mask R-CNN on COCO test-dev with 8 GPUs, and generate JSON files using the following command.
./tools/dist_test.sh \
configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py \
checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth \
8 \
-format-only \
--options "jsonfile_prefix=./mask_rcnn_test-dev_results"
This command generates two JSON files mask_rcnn_test-dev_results.bbox.json
and mask_rcnn_test-dev_results.segm.json
.
MMDetection supports inference with a single image or batched images in test mode. By default, we use single-image inference and you can use batch inference by modifying samples_per_gpu
in the config of test data. You can do that either by modifying the config as below.
data = dict(train=dict(...), val=dict(...), test=dict(samples_per_gpu=2, ...))
Or you can set it through --cfg-options
as --cfg-options data.test.samples_per_gpu=2
In test mode, ImageToTensor
pipeline is deprecated, it's replaced by DefaultFormatBundle
that recommended to manually replace it in the test data pipeline in your config file. examples:
# use ImageToTensor (deprecated)
pipelines = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', mean=[0, 0, 0], std=[1, 1, 1]),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
# manually replace ImageToTensor to DefaultFormatBundle (recommended)
pipelines = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', mean=[0, 0, 0], std=[1, 1, 1]),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img']),
])
]
MMDetection also provides out-of-the-box tools for training detection models. This section will show how to train predefined models (under configs) on standard datasets i.e. COCO.
Training requires preparing datasets too. See section Prepare datasets above for details.
Note:
Currently, the config files under configs/cityscapes
use COCO pretrained weights to initialize.
You could download the existing models in advance if the network connection is unavailable or slow. Otherwise, it would cause errors at the beginning of training.
Important: The default learning rate in config files is for 8 GPUs and 2 sample per gpu (batch size = 8 * 2 = 16). And it had been set to auto_scale_lr.base_batch_size
in config/_base_/default_runtime.py
. Learning rate will be automatically scaled base on this value when the batch size is 16
. Meanwhile, in order not to affect other codebase which based on mmdet, the flag auto_scale_lr.enable
is set to False
by default.
If you want to enable this feature, you need to add argument --auto-scale-lr
. And you need to check the config name which you want to use before you process the command, because the config name indicates the default batch size.
By default, it is 8 x 2 = 16 batch size
, like faster_rcnn_r50_caffe_fpn_90k_coco.py
or pisa_faster_rcnn_x101_32x4d_fpn_1x_coco.py
. In other cases, you will see the config file name have _NxM_
in dictating, like cornernet_hourglass104_mstest_32x3_210e_coco.py
which batch size is 32 x 3 = 96
, or scnet_x101_64x4d_fpn_8x1_20e_coco.py
which batch size is 8 x 1 = 8
.
Please remember to check the bottom of the specific config file you want to use, it will have auto_scale_lr.base_batch_size
if the batch size is not 16
. If you can't find those values, check the config file which in _base_=[xxx]
and you will find it. Please do not modify its values if you want to automatically scale the LR.
Learning rate automatically scale basic usage is as follows.
python tools/train.py \
${CONFIG_FILE} \
--auto-scale-lr \
[optional arguments]
If you enabled this feature, the learning rate will be automatically scaled according to the number of GPUs of the machine and the batch size of training. See linear scaling rule for details. For example, If there are 4 GPUs and 2 pictures on each GPU, lr = 0.01
, then if there are 16 GPUs and 4 pictures on each GPU, it will automatically scale to lr = 0.08
.
If you don't want to use it, you need to calculate the learning rate according to the linear scaling rule manually then change optimizer.lr
in specific config file.
We provide tools/train.py
to launch training jobs on a single GPU.
The basic usage is as follows.
python tools/train.py \
${CONFIG_FILE} \
[optional arguments]
During training, log files and checkpoints will be saved to the working directory, which is specified by work_dir
in the config file or via CLI argument --work-dir
.
By default, the model is evaluated on the validation set every epoch, the evaluation interval can be specified in the config file as shown below.
# evaluate the model every 12 epoch.
evaluation = dict(interval=12)
This tool accepts several optional arguments, including:
--no-validate
(not suggested): Disable evaluation during training.--work-dir ${WORK_DIR}
: Override the working directory.--resume-from ${CHECKPOINT_FILE}
: Resume from a previous checkpoint file.--options 'Key=value'
: Overrides other settings in the used config.
Note:
Difference between resume-from
and load-from
:
resume-from
loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally.
load-from
only loads the model weights and the training epoch starts from 0. It is usually used for finetuning.
The process of training on the CPU is consistent with single GPU training. We just need to disable GPUs before the training process.
export CUDA_VISIBLE_DEVICES=-1
And then run the script above.
Note:
We do not recommend users to use CPU for training because it is too slow. We support this feature to allow users to debug on machines without GPU for convenience.
We provide tools/dist_train.sh
to launch training on multiple GPUs.
The basic usage is as follows.
bash ./tools/dist_train.sh \
${CONFIG_FILE} \
${GPU_NUM} \
[optional arguments]
Optional arguments remain the same as stated above.
If you would like to launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict.
If you use dist_train.sh
to launch training jobs, you can set the port in commands.
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
If you launch with multiple machines simply connected with ethernet, you can simply run following commands:
On the first machine:
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
On the second machine:
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
Usually it is slow if you do not have high speed networking like InfiniBand.
Slurm is a good job scheduling system for computing clusters.
On a cluster managed by Slurm, you can use slurm_train.sh
to spawn training jobs. It supports both single-node and multi-node training.
The basic usage is as follows.
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
Below is an example of using 16 GPUs to train Mask R-CNN on a Slurm partition named dev, and set the work-dir to some shared file systems.
GPUS=16 ./tools/slurm_train.sh dev mask_r50_1x configs/mask_rcnn_r50_fpn_1x_coco.py /nfs/xxxx/mask_rcnn_r50_fpn_1x
You can check the source code to review full arguments and environment variables.
When using Slurm, the port option need to be set in one of the following ways:
-
Set the port through
--options
. This is more recommended since it does not change the original configs.CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --options 'dist_params.port=29500' CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --options 'dist_params.port=29501'
-
Modify the config files to set different communication ports.
In
config1.py
, setdist_params = dict(backend='nccl', port=29500)
In
config2.py
, setdist_params = dict(backend='nccl', port=29501)
Then you can launch two jobs with
config1.py
andconfig2.py
.CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}