Merge pull request #1 from msracver/master

update
msracver · Jun 16, 2017 · 8db823f
2 parents bce204a + 110383d

Showing 110 changed files with 17,628 additions and 692 deletions.
diff --git a/README.md b/README.md
@@ -1,21 +1,12 @@
 # Deformable Convolutional Networks
 
 
-The major contributors of this repository include [Yuwen Xiong](https://github.com/Orpine), [Haozhi Qi](https://github.com/Oh233), [Guodong Zhang](https://github.com/gd-zhang), [Yi Li](https://github.com/liyi14), [Jifeng Dai](https://github.com/daijifeng001), [Bin Xiao](https://github.com/leoxiaobin) and  [Yichen Wei](https://github.com/YichenWei).
+The major contributors of this repository include [Yuwen Xiong](https://github.com/Orpine), [Haozhi Qi](https://github.com/Oh233), [Guodong Zhang](https://github.com/gd-zhang), [Yi Li](https://github.com/liyi14), [Jifeng Dai](https://github.com/daijifeng001), [Bin Xiao](https://github.com/leoxiaobin), [Han Hu](https://github.com/ancientmooner) and  [Yichen Wei](https://github.com/YichenWei).
 
-## Disclaimer
-
-This is an official implementation for [Deformable Convolutional Networks](https://arxiv.org/abs/1703.06211) (Deformable ConvNets). It is worth noticing that:
 
-  * The original implementation is based on our internal Caffe version on Windows. There are slight differences in the final accuracy and running time due to the plenty details in platform switch.
-  * The code is tested on official [MXNet@(commit 62ecb60)](https://github.com/dmlc/mxnet/tree/62ecb60) with the extra operators for Deformable ConvNets.
-  * We trained our model based on the ImageNet pre-trained [ResNet-v1-101](https://github.com/KaimingHe/deep-residual-networks) using a [model converter](https://github.com/dmlc/mxnet/tree/430ea7bfbbda67d993996d81c7fd44d3a20ef846/tools/caffe_converter). The converted model produces slightly lower accuracy (Top-1 Error on ImageNet val: 24.0% v.s. 23.6%).
-  * By now it only contains Deformable ConvNets with R-FCN. Deformable ConvNets with DeepLab will be released soon.
-  * This repository used code from [MXNet rcnn example](https://github.com/dmlc/mxnet/tree/master/example/rcnn) and [mx-rfcn](https://github.com/giorking/mx-rfcn).
 
 ## Introduction
 
-
 **Deformable ConvNets** is initially described in an [arxiv tech report](https://arxiv.org/abs/1703.06211).
 
 **R-FCN** is initially described in a [NIPS 2016 paper](https://arxiv.org/abs/1605.06409).
@@ -25,7 +16,15 @@ This is an official implementation for [Deformable Convolutional Networks](https
 <img src='demo/deformable_conv_demo2.png' width='800'>
 <img src='demo/deformable_psroipooling_demo.png' width='800'>
 
+## Disclaimer
+
+This is an official implementation of [Deformable Convolutional Networks](https://arxiv.org/abs/1703.06211) (Deformable ConvNets) based on MXNet. Note that:
 
+  * The original implementation is based on our internal Caffe version on Windows. There are slight differences in the final accuracy and running time due to the many details involved in switching platforms.
+  * The code is tested on official [MXNet@(commit 62ecb60)](https://github.com/dmlc/mxnet/tree/62ecb60) with the extra operators for Deformable ConvNets.
+  * We trained our model based on the ImageNet-pretrained [ResNet-v1-101](https://github.com/KaimingHe/deep-residual-networks) using a [model converter](https://github.com/dmlc/mxnet/tree/430ea7bfbbda67d993996d81c7fd44d3a20ef846/tools/caffe_converter). The converted model produces slightly lower accuracy (Top-1 Error on ImageNet val: 24.0% vs. 23.6%).
+  * This repository uses code from [MXNet rcnn example](https://github.com/dmlc/mxnet/tree/master/example/rcnn) and [mx-rfcn](https://github.com/giorking/mx-rfcn).
+
 ## License
 
 © Microsoft, 2017. Licensed under an Apache-2.0 license.
@@ -61,21 +60,34 @@ If you find Deformable ConvNets useful in your research, please consider citing:
 |---------------------------------|---------------|---------------|------|---------|---------|-------|-------|-------|
 | <sub>R-FCN, ResNet-v1-101 </sub>           | <sub>coco trainval</sub> | <sub>coco test-dev</sub> | 32.1 | 54.3    |   33.8  | 12.8  | 34.9  | 46.1  | 
 | <sub>Deformable R-FCN, ResNet-v1-101</sub> | <sub>coco trainval</sub> | <sub>coco test-dev</sub> | 35.7 | 56.8    | 38.3    | 15.2  | 38.8  | 51.5  |
+| <sub>Faster R-CNN (2fc), ResNet-v1-101 </sub>           | <sub>coco trainval</sub> | <sub>coco test-dev</sub> | 30.3 | 52.1    |   31.4  | 9.9  | 32.2  | 47.4  |
+| <sub>Deformable Faster R-CNN (2fc), <br/>ResNet-v1-101</sub> | <sub>coco trainval</sub> | <sub>coco test-dev</sub> | 35.0 | 55.0    | 38.3    | 14.3  | 37.7  | 52.0  |
+
+
+
+|                                   | training data              | testing data   | mIoU | time  |
+|-----------------------------------|----------------------------|----------------|------|-------|
+| DeepLab, ResNet-v1-101            | Cityscapes train           | Cityscapes val | 70.3 | 0.51s |
+| Deformable DeepLab, ResNet-v1-101 | Cityscapes train           | Cityscapes val | 75.2 | 0.52s |
+| DeepLab, ResNet-v1-101            | VOC 12 train (augmented) | VOC 12 val   | 70.7 | 0.08s |
+| Deformable DeepLab, ResNet-v1-101 | VOC 12 train (augmented) | VOC 12 val   | 75.9 | 0.08s |
 
 
 *Running time is counted on a single Maxwell Titan X GPU (mini-batch size is 1 in inference).*
 
 ## Requirements: Software
 
-1. MXNet from [offical repository](https://github.com/dmlc/mxnet). We tested our code on [MXNet@(commit 62ecb60)](https://github.com/dmlc/mxnet/tree/62ecb60). Due to the rapid development of MXNet, it is recommended to checkout this version if you have any problems. We may maintain this repository periodically if MXNet adds important feature in future release.
+1. MXNet from [the official repository](https://github.com/dmlc/mxnet). We tested our code on [MXNet@(commit 62ecb60)](https://github.com/dmlc/mxnet/tree/62ecb60). Due to the rapid development of MXNet, we recommend checking out this version if you encounter any issues. We may maintain this repository periodically if MXNet adds important features in future releases.
+
+2. Python 2.7. We recommend using Anaconda2.
 
-2. Python packages might missing: cython, opencv-python >= 3.2.0, easydict. If `pip` is set up on your system, those packages should be able to be fetched and installed by running
+3. Python packages that might be missing: cython, opencv-python >= 3.2.0, easydict. If `pip` is set up on your system, these packages can be fetched and installed by running
 	```
 	pip install Cython
 	pip install opencv-python==3.2.0.6
 	pip install easydict==1.6
 	```
-3. For Windows users, Visual Studio 2015 is needed to compile cython module.
+4. For Windows users, Visual Studio 2015 is needed to compile the Cython module.
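The package list above can be checked before building. The following is a minimal sketch, not part of the repository: it assumes the import names `Cython`, `cv2` (provided by opencv-python) and `easydict`, and compares `cv2.__version__` against the 3.2.0 minimum stated above. The `importer` parameter exists only so the check can be exercised without the real packages; the script itself runs on both Python 2.7 and Python 3.

```python
"""Hypothetical helper: check that the README's Python packages are importable."""
import importlib


def parse_version(text):
    """Turn '3.2.0' or '3.2.0-dev' into a tuple of ints such as (3, 2, 0)."""
    parts = []
    for piece in text.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(requirements, importer=importlib.import_module):
    """Return human-readable problems; an empty list means everything is fine.

    `requirements` maps an import name to a minimum version tuple, or None.
    """
    problems = []
    for name, min_version in requirements.items():
        try:
            module = importer(name)
        except ImportError:
            problems.append('%s is not installed' % name)
            continue
        if min_version is not None:
            found = parse_version(getattr(module, '__version__', '0'))
            if found < min_version:
                problems.append('%s %s is older than %s' % (
                    name, '.'.join(map(str, found)),
                    '.'.join(map(str, min_version))))
    return problems


# Import names are assumptions: opencv-python installs the `cv2` module.
REQUIREMENTS = {'Cython': None, 'cv2': (3, 2, 0), 'easydict': None}

if __name__ == '__main__':
    for problem in check_requirements(REQUIREMENTS):
        print(problem)
```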
 
 
 ## Requirements: Hardware
@@ -91,34 +103,52 @@ git clone https://github.com/msracver/Deformable-ConvNets.git
 2. For Windows users, run ``cmd .\init.bat``. For Linux users, run `sh ./init.sh`. The scripts will build the Cython module automatically and create some folders.
 3. Copy operators in `./rfcn/operator_cxx` to `$(YOUR_MXNET_FOLDER)/src/operator/contrib` and recompile MXNet.
 4. Please install MXNet following the official guide of MXNet. For advanced users, you may put your Python package into `./external/mxnet/$(YOUR_MXNET_PACKAGE)`, and modify `MXNET_VERSION` in `./experiments/rfcn/cfgs/*.yaml` to `$(YOUR_MXNET_PACKAGE)`. This way you can switch among different versions of MXNet quickly.
+5. For DeepLab, we use the augmented VOC 2012 dataset. The augmented annotations are provided by the [SBD](http://home.bharathh.info/pubs/codes/SBD/download.html) dataset. For convenience, we provide the converted PNG annotations and the lists of train/val images; please download them from [OneDrive](https://1drv.ms/u/s!Am-5JzdW2XHzhqMRhVImMI1jRrsxDg).
 
+## Demo & Deformable Model
 
-## Demo
+We provide trained deformable ConvNet models, including the deformable R-FCN and Faster R-CNN models trained on COCO trainval, and the deformable DeepLab model trained on Cityscapes train.
 
-1. To use the demo with our trained model (on COCO trainval), please download the model manually from [OneDrive](https://1drv.ms/u/s!AoN7vygOjLIQqmE7XqFVLbeZDfVN), and put it under folder `model/`.
+1. To use the demo with our pre-trained deformable models, please download them manually from [OneDrive](https://1drv.ms/u/s!Am-5JzdW2XHzhqMSjehIcCgAhvEAHw), and put them under the folder `model/`.
 
 	Make sure it looks like this:
 	```
 	./model/rfcn_dcn_coco-0000.params
 	./model/rfcn_coco-0000.params
+	./model/rcnn_dcn_coco-0000.params
+	./model/rcnn_coco-0000.params
+	./model/deeplab_dcn_cityscapes-0000.params
+	./model/deeplab_cityscapes-0000.params
+	./model/deform_conv-0000.params
+	./model/deform_psroi-0000.params
 	```
-2. To run the demo, run
+2. To run the R-FCN demo, run
 	```
 	python ./rfcn/demo.py
 	```
 	By default it runs Deformable R-FCN and produces several prediction results. To run plain R-FCN, use
 	```
 	python ./rfcn/demo.py --rfcn_only
 	```
-
-
-
-We will release the visualizaiton tool which visualizes the deformation effects soon.
+3. To run the DeepLab demo, run
+	```
+	python ./deeplab/demo.py
+	```
+	By default it runs Deformable DeepLab and produces several prediction results. To run plain DeepLab, use
+	```
+	python ./deeplab/demo.py --deeplab_only
+	```
+4. To visualize the offsets of deformable convolution and deformable PSROIPooling, run
+	```
+	python ./rfcn/deform_conv_demo.py
+	python ./rfcn/deform_psroi_demo.py
+	```
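The `-0000` suffix in the file names above follows MXNet's checkpoint convention: `mx.model.save_checkpoint(prefix, epoch, ...)` writes `<prefix>-<epoch as 4 digits>.params`, and `mx.model.load_checkpoint(prefix, epoch)` reads it back. The sketch below is a hypothetical helper, not shipped with the repository; it uses that convention to list which of the demo models are still missing from `./model/`.

```python
"""Hypothetical helper: report demo models missing from ./model/."""
import os

# Prefixes copied from the expected ./model/ listing above.
DEMO_MODEL_PREFIXES = [
    'rfcn_dcn_coco', 'rfcn_coco',
    'rcnn_dcn_coco', 'rcnn_coco',
    'deeplab_dcn_cityscapes', 'deeplab_cityscapes',
    'deform_conv', 'deform_psroi',
]


def params_path(model_dir, prefix, epoch=0):
    """Build the '<prefix>-<epoch:04d>.params' name MXNet uses for checkpoints."""
    return os.path.join(model_dir, '%s-%04d.params' % (prefix, epoch))


def missing_models(model_dir='./model', prefixes=DEMO_MODEL_PREFIXES, epoch=0):
    """Return the expected .params paths that are not regular files yet."""
    expected = [params_path(model_dir, name, epoch) for name in prefixes]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == '__main__':
    for path in missing_models():
        print('missing: ' + path)
```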
 
 
 ## Preparation for Training & Testing
 
-1. Please download COCO and VOC 2007+2012 dataset, and make sure it looks like this:
+For R-FCN/Faster R-CNN\:
+1. Please download COCO and VOC 2007+2012 datasets, and make sure it looks like this:
 
 	```
 	./data/coco/
@@ -131,10 +161,30 @@ We will release the visualizaiton tool which visualizes the deformation effects
 	./model/pretrained_model/resnet_v1_101-0000.params
 	```
 
+For DeepLab\:
+1. Please download Cityscapes and VOC 2012 datasets and make sure it looks like this:
+
+	```
+	./data/cityscapes/
+	./data/VOCdevkit/VOC2012/
+	```
+2. Please download the augmented VOC 2012 annotations and image lists, and put the augmented annotations and the augmented train/val lists into the following folders, respectively:
+
+	```
+	./data/VOCdevkit/VOC2012/SegmentationClass/
+	./data/VOCdevkit/VOC2012/ImageSets/Main/
+	```
+
+3. Please download the ImageNet-pretrained ResNet-v1-101 model manually from [OneDrive](https://1drv.ms/u/s!Am-5JzdW2XHzhqMEtxf1Ciym8uZ8sg), and put it under the folder `./model`. Make sure it looks like this:
+	```
+	./model/pretrained_model/resnet_v1_101-0000.params
+	```
+
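Before training DeepLab, the layout above can be verified with a small script. This is a hypothetical helper, not part of the repository; the relative paths are copied from the DeepLab preparation steps and are resolved against the repository root passed in as `root`.

```python
"""Hypothetical helper: verify the DeepLab data/model layout."""
import os

# (kind, path relative to the repository root), copied from the steps above.
DEEPLAB_LAYOUT = [
    ('dir', 'data/cityscapes'),
    ('dir', 'data/VOCdevkit/VOC2012'),
    ('dir', 'data/VOCdevkit/VOC2012/SegmentationClass'),
    ('dir', 'data/VOCdevkit/VOC2012/ImageSets/Main'),
    ('file', 'model/pretrained_model/resnet_v1_101-0000.params'),
]


def check_layout(root, layout=DEEPLAB_LAYOUT):
    """Return the relative paths under `root` that are missing or of the wrong kind."""
    problems = []
    for kind, rel in layout:
        # Split on '/' so the same table works on Windows and Linux.
        path = os.path.join(root, *rel.split('/'))
        ok = os.path.isdir(path) if kind == 'dir' else os.path.isfile(path)
        if not ok:
            problems.append(rel)
    return problems


if __name__ == '__main__':
    for rel in check_layout('.'):
        print('missing: ./' + rel)
```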
 ## Usage
 
-1. All of our experiment settings (GPU #, dataset, etc.) are kept in yaml files at folder `./experiments/rfcn/cfgs`.
-2. Four config files have been provided so far, namely, R-FCN for COCO/VOC and Deformable R-FCN for COCO/VOC, respectively. We use 8 and 4 GPUs to train models on COCO and on VOC, respectively.
+1. All of our experiment settings (GPU #, dataset, etc.) are kept in yaml config files in the folders `./experiments/rfcn/cfgs`, `./experiments/faster_rcnn/cfgs` and `./experiments/deeplab/cfgs/`.
+2. Twelve config files have been provided so far, namely R-FCN for COCO/VOC, Deformable R-FCN for COCO/VOC, Faster R-CNN (2fc) for COCO/VOC, Deformable Faster R-CNN (2fc) for COCO/VOC, DeepLab for Cityscapes/VOC and Deformable DeepLab for Cityscapes/VOC. For R-FCN, we use 8 and 4 GPUs to train models on COCO and VOC, respectively. For DeepLab, we use 4 GPUs for all experiments.
+
 3. To perform experiments, run the Python scripts with the corresponding config file as input. For example, to train and test deformable convnets on COCO with ResNet-v1-101, use the following command:
     ```
     python experiments\rfcn\rfcn_end2end_train_test.py --cfg experiments\rfcn\cfgs\resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml
@@ -144,11 +194,35 @@ We will release the visualizaiton tool which visualizes the deformation effects
 
 ## Misc.
 
-MXNet build without CuDNN is recommended.
-
 Code has been tested under:
 
 - Ubuntu 14.04 with a Maxwell Titan X GPU and Intel Xeon CPU E5-2620 v2 @ 2.10GHz
 - Windows Server 2012 R2 with 8 K40 GPUs and Intel Xeon CPU E5-2650 v2 @ 2.60GHz
 - Windows Server 2012 R2 with 4 Pascal Titan X GPUs and Intel Xeon CPU E5-2650 v4 @ 2.30GHz
 
+## FAQ
+
+Q: It says `AttributeError: 'module' object has no attribute 'DeformableConvolution'`.
+
+A: This is because you either
+ - forgot to copy the operators to your MXNet folder,
+ - copied them to the wrong path,
+ - forgot to recompile MXNet,
+ - or installed the wrong MXNet.
+
+    Please print `mxnet.__path__` to make sure you are using the correct MXNet.
+
+<br/><br/>
+Q: I encounter a `segmentation fault` at startup.
+
+A: A compatibility issue has been identified between MXNet and opencv-python 3.0+. We suggest that you always `import cv2` before `import mxnet` in the entry script.
+
+<br/><br/>
+Q: I find that training becomes slower after running for a long time.
+
+A: This is a known problem with MXNet on Windows, so we recommend running this program on Linux. If you encounter this problem, you can also stop training and resume it to regain the original training speed.
+
+<br/><br/>
+Q: Can you share your Caffe implementation?
+
+A: For several reasons (the code is based on an old, internal version of Caffe, porting it to public Caffe needs extra work, time constraints, etc.), we do not plan to release our Caffe code. Since the current MXNet convolution implementation is almost the same as Caffe's, it should be easy to port it to Caffe yourself; the core CUDA code can be kept unchanged. Anyone who wishes to do so is welcome to make a pull request.
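For the first FAQ entry (the `DeformableConvolution` AttributeError), the check can be scripted. A minimal sketch, assuming the operators are exposed as `mx.contrib.sym.DeformableConvolution` and `mx.contrib.sym.DeformablePSROIPooling` once they have been copied into `src/operator/contrib` and MXNet has been recompiled; adjust the names if your MXNet build registers them elsewhere. The function takes the module as an argument, so it can also be exercised with a stand-in object.

```python
"""Hypothetical helper: check that the Deformable ConvNets operators are compiled in."""

# Assumed operator names under mx.contrib.sym.
REQUIRED_OPS = ('DeformableConvolution', 'DeformablePSROIPooling')


def missing_deformable_ops(mx):
    """Return the required operator names that `mx` (the mxnet module) lacks."""
    # getattr(None, 'sym', None) is None, so a missing `contrib` is handled too.
    sym = getattr(getattr(mx, 'contrib', None), 'sym', None)
    if sym is None:
        return list(REQUIRED_OPS)
    return [name for name in REQUIRED_OPS if not hasattr(sym, name)]


if __name__ == '__main__':
    try:
        import mxnet
    except (ImportError, OSError) as err:
        print('mxnet could not be imported: %s' % err)
    else:
        # The path shows which MXNet build this interpreter actually picks up.
        print('MXNet loaded from: %s' % mxnet.__path__)
        print('Missing operators: %s' % (missing_deformable_ops(mxnet) or 'none'))
```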
diff --git a/deeplab/_init_paths.py b/deeplab/_init_paths.py
@@ -0,0 +1,19 @@
+# --------------------------------------------------------
+# Deformable Convolutional Networks
+# Copyright (c) 2016 by Contributors
+# Copyright (c) 2017 Microsoft
+# Licensed under The Apache-2.0 License [see LICENSE for details]
+# Modified by Zheng Zhang
+# --------------------------------------------------------
+
+import os.path as osp
+import sys
+
+def add_path(path):
+    if path not in sys.path:
+        sys.path.insert(0, path)
+
+this_dir = osp.dirname(__file__)
+
+lib_path = osp.join(this_dir, '..', 'lib')
+add_path(lib_path)
diff --git a/deeplab/config/__init__.py b/deeplab/config/__init__.py
diff --git a/deeplab/config/config.py b/deeplab/config/config.py
@@ -0,0 +1,96 @@
+# --------------------------------------------------------
+# Deformable Convolutional Networks
+# Copyright (c) 2016 by Contributors
+# Copyright (c) 2017 Microsoft
+# Licensed under The Apache-2.0 License [see LICENSE for details]
+# Modified by Zheng Zhang
+# --------------------------------------------------------
+
+import yaml
+import numpy as np
+from easydict import EasyDict as edict
+
+config = edict()
+
+config.MXNET_VERSION = ''
+config.output_path = ''
+config.symbol = ''
+config.gpus = ''
+config.CLASS_AGNOSTIC = True
+config.SCALES = [(360, 600)]  # first is scale (the shorter side); second is max size
+
+# default training
+config.default = edict()
+config.default.frequent = 1000
+config.default.kvstore = 'device'
+
+# network related params
+config.network = edict()
+config.network.pretrained = '../model/pretrained_model/resnet_v1-101'
+config.network.pretrained_epoch = 0
+config.network.PIXEL_MEANS = np.array([103.06, 115.90, 123.15])
+config.network.IMAGE_STRIDE = 0
+config.network.FIXED_PARAMS = ['conv1', 'bn_conv1', 'res2', 'bn2', 'gamma', 'beta']
+
+# dataset related params
+config.dataset = edict()
+config.dataset.dataset = 'cityscapes'
+config.dataset.image_set = 'leftImg8bit_train'
+config.dataset.test_image_set = 'leftImg8bit_val'
+config.dataset.root_path = '../data'
+config.dataset.dataset_path = '../data/cityscapes'
+config.dataset.NUM_CLASSES = 19
+config.dataset.annotation_prefix = 'gtFine'
+
+config.TRAIN = edict()
+config.TRAIN.lr = 0
+config.TRAIN.lr_step = ''
+config.TRAIN.warmup = False
+config.TRAIN.warmup_lr = 0
+config.TRAIN.warmup_step = 0
+config.TRAIN.momentum = 0.9
+config.TRAIN.wd = 0.0005
+config.TRAIN.begin_epoch = 0
+config.TRAIN.end_epoch = 0
+config.TRAIN.model_prefix = 'deeplab'
+
+# whether to resume training
+config.TRAIN.RESUME = False
+# whether to flip images
+config.TRAIN.FLIP = True
+# whether to shuffle images
+config.TRAIN.SHUFFLE = True
+# whether to use OHEM
+config.TRAIN.ENABLE_OHEM = False
+# number of images per device
+config.TRAIN.BATCH_IMAGES = 1
+
+config.TEST = edict()
+# number of images per device
+config.TEST.BATCH_IMAGES = 1
+
+# Test Model Epoch
+config.TEST.test_epoch = 0
+
+def update_config(config_file):
+    exp_config = None
+    with open(config_file) as f:
+        exp_config = edict(yaml.safe_load(f))
+        for k, v in exp_config.items():
+            if k in config:
+                if isinstance(v, dict):
+                    if k == 'TRAIN':
+                        if 'BBOX_WEIGHTS' in v:
+                            v['BBOX_WEIGHTS'] = np.array(v['BBOX_WEIGHTS'])
+                    elif k == 'network':
+                        if 'PIXEL_MEANS' in v:
+                            v['PIXEL_MEANS'] = np.array(v['PIXEL_MEANS'])
+                    for vk, vv in v.items():
+                        config[k][vk] = vv
+                else:
+                    if k == 'SCALES':
+                        config[k][0] = (tuple(v))
+                    else:
+                        config[k] = v
+            else:
+                raise ValueError("key must exist in config.py")
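The merge rules in `update_config` can be illustrated without `yaml` or `easydict`. The sketch below is a hypothetical restatement on plain dicts, not the repository's code: known top-level keys are overwritten, dict-valued sections (such as `TRAIN` or `network`) are updated key by key without validating nested keys, a YAML `SCALES` list replaces only the first `(short side, max size)` pair, and unknown top-level keys raise `ValueError`. The `numpy` conversions of `BBOX_WEIGHTS` and `PIXEL_MEANS` are left out, and the override values in the example are made up.

```python
import copy


def merge_config(defaults, overrides):
    """Return a copy of `defaults` updated with `overrides` (a parsed YAML dict)."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            # Same error text as update_config(), plus the offending key.
            raise ValueError('key must exist in config.py: %s' % key)
        if isinstance(value, dict):
            # A section such as TRAIN or network: override entry by entry.
            for sub_key, sub_value in value.items():
                merged[key][sub_key] = sub_value
        elif key == 'SCALES':
            # A YAML list like [1024, 2048] becomes the first (short, max) pair.
            merged[key][0] = tuple(value)
        else:
            merged[key] = value
    return merged


if __name__ == '__main__':
    # Hypothetical defaults and overrides, for illustration only.
    defaults = {'gpus': '', 'SCALES': [(360, 600)],
                'TRAIN': {'lr': 0, 'momentum': 0.9}}
    cfg = merge_config(defaults, {'gpus': '0,1,2,3',
                                  'SCALES': [1024, 2048],
                                  'TRAIN': {'lr': 0.0005}})
    print(cfg['SCALES'])  # [(1024, 2048)]
    print(cfg['TRAIN'])
```

Working on a deep copy keeps the module-level defaults untouched, whereas `update_config` mutates the global `config` in place; either is reasonable for a one-shot script.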