Commit: Merge branch 'mindspore-lab:main' into main

Showing 28 changed files with 3,413 additions and 150 deletions.

@@ -0,0 +1,97 @@

# HaloNet

> [Scaling Local Self-Attention for Parameter Efficient Visual Backbones](https://arxiv.org/abs/2103.12731)

## Introduction

Researchers from Google Research and UC Berkeley have developed a family of local self-attention models that can outperform standard baseline models and even high-performance convolutional models.[[1](#references)]

Blocked self-attention: the input image is divided into multiple blocks, and self-attention is applied within each block. However, attending only to the pixels inside a block inevitably loses information at the block boundaries. Therefore, before self-attention is computed, a haloing operation is performed: each block is padded with a band of neighboring pixels from the original feature map, so that its receptive field is enlarged and covers more context.

<p align="center">
  <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/50255437/257577202-3ac43b82-785a-42c5-9b6c-ca58b0fa7ab8.png" width=800 />
</p>
<p align="center">
  <em>Figure 1. Architecture of Blocked Self-Attention [<a href="#references">1</a>]</em>
</p>
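
To make the block-plus-halo step concrete, here is a minimal NumPy sketch (an illustration of the idea, not the MindCV implementation) that splits a feature map into non-overlapping query blocks and gathers a zero-padded halo of neighboring pixels around each block to form the key/value windows; `block` and `halo` correspond to the block size and halo size in the paper.

```python
import numpy as np

def haloed_blocks(x: np.ndarray, block: int, halo: int) -> np.ndarray:
    """Split a (H, W, C) feature map into non-overlapping query blocks and
    gather, for each block, the zero-padded (block + 2*halo)^2 neighborhood
    that serves as its keys and values."""
    H, W, C = x.shape
    assert H % block == 0 and W % block == 0, "H and W must be multiples of block"
    # Zero-pad so that blocks on the border also get a full halo.
    xp = np.pad(x, ((halo, halo), (halo, halo), (0, 0)))
    win = block + 2 * halo
    out = np.empty((H // block, W // block, win, win, C), dtype=x.dtype)
    for i in range(H // block):
        for j in range(W // block):
            # In padded coordinates, block (i, j) together with its halo
            # starts at (i*block, j*block) and spans `win` pixels per side.
            out[i, j] = xp[i * block:i * block + win, j * block:j * block + win]
    return out
```

Self-attention is then computed per block, with the block's own pixels as queries and its haloed window as keys and values.
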
Down sampling: to reduce computation, the queries of each block are subsampled with a stride before attention is performed, so that attending over the full haloed window directly yields a down-sampled output without an extra pooling operation.

<p align="center">
  <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/50255437/257578183-fe45c2c2-5006-492b-b30a-5b049a0e2531.png" width=800 />
</p>
<p align="center">
  <em>Figure 2. Architecture of Down Sampling [<a href="#references">1</a>]</em>
</p>
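
Under the same toy assumptions as the sketch above (plain NumPy, a single head, keys reused as values, no learned projections), the down-sampling trick can be sketched as follows: subsampling only the queries with a stride makes the attention output spatially smaller, so no separate pooling layer is needed.

```python
import numpy as np

def downsampled_block_attention(block: np.ndarray, window: np.ndarray,
                                stride: int = 2) -> np.ndarray:
    """Toy single-head attention for one query block.

    block:  (b, b, C) features of one block (queries)
    window: (b + 2*halo, b + 2*halo, C) haloed features (keys and values)
    Returns a (b // stride, b // stride, C) down-sampled output.
    """
    C = block.shape[-1]
    q = block[::stride, ::stride].reshape(-1, C)   # strided query subsampling
    kv = window.reshape(-1, C)                     # full haloed window
    scores = q @ kv.T / np.sqrt(C)                 # scaled dot-product scores
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = scores / scores.sum(axis=1, keepdims=True)  # softmax over window
    out = attn @ kv                                # values == keys, for brevity
    side = block.shape[0] // stride
    return out.reshape(side, side, C)
```
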
## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model       | Context  | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
| ----------- | -------- | --------- | --------- | ---------- | ------ | -------- |
| halonet_50t | D910x8-G | 79.53     | 94.79     | 22.79      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/halonet/halonet_50t_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/halonet/halonet_50t-533da6be.ckpt) |

</div>

#### Notes

- Context: training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: accuracy reported on the validation set of ImageNet-1K.

## Quick Start

### Preparation

#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.

#### Dataset Preparation

Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` option must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction, or to adjust the learning rate linearly to match a new global batch size, as sketched below.
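
As a concrete illustration of the linear-scaling rule (a common heuristic, not a MindCV utility), the halonet_50t recipe uses batch_size 64 on 8 devices with lr 0.00125, i.e. a global batch size of 512:

```python
# Hypothetical helper illustrating the linear learning-rate scaling rule;
# not part of the MindCV API.
def scale_lr(base_lr: float, base_global_batch: int, new_global_batch: int) -> float:
    """Scale the learning rate proportionally to the global batch size."""
    return base_lr * new_global_batch / base_global_batch

# halonet_50t recipe: batch_size=64 on 8 devices, lr=0.00125 (global batch 512).
# Moving to 4 devices (global batch 256) suggests:
print(scale_lr(0.00125, 64 * 8, 64 * 4))  # 0.000625
```
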
* Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/dataset --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.

```shell
python validate.py -c configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```

### Deployment

Please refer to the [deployment tutorial](https://mindspore-lab.github.io/mindcv/tutorials/deployment/) in MindCV.

## References

[1] Vaswani A, Ramachandran P, Srinivas A, et al. Scaling local self-attention for parameter efficient visual backbones[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 12894-12904.

@@ -0,0 +1,60 @@

# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
val_split: val

# augmentation
image_resize: 256
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.95
auto_augment: 'randaug-m9-n2-mstd0.5-inc1'
re_prob: 0.25
re_max_attempts: 1
mixup: 0.8
color_jitter: 0.4

# model
model: 'halonet_50t'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 20
val_interval: 5
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O3'
val_amp_level: 'O2'

# optimizer
opt: 'adamw'
filter_bias_and_bn: True
weight_decay: 0.04
loss_scale: 1024
use_nesterov: False

# lr scheduler
scheduler: 'warmup_cosine_decay'
min_lr: 0.000006
lr: 0.00125
warmup_epochs: 3
decay_epochs: 297

# loss
loss: 'CE'
label_smoothing: 0.1
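
For readers unfamiliar with the scheduler named above, here is an illustrative reading of `warmup_cosine_decay` with the exact values from this config; it is a sketch of the intended shape, not MindCV's exact implementation.

```python
import math

def lr_at(epoch: float, lr: float = 0.00125, min_lr: float = 0.000006,
          warmup_epochs: int = 3, decay_epochs: int = 297) -> float:
    """Sketch of a warmup + cosine-decay schedule (assumed semantics)."""
    if epoch < warmup_epochs:
        return lr * epoch / warmup_epochs          # linear warmup from 0
    t = (epoch - warmup_epochs) / decay_epochs     # progress of cosine decay
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * min(t, 1.0)))
```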

@@ -0,0 +1,143 @@

# SSD Based on MindCV Backbones

> [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325)

## Introduction

SSD is a single-stage object detector. It discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, and combines predictions from multi-scale feature maps to detect objects of various sizes. At prediction time, SSD generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape.

<p align="center">
  <img src="https://github.com/DexterJZ/mindcv/assets/16130861/50bc9627-c71c-4b1a-9de4-9e6040a43279" width=800 />
</p>
<p align="center">
  <em>Figure 1. Architecture of SSD [<a href="#references">1</a>]</em>
</p>

In this example, by leveraging [the multi-scale feature extraction of MindCV](https://github.com/mindspore-lab/mindcv/blob/main/docs/en/how_to_guides/feature_extraction.md), we demonstrate that using backbones from MindCV greatly simplifies the implementation of SSD.
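
As a taste of what the linked guide provides, here is a minimal sketch of pulling multi-scale features out of a MindCV backbone; the model name and call pattern are illustrative assumptions, so please check the guide for the exact interface in your version.

```python
# Sketch: obtaining multi-scale feature maps from a MindCV backbone for SSD,
# following the feature-extraction guide linked above (assumed interface).
import numpy as np
import mindspore as ms
from mindcv.models import create_model

backbone = create_model("mobilenet_v2_100", features_only=True)
x = ms.Tensor(np.zeros((1, 3, 300, 300), np.float32))
features = backbone(x)  # list of feature maps at increasing strides
for f in features:
    print(f.shape)      # each selected scale can feed one SSD detection head
```
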
## Configurations

Here, we provide three configurations of SSD.
* Using [MobileNetV2](https://github.com/mindspore-lab/mindcv/tree/main/configs/mobilenetv2) as the backbone with the original detector described in the paper.
* Using [ResNet50](https://github.com/mindspore-lab/mindcv/tree/main/configs/resnet) as the backbone with an FPN and a shared-weight-based detector.
* Using [MobileNetV3](https://github.com/mindspore-lab/mindcv/tree/main/configs/mobilenetv3) as the backbone with the original detector described in the paper.

## Dataset

We train and test SSD using the [COCO 2017 Dataset](https://cocodataset.org/#download). The dataset contains
* 118,000 images (about 18 GB) for training, and
* 5,000 images (about 1 GB) for testing.

## Quick Start

### Preparation

1. Clone the MindCV repository by running

```
git clone https://github.com/mindspore-lab/mindcv.git
```

2. Install dependencies as shown [here](https://mindspore-lab.github.io/mindcv/installation/).

3. Download the [COCO 2017 Dataset](https://cocodataset.org/#download) and organize it as follows.

```
.
└─cocodataset
  ├─annotations
    ├─instances_train2017.json
    └─instances_val2017.json
  ├─val2017
  └─train2017
```

Run the following commands to preprocess the dataset and convert it to the [MindRecord format](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.mindrecord.html), which reduces preprocessing time during training and testing.

```
cd mindcv # change directory to the root of MindCV repository
python examples/det/ssd/create_data.py coco --data_path [root of COCO 2017 Dataset] --out_path [directory for storing MindRecord files]
```

Specify the path of the preprocessed dataset at keyword `data_dir` in the config file.

4. Download the pretrained backbone weights from the table below, and specify the path to the backbone weights at keyword `backbone_ckpt_path` in the config file.

<div align="center">

| MobileNetV2 | ResNet50 | MobileNetV3 |
|:----------------:|:----------------:|:----------------:|
| [backbone weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv2/mobilenet_v2_100-d5532038.ckpt) | [backbone weights](https://download.mindspore.cn/toolkits/mindcv/resnet/resnet50-e0733ab8.ckpt) | [backbone weights](https://download.mindspore.cn/toolkits/mindcv/mobilenet/mobilenetv3/mobilenet_v3_large_100-1279ad5f.ckpt) |

</div>

### Train

It is highly recommended to use **distributed training** for this SSD implementation.

For distributed training using **OpenMPI's `mpirun`**, simply run

```
cd mindcv # change directory to the root of MindCV repository
mpirun -n [# of devices] python examples/det/ssd/train.py --config [the path to the config file]
```

For example, to train SSD with the `MobileNetV2` configuration on 8 devices in a distributed manner, run

```
cd mindcv # change directory to the root of MindCV repository
mpirun -n 8 python examples/det/ssd/train.py --config examples/det/ssd/ssd_mobilenetv2.yaml
```

For distributed training with [Ascend rank table](https://github.com/mindspore-lab/mindocr/blob/main/docs/en/tutorials/distribute_train.md#12-configure-rank_table_file-for-training), configure `ascend8p.sh` as follows

```
#!/bin/bash
export DEVICE_NUM=8
export RANK_SIZE=8
export RANK_TABLE_FILE="./hccl_8p_01234567_127.0.0.1.json"

for ((i = 0; i < ${DEVICE_NUM}; i++)); do
    export DEVICE_ID=$i
    export RANK_ID=$i
    echo "Launching rank: ${RANK_ID}, device: ${DEVICE_ID}"
    if [ $i -eq 0 ]; then
        # rank 0 keeps a visible log file
        python examples/det/ssd/train.py --config [the path to the config file] &> ./train.log &
    else
        # other ranks run in the background and discard their output
        python -u examples/det/ssd/train.py --config [the path to the config file] &> /dev/null &
    fi
done
```

and start training by running

```
cd mindcv # change directory to the root of MindCV repository
bash ascend8p.sh
```

For single-device training, please run

```
cd mindcv # change directory to the root of MindCV repository
python examples/det/ssd/train.py --config [the path to the config file]
```

### Test

To test the trained model, first specify the path to the model checkpoint at keyword `ckpt_path` in the config file, then run

```
cd mindcv # change directory to the root of MindCV repository
python examples/det/ssd/eval.py --config [the path to the config file]
```

For example, to test SSD with the `MobileNetV2` configuration, run

```
cd mindcv # change directory to the root of MindCV repository
python examples/det/ssd/eval.py --config examples/det/ssd/ssd_mobilenetv2.yaml
```

## Performance

Here are the performance results and the pretrained model weights for each configuration.

<div align="center">

| Configuration | Mixed Precision | mAP | Config | Download |
|:-----------------:|:---------------:|:----:|:------:|:--------:|
| MobileNetV2 | O2 | 23.2 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/examples/det/ssd/ssd_mobilenetv2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/ssd/ssd_mobilenetv2-5bbd7411.ckpt) |
| ResNet50 with FPN | O3 | 38.3 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/examples/det/ssd/ssd_resnet50_fpn.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/ssd/ssd_resnet50_fpn-ac87ddac.ckpt) |
| MobileNetV3 | O2 | 23.8 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/examples/det/ssd/ssd_mobilenetv3.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/ssd/ssd_mobilenetv3-53d9f6e9.ckpt) |

</div>

## References

[1] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single Shot MultiBox Detector. In Computer Vision - ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I (pp. 21-37). Springer International Publishing.