This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

[Not Merge] Support ResNet50 training on CIFAR10 #1

Open · wants to merge 3 commits into base: main
58 changes: 5 additions & 53 deletions README.md
@@ -1,10 +1,10 @@
-# Train CIFAR10 with OneFlow
+# Train CIFAR10 with OneFlow Cambricon on the MLU270

-I'm playing with [OneFlow](https://github.com/Oneflow-Inc/oneflow) on the CIFAR10 dataset.
+I'm playing with [oneflow-cambricon](https://github.com/Oneflow-Inc/oneflow-cambricon) on the CIFAR10 dataset.

## Prerequisites
-- Python 3.6+
-- OneFlow 0.5.0rc+
+- Python 3.7+
+- oneflow-cambricon

## Training
```
@@ -18,55 +18,7 @@ python main.py --resume --lr=0.01
## Accuracy
| Model | Acc. |
| ----------------- | ----------- |
-| [VGG16](https://arxiv.org/abs/1409.1556) | 93.92% |
-| [ResNet18](https://arxiv.org/abs/1512.03385) | 95.62% |
-| [ResNet50](https://arxiv.org/abs/1512.03385) | 95.40% |
-| [ResNet101](https://arxiv.org/abs/1512.03385) | |
-| [RegNetX_200MF](https://arxiv.org/abs/2003.13678) | 95.10% |
-| [RegNetY_400MF](https://arxiv.org/abs/2003.13678) | |
-| [MobileNetV2](https://arxiv.org/abs/1801.04381) | 92.56% |
-| [ResNeXt29(32x4d)](https://arxiv.org/abs/1611.05431) | |
-| [ResNeXt29(2x64d)](https://arxiv.org/abs/1611.05431) | |
-| [SimpleDLA](https://arxiv.org/abs/1707.06484) | |
-| [DenseNet121](https://arxiv.org/abs/1608.06993) | |
-| [PreActResNet18](https://arxiv.org/abs/1603.05027) | |
-| [DPN92](https://arxiv.org/abs/1707.01629) | |
-| [DLA](https://arxiv.org/pdf/1707.06484.pdf) | |

-## Quantization Aware Training

-If you are interested in the OneFlow FX feature, build the experimental FX branch of OneFlow as follows.

-```
-git clone https://github.com/Oneflow-Inc/oneflow
-cd oneflow
-git checkout add_fx_intermediate_representation
-mkdir build
-cd build
-cmake -DCUDNN_ROOT_DIR=/usr/local/cudnn -DCMAKE_BUILD_TYPE=Release -DTHIRD_PARTY_MIRROR=aliyun -DUSE_CLANG_FORMAT=ON -DTREAT_WARNINGS_AS_ERRORS=OFF ..
-make -j32
-```

-```
-# Start training with:
-python main_qat.py

-# You can manually resume the training with:
-python main_qat.py --resume --lr=0.01
-```

-Note:

-The `momentum` parameter of the `MovingAverageMinMaxObserver` class defaults to 0.95 and is left unchanged in the experiments below.
-## Accuracy
-| Model | quantization_bit | quantization_scheme | quantization_formula | per_layer_quantization | Acc |
-| ----------------- | ----------- | ----------- | ----------- | ----------- | ----------- |
-| ResNet18 | 8 | symmetric | google | True | 95.19% |
-| ResNet18 | 8 | symmetric | google | False | 95.24% |
-| ResNet18 | 8 | affine | google | True | 95.32% |
-| ResNet18 | 8 | affine | google | False | 95.30% |
-| ResNet18 | 8 | symmetric | cambricon | True | 95.19% |

+| [ResNet50](https://arxiv.org/abs/1512.03385) | |
## Reference
- https://github.com/kuangliu/pytorch-cifar
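A note on the QAT section this PR removes: the `momentum` remark describes how a moving-average min/max observer smooths the activation ranges used to derive quantization scales. Below is a minimal sketch of that exponential-moving-average update under the usual formulation; it is an illustration only, not OneFlow's actual `MovingAverageMinMaxObserver` implementation, and the helper name is made up.

```python
import oneflow as flow

def ema_minmax_update(x, running_min, running_max, momentum=0.95):
    # Hypothetical helper: blend the current batch's min/max into the
    # running statistics. With momentum=0.95 (the default the removed
    # note mentions), 95% of the old range is kept each step, so noisy
    # batches barely move the quantization range.
    batch_min = x.min().item()
    batch_max = x.max().item()
    new_min = momentum * running_min + (1 - momentum) * batch_min
    new_max = momentum * running_max + (1 - momentum) * batch_max
    return new_min, new_max

# One observer step over a fake activation tensor.
acts = flow.randn(128, 64)
lo, hi = ema_minmax_update(acts, running_min=0.0, running_max=0.0)
```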

26 changes: 12 additions & 14 deletions main.py
@@ -3,10 +3,8 @@
import oneflow.nn as nn
import oneflow.optim as optim
import oneflow.nn.functional as F
-import oneflow.backends.cudnn as cudnn

-import torch
-import torchvision
+import flowvision


import os
@@ -22,7 +20,7 @@
help='resume from checkpoint')
args = parser.parse_args()

-device = 'cuda' if flow.cuda.is_available() else 'cpu'
+device = 'mlu'
best_acc = 0 # best test accuracy
start_epoch = 0 # start from epoch 0 or last checkpoint epoch
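For readers new to the Cambricon port, the `device = 'mlu'` change above is the whole story of device placement in this PR: oneflow-cambricon exposes the MLU through the same device-string API that CUDA builds use for `'cuda'`. A minimal sketch (the `'mlu'` string comes from the diff; everything else is standard OneFlow tensor/module movement, assumed to work unchanged on the MLU build):

```python
import oneflow as flow

device = 'mlu'  # hard-coded, as in this PR; no availability check

x = flow.randn(4, 3, 32, 32)  # a CIFAR10-shaped dummy batch
x = x.to(device)              # copy the tensor onto the MLU

# Modules move the same way, e.g. in main.py:
# net = ResNet50().to(device)
```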

@@ -54,7 +52,7 @@
# testset, batch_size=100, shuffle=False, num_workers=2)

# PyTorch DataReader
-import torchvision.transforms as transforms
+import flowvision.transforms as transforms
transform_train = transforms.Compose([
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
@@ -67,14 +65,14 @@
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

-trainset = torchvision.datasets.CIFAR10(
+trainset = flowvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform_train)
-trainloader = torch.utils.data.DataLoader(
+trainloader = flow.utils.data.DataLoader(
    trainset, batch_size=128, shuffle=True, num_workers=2, drop_last=True)

-testset = torchvision.datasets.CIFAR10(
+testset = flowvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=transform_test)
-testloader = torch.utils.data.DataLoader(
+testloader = flow.utils.data.DataLoader(
    testset, batch_size=100, shuffle=False, num_workers=2, drop_last=True)
testset, batch_size=100, shuffle=False, num_workers=2, drop_last=True)
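Taken together, the hunks above swap the torch/torchvision input pipeline for flowvision plus OneFlow's own DataLoader, while keeping the torchvision-style API. A condensed sketch of the resulting training-side pipeline, using only names already present in main.py:

```python
import oneflow as flow
import flowvision
import flowvision.transforms as transforms

# Same CIFAR10 augmentation and normalization constants as main.py.
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = flowvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform_train)

# drop_last=True (added by this PR) keeps every batch at exactly 128
# samples, avoiding a differently shaped final batch.
trainloader = flow.utils.data.DataLoader(
    trainset, batch_size=128, shuffle=True, num_workers=2, drop_last=True)

images, labels = next(iter(trainloader))  # images: (128, 3, 32, 32)
```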


@@ -84,8 +82,8 @@
# Model
print('==> Building model..')
# net = VGG('VGG16')
-net = ResNet18()
-# net = ResNet50()
+# net = ResNet18()
+net = ResNet50()
# net = PreActResNet18()
# net = GoogLeNet()
# net = DenseNet121()
@@ -145,8 +143,8 @@ def train(epoch):
# _, predicted = outputs.max(1)
predicted = flow.argmax(outputs, 1).to(flow.int64)
total += targets.size(0)

-correct += predicted.eq(targets).to(flow.int32).sum().item()
+tmp = predicted.eq(targets).to(flow.int)
+correct += tmp.sum().item()

progress_bar(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
% (train_loss/(batch_idx+1), 100.*correct/total, correct, total))
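The two-line rewrite above is behavior-preserving: it only splits the fused comparison into an explicit temporary, and `flow.int` is the 32-bit integer dtype, matching the old `flow.int32`. A self-contained sketch of that accuracy bookkeeping, with made-up values:

```python
import oneflow as flow

# Fake logits for a batch of 4 samples over the 10 CIFAR10 classes.
outputs = flow.randn(4, 10)
targets = flow.tensor([3, 1, 7, 7], dtype=flow.int64)

# Same steps as the training loop: argmax over the class dimension,
# elementwise equality, integer cast, then .item() to get a Python int.
predicted = flow.argmax(outputs, 1).to(flow.int64)
tmp = predicted.eq(targets).to(flow.int)
correct = tmp.sum().item()
total = targets.size(0)

print('Acc: %.3f%% (%d/%d)' % (100. * correct / total, correct, total))
```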
@@ -175,7 +173,7 @@ def test(epoch):
total += targets.size(0)


-correct += predicted.eq(targets).to(flow.int32).sum().item()
+correct += predicted.eq(targets).to(flow.int).sum().item()

progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
% (test_loss/(batch_idx+1), 100.*correct/total, correct, total))
208 changes: 0 additions & 208 deletions main_amp.py

This file was deleted.
