bert_perf_investigation #147

Draft · wants to merge 2 commits into master
Changes from all commits
8 changes: 6 additions & 2 deletions .github/workflows/tests.yml
@@ -49,15 +49,19 @@ jobs:
MILABENCH_DASH: "no"

steps:
- uses: actions/checkout@v3

- uses: conda-incubator/setup-miniconda@v2
with:
auto-activate-base: false
python-version: 3.9
miniconda-version: "latest"
activate-environment: test

# - name: clean
# run: |
# python -c "import shutil; shutil.rmtree('/opt/actions-runner/_work/milabench/milabench')"

- uses: actions/checkout@v3

- name: Pytorch Sanity
run: |
if [[ "${{ matrix.arch }}" == "rocm" ]]; then
7 changes: 7 additions & 0 deletions .gitmodules
@@ -0,0 +1,7 @@
[submodule "benchmarks/mlperf/apex"]
path = benchmarks/mlperf/apex
url = https://github.com/NVIDIA/apex.git

[submodule "benchmarks/mlperf/training_results_v2.1"]
path = benchmarks/mlperf/training_results_v2.1
url = https://github.com/mlcommons/training_results_v2.1.git
34 changes: 28 additions & 6 deletions benchmarks/huggingface/bench/__main__.py
@@ -12,6 +12,7 @@
from .synth import SyntheticData, generators



def is_tf32_allowed(args):
return "tf32" in args.precision

@@ -20,6 +21,16 @@ def is_fp16_allowed(args):
return "fp16" in args.precision


class ModelWrapper(torch.nn.Module):
def __init__(self, model):
super().__init__()
self.model = model

def forward(self, x):
out = self.model(input_ids=x['input_ids'], labels=x['labels'])
return out['loss'], out['logits']


class Runner:
def __init__(self, args):
use_cuda = not args.no_cuda and torch.cuda.is_available()
@@ -32,17 +43,30 @@ def __init__(self, args):
self.device = torch.device("cuda" if use_cuda else "cpu")
self.batch_size = args.batch_size
info = models[args.model]()
self.model = info.model.to(self.device)
self.optimizer = optim.Adam(self.model.parameters(), lr=args.lr)



self.data = SyntheticData(
n=args.batch_size,
repeat=100000,
generators=generators[info.category](info),
)

self.loader = DataLoader(
self.data, batch_size=args.batch_size, num_workers=args.num_workers
)

example = next(iter(self.loader))
example = {k: x.to(self.device) for k, x in example.items()}

model = ModelWrapper(info.model).to(self.device)

jit = False
if jit:
model = torch.jit.trace(model, example)

self.model = model
self.optimizer = optim.Adam(self.model.parameters(), lr=args.lr)


self.amp_scaler = torch.cuda.amp.GradScaler(enabled=is_fp16_allowed(args))
if is_fp16_allowed(args):
@@ -52,9 +76,7 @@ def __init__(self, args):

def step(self, data):
with self.amp_context():
outputs = self.model(**data)

loss = outputs.loss
loss, _ = self.model(data)

self.amp_scaler.scale(loss).backward()
self.amp_scaler.step(self.optimizer)
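A note for reviewers on why `ModelWrapper` exists: `torch.jit.trace` wants positional tensor (or dict-of-tensor) inputs and tensor or tuple outputs, whereas HuggingFace models are called with keyword arguments and return `ModelOutput` mappings. The wrapper narrows the interface to one dict in, a `(loss, logits)` tuple out, which is what both the (currently disabled) `jit` branch and the simplified `step()` rely on. Below is a minimal self-contained sketch of the same pattern; the model choice, batch shapes, and `strict=False` are illustrative assumptions, not part of this PR:

```python
import torch
from transformers import BertConfig, BertForMaskedLM


class ModelWrapper(torch.nn.Module):
    """Trace-friendly facade: one positional dict of tensors in,
    a (loss, logits) tuple of tensors out."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        out = self.model(input_ids=x["input_ids"], labels=x["labels"])
        return out["loss"], out["logits"]


# Illustrative example batch (shapes are assumptions, not from the PR).
config = BertConfig()
model = ModelWrapper(BertForMaskedLM(config)).eval()
example = {
    "input_ids": torch.randint(0, config.vocab_size, (4, 128)),
    "labels": torch.randint(0, config.vocab_size, (4, 128)),
}

# Tracing may emit TracerWarnings; strict=False relaxes container checks.
traced = torch.jit.trace(model, (example,), strict=False)
loss, logits = traced(example)
print(loss.shape, logits.shape)
```

Since `jit` is hard-coded to `False`, the traced path stays opt-in while the performance investigation is ongoing.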
1 change: 1 addition & 0 deletions benchmarks/mlperf/apex
Submodule apex added at 05091d
10 changes: 10 additions & 0 deletions benchmarks/mlperf/benchfile.py
@@ -0,0 +1,10 @@
from milabench.pack import Package


class MLPerfBenchmark(Package):
base_requirements = "requirements.in"
main_script = "main.py"


__pack__ = MLPerfBenchmark

13 changes: 13 additions & 0 deletions benchmarks/mlperf/main.py
@@ -0,0 +1,13 @@

import sys
import os


FOLDER = os.path.dirname(__file__)
BENCH = "training_results_v2.1/NVIDIA/benchmarks/bert/implementations/pytorch-preview"

print(sys.path)
sys.path.append(os.path.join(FOLDER, BENCH))
print(sys.path)

import run_squad
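
The `print(sys.path)` calls read as leftover debugging, but the underlying pattern is the usual way to import from a vendored checkout: append the submodule directory to `sys.path`, then import the module by file name. A generic sketch of the technique (the directory and module names here are hypothetical, not the PR's):

```python
import os
import sys

# Resolve the vendored checkout relative to this file, so the import
# works regardless of the current working directory.
FOLDER = os.path.dirname(os.path.abspath(__file__))
VENDORED = os.path.join(FOLDER, "third_party", "vendored_impl")  # hypothetical path

if VENDORED not in sys.path:
    sys.path.append(VENDORED)

import vendored_module  # hypothetical; resolved from the appended directory
```

One small caveat worth flagging: `os.path.dirname(__file__)` can yield a relative path depending on how the script is invoked, so the `os.path.abspath` wrapper above is slightly more robust than the bare form used in `main.py`.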
4 changes: 4 additions & 0 deletions benchmarks/mlperf/requirements.in
@@ -0,0 +1,4 @@
git+https://github.com/NVIDIA/mlperf-common.git
git+https://github.com/NVIDIA/apex.git
git+https://github.com/mlcommons/logging.git
boto3
1 change: 1 addition & 0 deletions benchmarks/mlperf/training_results_v2.1
Submodule training_results_v2.1 added at 158189
16 changes: 14 additions & 2 deletions config/base.yaml
@@ -10,6 +10,16 @@ _defaults:
gpu_load_threshold: 0.5
gpu_mem_threshold: 0.5


mlperf:
inherits: _defaults
definition: ../benchmarks/mlperf
group: mlperf
install_group: torch
plan:
method: per_gpu


_torchvision:
inherits: _defaults
definition: ../benchmarks/torchvision
@@ -92,7 +102,9 @@ resnet50:

argv:
--model: resnet50
--batch-size: 64
--batch-size: 256
--synthetic-data: true
--precision: 'fp16'

efficientnet_b4:
inherits: _torchvision
@@ -172,7 +184,7 @@ _bert-base:
- precision-showcase
argv:
--model: "Bert"
--batch-size: 32
--batch-size: 48
voir:
options:
stop: 30
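
For context on the new `mlperf` entry: `plan: method: per_gpu` follows the convention used by the other benchmarks in this config and runs one instance of the benchmark per visible GPU. A rough sketch of the concept (an illustration only, not milabench's actual launcher code):

```python
import os
import subprocess


def launch_per_gpu(cmd, n_gpus):
    """Hypothetical illustration of a per_gpu plan: one process per
    device, each pinned to a single GPU via CUDA_VISIBLE_DEVICES."""
    procs = []
    for gpu in range(n_gpus):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(cmd, env=env))
    return [p.wait() for p in procs]


# Example: launch_per_gpu(["python", "main.py"], n_gpus=8)
```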