forked from tinygrad/tinygrad
update mlperf systems and copy 4.1 to 5.0 (tinygrad#7004)
Showing 22 changed files with 543 additions and 5 deletions.
73 changes: 73 additions & 0 deletions
...ubmission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_green/README.md
@@ -0,0 +1,73 @@
# 1. Problem

This problem uses BERT for NLP.

## Requirements

Install tinygrad and mlperf-logging from master.
```
git clone https://github.com/tinygrad/tinygrad.git
python3 -m pip install -e ".[mlperf]"
```
Also install tqdm and tensorflow.
```
pip install tqdm tensorflow
```

### tinybox_green
Install the p2p driver per [README](https://github.com/tinygrad/open-gpu-kernel-modules/blob/550.54.15-p2p/README.md)
This is the default on production tinybox green.

### tinybox_red
Disable cwsr + increase mes timeout.
Install the custom amdgpu driver per [README](https://github.com/nimlgen/amdgpu_ubuntu_22_04/blob/v6.1.3/readme.md)

# 2. Directions

## Steps to download and verify data

### 1. Download raw data

```
BASEDIR="/raid/datasets/wiki" WIKI_TRAIN=1 VERIFY_CHECKSUM=1 python3 extra/datasets/wikipedia_download.py
```

### 2. Preprocess train and validation data

Note: The number of threads used for preprocessing is limited by available memory. With 128GB of RAM, a maximum of 16 threads is recommended.

#### Training:
```
BASEDIR="/raid/datasets/wiki" NUM_WORKERS=16 python3 extra/datasets/wikipedia.py pre-train all
```

Generating a specific topic (Between 0 and 499)
```
BASEDIR="/raid/datasets/wiki" python3 extra/datasets/wikipedia.py pre-train 42
```

#### Validation:
```
BASEDIR="/raid/datasets/wiki" python3 extra/datasets/wikipedia.py pre-eval
```
## Running

### tinybox_green

#### Steps to run benchmark
```
examples/mlperf/training_submission_v4.1/tinycorp/benchmarks/bert/implementations/tinybox_green/run_and_time.sh
```

### tinybox_red

#### One time setup

```
examples/mlperf/training_submission_v4.1/tinycorp/benchmarks/bert/implementations/tinybox_red/setup.sh
```

#### Steps to run benchmark
```
examples/mlperf/training_submission_v4.1/tinycorp/benchmarks/bert/implementations/tinybox_red/run_and_time.sh
```
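The README's preprocessing note gives a single data point: 128GB of RAM, at most 16 threads. A minimal sketch of how one might pick `NUM_WORKERS` on a machine with a different amount of memory follows. The helper name `workers_for_ram_gb` and the linear rule of roughly 8 GB per worker are assumptions extrapolated from that one data point, not something the README or `wikipedia.py` states.

```shell
#!/bin/bash
# Hypothetical helper: suggest NUM_WORKERS from total RAM in GB.
# Assumption (not from the README): memory use scales linearly at about
# 8 GB per worker, inferred only from "128GB -> 16 threads".
workers_for_ram_gb() {
  local ram_gb=$1
  local gb_per_worker=8
  local n=$(( ram_gb / gb_per_worker ))
  # Always allow at least one worker.
  if (( n < 1 )); then
    n=1
  fi
  echo "$n"
}

# On Linux, read total RAM from /proc/meminfo (reported in kB, rounded down to GB).
if [ -r /proc/meminfo ]; then
  total_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
  echo "suggested NUM_WORKERS=$(workers_for_ram_gb "$total_gb")"
fi
```

Because the kernel reports slightly less than the installed capacity, a nominal 128GB machine may come out one worker short of 16 under this rule; treat the result as a starting point rather than a limit.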
13 changes: 13 additions & 0 deletions
...aining_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_green/dev_beam.sh
@@ -0,0 +1,13 @@
#!/bin/bash

export PYTHONPATH="."
export MODEL="bert"
export DEFAULT_FLOAT="HALF" GPUS=6 BS=66 EVAL_BS=6

export BEAM=4 BEAM_UOPS_MAX=2000 BEAM_UPCAST_MAX=64 BEAM_LOCAL_MAX=512
export IGNORE_JIT_FIRST_BEAM=1
export BASEDIR="/raid/datasets/wiki"

export BENCHMARK=10 DEBUG=2

python3 examples/mlperf/model_train.py
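The BERT scripts in this commit all set `GPUS=6 BS=66 EVAL_BS=6`. Assuming `BS` and `EVAL_BS` are global batch sizes that are split evenly across the GPUs (an assumption about `model_train.py`, which this diff does not show), the per-device batch can be checked with a small sketch like the one below. `per_gpu_batch` is a hypothetical helper, not part of tinygrad.

```shell
#!/bin/bash
# Hypothetical helper: per-GPU batch size for a global batch size.
# Assumption: the global batch is divided evenly across all GPUs.
per_gpu_batch() {
  local bs=$1 gpus=$2
  if (( bs % gpus != 0 )); then
    echo "batch size $bs is not divisible by $gpus GPUs" >&2
    return 1
  fi
  echo $(( bs / gpus ))
}

per_gpu_batch 66 6   # BERT training batch from the scripts
per_gpu_batch 6 6    # BERT eval batch from the scripts
```

Under that assumption, a changed `GPUS` value would call for a `BS` that is still a multiple of the GPU count.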
13 changes: 13 additions & 0 deletions
...raining_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_green/dev_run.sh
@@ -0,0 +1,13 @@
#!/bin/bash

export PYTHONPATH="."
export MODEL="bert"
export DEFAULT_FLOAT="HALF" GPUS=6 BS=66 EVAL_BS=6

export BEAM=4 BEAM_UOPS_MAX=2000 BEAM_UPCAST_MAX=64 BEAM_LOCAL_MAX=512
export IGNORE_JIT_FIRST_BEAM=1
export BASEDIR="/raid/datasets/wiki"

export WANDB=1 PARALLEL=0

RUNMLPERF=1 python3 examples/mlperf/model_train.py
23 changes: 23 additions & 0 deletions
...ng_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_green/run_and_time.sh
@@ -0,0 +1,23 @@
#!/bin/bash

export PYTHONPATH="."
export MODEL="bert"
export SUBMISSION_PLATFORM="tinybox_green"
export DEFAULT_FLOAT="HALF" GPUS=6 BS=66 EVAL_BS=6

export BEAM=4 BEAM_UOPS_MAX=2000 BEAM_UPCAST_MAX=64 BEAM_LOCAL_MAX=512
export IGNORE_JIT_FIRST_BEAM=1
export BASEDIR="/raid/datasets/wiki"

# pip install -e ".[mlperf]"
export LOGMLPERF=1

export SEED=$RANDOM
DATETIME=$(date "+%m%d%H%M")
LOGFILE="bert_green_${DATETIME}_${SEED}.log"

# init
BENCHMARK=10 INITMLPERF=1 python3 examples/mlperf/model_train.py | tee $LOGFILE

# run
PARALLEL=0 RUNMLPERF=1 python3 examples/mlperf/model_train.py | tee -a $LOGFILE
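`run_and_time.sh` names its log from the platform, a timestamp, and the seed, then writes it in two stages: the init stage creates the file with `tee`, and the run stage appends with `tee -a`. The sketch below reproduces just that pattern with placeholder `echo` commands standing in for the two `model_train.py` invocations. `make_logfile_name` is a hypothetical helper, and the temporary directory is only there so the sketch can run anywhere.

```shell
#!/bin/bash
# Hypothetical helper mirroring the LOGFILE naming used in run_and_time.sh.
make_logfile_name() {
  local platform=$1 datetime=$2 seed=$3
  echo "bert_${platform}_${datetime}_${seed}.log"
}

# Fixed values here for illustration; the script uses $(date "+%m%d%H%M") and $RANDOM.
logfile="$(mktemp -d)/$(make_logfile_name green 03141559 12345)"

# Stage 1 creates (or truncates) the log; stage 2 appends to it with -a.
echo "init phase" | tee "$logfile" > /dev/null
echo "run phase" | tee -a "$logfile" > /dev/null

cat "$logfile"
```

One design point worth keeping in mind: in bash, a pipeline's exit status is that of its last command, so a failure in the training process would be masked by `tee` unless `set -o pipefail` is enabled. The scripts in this commit do not set it.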
73 changes: 73 additions & 0 deletions
..._submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_red/README.md
@@ -0,0 +1,73 @@
# 1. Problem

This problem uses BERT for NLP.

## Requirements

Install tinygrad and mlperf-logging from master.
```
git clone https://github.com/tinygrad/tinygrad.git
python3 -m pip install -e ".[mlperf]"
```
Also install tqdm and tensorflow.
```
pip install tqdm tensorflow
```

### tinybox_green
Install the p2p driver per [README](https://github.com/tinygrad/open-gpu-kernel-modules/blob/550.54.15-p2p/README.md)
This is the default on production tinybox green.

### tinybox_red
Disable cwsr + increase mes timeout.
Install the custom amdgpu driver per [README](https://github.com/nimlgen/amdgpu_ubuntu_22_04/blob/v6.1.3/readme.md)

# 2. Directions

## Steps to download and verify data

### 1. Download raw data

```
BASEDIR="/raid/datasets/wiki" WIKI_TRAIN=1 VERIFY_CHECKSUM=1 python3 extra/datasets/wikipedia_download.py
```

### 2. Preprocess train and validation data

Note: The number of threads used for preprocessing is limited by available memory. With 128GB of RAM, a maximum of 16 threads is recommended.

#### Training:
```
BASEDIR="/raid/datasets/wiki" NUM_WORKERS=16 python3 extra/datasets/wikipedia.py pre-train all
```

Generating a specific topic (Between 0 and 499)
```
BASEDIR="/raid/datasets/wiki" python3 extra/datasets/wikipedia.py pre-train 42
```

#### Validation:
```
BASEDIR="/raid/datasets/wiki" python3 extra/datasets/wikipedia.py pre-eval
```
## Running

### tinybox_green

#### Steps to run benchmark
```
examples/mlperf/training_submission_v4.1/tinycorp/benchmarks/bert/implementations/tinybox_green/run_and_time.sh
```

### tinybox_red

#### One time setup

```
examples/mlperf/training_submission_v4.1/tinycorp/benchmarks/bert/implementations/tinybox_red/setup.sh
```

#### Steps to run benchmark
```
examples/mlperf/training_submission_v4.1/tinycorp/benchmarks/bert/implementations/tinybox_red/run_and_time.sh
```
13 changes: 13 additions & 0 deletions
...training_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_red/dev_beam.sh
@@ -0,0 +1,13 @@
#!/bin/bash

export PYTHONPATH="."
export MODEL="bert"
export DEFAULT_FLOAT="HALF" GPUS=6 BS=66 EVAL_BS=6

export BEAM=3
export IGNORE_JIT_FIRST_BEAM=1
export BASEDIR="/raid/datasets/wiki"

export BENCHMARK=10 DEBUG=2

python3 examples/mlperf/model_train.py
13 changes: 13 additions & 0 deletions
.../training_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_red/dev_run.sh
@@ -0,0 +1,13 @@
#!/bin/bash

export PYTHONPATH="."
export MODEL="bert"
export DEFAULT_FLOAT="HALF" GPUS=6 BS=66 EVAL_BS=6

export BEAM=3
export IGNORE_JIT_FIRST_BEAM=1
export BASEDIR="/raid/datasets/wiki"

export WANDB=1 PARALLEL=0

RUNMLPERF=1 python3 examples/mlperf/model_train.py
23 changes: 23 additions & 0 deletions
...ning_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_red/run_and_time.sh
@@ -0,0 +1,23 @@
#!/bin/bash

export PYTHONPATH="."
export MODEL="bert"
export SUBMISSION_PLATFORM="tinybox_red"
export DEFAULT_FLOAT="HALF" GPUS=6 BS=66 EVAL_BS=6

export BEAM=3
export IGNORE_JIT_FIRST_BEAM=1
export BASEDIR="/raid/datasets/wiki"

# pip install -e ".[mlperf]"
export LOGMLPERF=1

export SEED=$RANDOM
DATETIME=$(date "+%m%d%H%M")
LOGFILE="bert_red_${DATETIME}_${SEED}.log"

# init
BENCHMARK=10 INITMLPERF=1 python3 examples/mlperf/model_train.py | tee $LOGFILE

# run
PARALLEL=0 RUNMLPERF=1 python3 examples/mlperf/model_train.py | tee -a $LOGFILE
8 changes: 8 additions & 0 deletions
...rf/training_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_red/setup.sh
@@ -0,0 +1,8 @@
#!/bin/bash

rocm-smi --setprofile compute
rocm-smi --setmclk 3
rocm-smi --setperflevel high

# power cap to 350W
# echo "350000000" | sudo tee /sys/class/drm/card{1..6}/device/hwmon/hwmon*/power1_cap
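The commented-out power cap in `setup.sh` writes `350000000` for a 350W limit because the Linux hwmon `power1_cap` attribute is expressed in microwatts. A small dry-run sketch of that conversion follows. `watts_to_microwatts` is a hypothetical helper, and the loop only prints the commands rather than writing to sysfs.

```shell
#!/bin/bash
# hwmon power attributes are expressed in microwatts.
watts_to_microwatts() {
  echo $(( $1 * 1000000 ))
}

cap_uw=$(watts_to_microwatts 350)

# Dry run: print the command for each card instead of writing to sysfs.
# The card1..card6 numbering is copied from setup.sh and may differ per machine.
for card in /sys/class/drm/card{1..6}; do
  echo "echo $cap_uw | sudo tee $card/device/hwmon/hwmon*/power1_cap"
done
```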
50 changes: 50 additions & 0 deletions
...mission_v5.0/tinycorp/benchmarks/resnet/implementations/tinybox_green/README.md
@@ -0,0 +1,50 @@
# 1. Problem

This problem uses the ResNet-50 CNN to do image classification.

## Requirements

Install tinygrad and mlperf-logging from master.
```
git clone https://github.com/tinygrad/tinygrad.git
python3 -m pip install -e ".[mlperf]"
```

### tinybox_green
Install the p2p driver per [README](https://github.com/tinygrad/open-gpu-kernel-modules/blob/550.54.15-p2p/README.md)
This is the default on production tinybox green.

### tinybox_red
Disable cwsr
This is the default on production tinybox red.
```
sudo vi /etc/modprobe.d/amdgpu.conf
cat <<EOF > /etc/modprobe.d/amdgpu.conf
options amdgpu cwsr_enable=0
EOF
sudo update-initramfs -u
sudo reboot
# validate
sudo cat /sys/module/amdgpu/parameters/cwsr_enable #= 0
```

# 2. Directions

## Steps to download and verify data

```
IMGNET_TRAIN=1 python3 extra/datasets/imagenet_download.py
```

## Steps for one time setup

### tinybox_red
```
examples/mlperf/training_submission_v4.0/tinycorp/benchmarks/resnet/implementations/tinybox_red/setup.sh
```

## Steps to run benchmark
```
examples/mlperf/training_submission_v4.0/tinycorp/benchmarks/resnet/implementations/tinybox_red/run_and_time.sh
```
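In the ResNet README's cwsr snippet, the heredoc redirects into `/etc/modprobe.d/amdgpu.conf` without `sudo`. A shell performs redirection itself, so from an unprivileged shell that write would typically be refused. One common alternative, shown here as a sketch rather than as the README's method, is to pipe the heredoc into `tee`, which can then be run under `sudo`. `CONF_PATH` is a stand-in introduced for this example so it can run without root.

```shell
#!/bin/bash
# CONF_PATH is a stand-in so the sketch runs without root. On a real system it
# would be /etc/modprobe.d/amdgpu.conf and tee would be invoked as `sudo tee`.
CONF_PATH="${CONF_PATH:-$(mktemp)}"

# tee opens the target file itself, so prefixing it with sudo gives the write
# the needed privileges, unlike a redirection done by the calling shell.
tee "$CONF_PATH" > /dev/null <<EOF
options amdgpu cwsr_enable=0
EOF

cat "$CONF_PATH"
```

As in the README, the setting would only take effect after `update-initramfs -u` and a reboot.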
13 changes: 13 additions & 0 deletions
...ning_submission_v5.0/tinycorp/benchmarks/resnet/implementations/tinybox_green/dev_beam.sh
@@ -0,0 +1,13 @@
#!/bin/bash

export PYTHONPATH="."
export MODEL="resnet"
export DEFAULT_FLOAT="HALF" GPUS=6 BS=1536 EVAL_BS=192

export LAZYCACHE=0 RESET_STEP=0

export TRAIN_BEAM=4 IGNORE_JIT_FIRST_BEAM=1 BEAM_UOPS_MAX=1500 BEAM_UPCAST_MAX=64 BEAM_LOCAL_MAX=1024 BEAM_MIN_PROGRESS=10 BEAM_PADTO=0

export BENCHMARK=10 DEBUG=2

python3 examples/mlperf/model_train.py
15 changes: 15 additions & 0 deletions
...ining_submission_v5.0/tinycorp/benchmarks/resnet/implementations/tinybox_green/dev_run.sh
@@ -0,0 +1,15 @@
#!/bin/bash

export PYTHONPATH="."
export MODEL="resnet"
export DEFAULT_FLOAT="HALF" GPUS=6 BS=1536 EVAL_BS=192

export LAZYCACHE=0 RESET_STEP=0

export TRAIN_BEAM=4 IGNORE_JIT_FIRST_BEAM=1 BEAM_UOPS_MAX=1500 BEAM_UPCAST_MAX=64 BEAM_LOCAL_MAX=1024 BEAM_MIN_PROGRESS=10 BEAM_PADTO=0

export EVAL_START_EPOCH=3 EVAL_FREQ=4

export WANDB=1 PARALLEL=0

python3 examples/mlperf/model_train.py