MLPerf Inference - Qualcomm Cloud AI 100 - Docker

Automated KRAI X workflows for reproducing MLPerf Inference submissions on systems equipped with Qualcomm Cloud AI 100 accelerators.

Common setup

Instructions below have been tested on Ubuntu systems (20.04, 22.04).

Define a workspace directory

Depending on the benchmarks to be used, allocate between 50G and 500G of space on fast local storage, e.g. an NVMe SSD. Define an environment variable WORKSPACE_DIR that points to this space, e.g.:

export WORKSPACE_DIR=/local/mnt/workspace
mkdir -p ${WORKSPACE_DIR}/${USER}
mkdir -p ${WORKSPACE_DIR}/sdks
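
Before going further, it may help to confirm that the chosen location has enough free space. The following is a minimal sketch, assuming a POSIX-compatible `df`; the helper name `has_free_space` is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of axs2qaic-docker): succeed if the
# filesystem holding directory $1 has at least $2 GiB available.
has_free_space() {
    local dir=$1 min_gib=$2 avail_kib
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $((min_gib * 1024 * 1024)) ]
}

# Example: warn if the workspace has less than the 50G lower bound.
# Falls back to /tmp when WORKSPACE_DIR is not set.
has_free_space "${WORKSPACE_DIR:-/tmp}" 50 || echo "Less than 50G free" >&2
```

The threshold is a parameter so that the same check can be reused with a larger value for the benchmarks that need more space.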

Clone the Docker scripts repo

git clone --branch mlperf_4.0 \
https://github.com/krai/axs2qaic-docker ${WORKSPACE_DIR}/axs2qaic-docker

Install system-level dependencies

cd ${WORKSPACE_DIR}/axs2qaic-docker
./setup.ubuntu.sh

Install Qualcomm Cloud AI Platform/Apps SDKs

Download the latest Platform SDK and Apps SDK archives (requires registration and authorization), and place them under ${WORKSPACE_DIR}/sdks.

1.14.2.0
$ cd ${WORKSPACE_DIR}/sdks && md5sum *1.14.2.0*.zip
c58be58fd71d5a224075cd477a0e2794  qaic-apps-1.14.2.0.zip
b5c583d702e75fbe94b82087bfe0e778  qaic-platform-sdk-x86_64-deb-1.14.2.0.zip
1.12.2.0
$ cd ${WORKSPACE_DIR}/sdks && md5sum *1.12.2.0*.zip
43f51903ea8954564c00270d88b0f044  qaic-apps-1.12.2.0.zip
dc725ef6f99302aa733adf640b5b1da2  qaic-platform-sdk-x86_64-deb-1.12.2.0.zip
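
The checksums above can also be compared automatically rather than by eye. This is a sketch only: the helper name `verify_md5` is hypothetical, and it assumes `md5sum` from GNU coreutils, as used in the commands above.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Prints "OK: <file>" on a match; otherwise reports to stderr and returns 1.
verify_md5() {
    local file=$1 expected=$2 actual
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Example (run from ${WORKSPACE_DIR}/sdks):
# verify_md5 qaic-apps-1.14.2.0.zip c58be58fd71d5a224075cd477a0e2794
```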

Define an environment variable SDK_VER e.g.:

export SDK_VER=1.14.2.0

Install Apps SDK

cd ${WORKSPACE_DIR}/sdks
unzip qaic-apps-${SDK_VER}.zip
cd qaic-apps-${SDK_VER}/
./install.sh

Install Platform SDK

Datacenter (ECC on)
cd ${WORKSPACE_DIR}/sdks
unzip qaic-platform-sdk*${SDK_VER}.zip
cd qaic-platform-sdk-${SDK_VER}/x86_64/deb/
./install.sh --auto_upgrade_sbl --ecc enable
Edge (ECC off)
cd ${WORKSPACE_DIR}/sdks
unzip qaic-platform-sdk*${SDK_VER}.zip
cd qaic-platform-sdk-${SDK_VER}/x86_64/deb/
./install.sh --auto_upgrade_sbl --ecc disable
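
The two cases above differ only in the value passed to --ecc. When scripting the installation for several machines, the choice could be captured in one place. The helper name `platform_install_opts` and its `datacenter`/`edge` arguments are hypothetical; the options it prints are exactly those shown above.

```shell
# Hypothetical helper: print the install.sh options for a system category.
# "datacenter" (ECC on) and "edge" (ECC off) follow the two cases above.
platform_install_opts() {
    case "$1" in
        datacenter) echo "--auto_upgrade_sbl --ecc enable" ;;
        edge)       echo "--auto_upgrade_sbl --ecc disable" ;;
        *)
            echo "usage: platform_install_opts datacenter|edge" >&2
            return 1
            ;;
    esac
}

# Example (from the Platform SDK install directory):
# ./install.sh $(platform_install_opts datacenter)
```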

Install monitor service

Configure
echo "
[Unit]
Description=Run QMonitor-server
DefaultDependencies=no
After=network-online.target remote-fs.target

[Service]
Type=simple
ExecStart=/opt/qti-aic/tools/qaic-monitor-grpc-server
Restart=always

[Install]
WantedBy=default.target
" | sudo tee /etc/systemd/system/qaic-monitor-proxy.service
Enable
sudo systemctl daemon-reload
sudo systemctl enable qaic-monitor-proxy.service
sudo systemctl start qaic-monitor-proxy.service
Status
systemctl status qaic-monitor-proxy.service
● qaic-monitor-proxy.service - Run QMonitor-server
     Loaded: loaded (/etc/systemd/system/qaic-monitor-proxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2024-04-03 06:26:19 CDT; 3s ago
   Main PID: 1588362 (qaic-monitor-gr)
      Tasks: 79 (limit: 629145)
     Memory: 16.4M
        CPU: 5.861s
     CGroup: /system.slice/qaic-monitor-proxy.service
             └─1588362 /opt/qti-aic/tools/qaic-monitor-grpc-server
Stop (before a Platform SDK update)
sudo systemctl stop qaic-monitor-proxy.service
systemctl status qaic-monitor-proxy.service
○ qaic-monitor-proxy.service - Run QMonitor-server
     Loaded: loaded (/etc/systemd/system/qaic-monitor-proxy.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Wed 2024-04-03 06:27:56 CDT; 2s ago
    Process: 1588362 ExecStart=/opt/qti-aic/tools/qaic-monitor-grpc-server (code=killed, signal=TERM)
   Main PID: 1588362 (code=killed, signal=TERM)
        CPU: 6.155s
Restart (after a Platform SDK update)
sudo systemctl restart qaic-monitor-proxy.service
Test: set minimum device frequency
export DEVICE_ID=0 VC_VAL=0x1
/opt/qti-aic/tools/qaic-diag -d $DEVICE_ID -m 0x4B 0x66 0x05 0x1 $VC_VAL
Diag Request:
0x4b 0x66 0x5 0x1 0x1
Diag Response:
0x4b 0x66 0x5 0x1 0x0
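
To check from a script whether the monitor service is running, `systemctl is-active` can be used instead of reading the full status output. This is a sketch under the assumption that the unit is named as above; the helper name `service_state` is hypothetical.

```shell
# Hypothetical helper: print the state of a systemd unit as reported by
# "systemctl is-active" (e.g. active, inactive, failed), or "unknown"
# when systemctl is unavailable or prints nothing.
service_state() {
    local state
    state=$(systemctl is-active "$1" 2>/dev/null) || true
    echo "${state:-unknown}"
}

# Example: report the state of the monitor service.
service_state qaic-monitor-proxy.service
```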

ImageNet (for ResNet50 only)

Obtain a copy of the ImageNet 2012 validation dataset (50,000 images), and place it under ${WORKSPACE_DIR}/datasets/imagenet.

$ cd ${WORKSPACE_DIR}/datasets/imagenet && md5sum ILSVRC2012_val_00000001.JPEG
af5e456f0eca2ecabb1d1c4e69964e67  ILSVRC2012_val_00000001.JPEG
$ cd ${WORKSPACE_DIR}/datasets/imagenet && du -hs .
6.4G    .
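
A quick way to confirm that all 50,000 validation images are present is to count them. This sketch assumes the images sit directly under the dataset directory, as the md5sum command above suggests; the helper name `count_val_images` is hypothetical.

```shell
# Hypothetical helper: count ImageNet validation images located directly
# under directory $1 (no recursion into subdirectories).
count_val_images() {
    find "$1" -maxdepth 1 -type f -name 'ILSVRC2012_val_*.JPEG' | wc -l | tr -d ' '
}

# Example:
# [ "$(count_val_images "${WORKSPACE_DIR}/datasets/imagenet")" -eq 50000 ] \
#     || echo "Unexpected number of validation images" >&2
```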

Building Docker images

Define BENCHMARK as one of: bert, resnet50, retinanet or sdxl e.g.:

export BENCHMARK=bert

The build.mlperf.sh script builds all required Docker images for the given benchmark: benchmark-independent (base, qaic, axs.common) and benchmark-dependent (SDK-independent axs.${BENCHMARK} and SDK-dependent mlperf.${BENCHMARK}).

cd ${WORKSPACE_DIR}/axs2qaic-docker
SDK_VER=${SDK_VER} ./build.mlperf.sh ${BENCHMARK}

Default build options:

  • SDK_VER=1.14.2.0: SDK version.
  • SDK_DIR=/local/mnt/workspace/sdks: path to Apps/Platform SDK archives.
  • DOCKER_OS=deb: to use Ubuntu/Debian images; rpm support has been deprecated.
  • UBUNTU_VER=20.04: Ubuntu version for the base image; 22.04 is also supported.
  • TIMESTAMP=no: whether to record the current date in the image tag.
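
As an illustration of overriding the defaults, the sketch below only prints a build command and does not run it. It assumes that UBUNTU_VER can be passed as an environment variable in the same way as SDK_VER in the command above; the helper name `build_cmd` is hypothetical.

```shell
# Hypothetical dry-run helper: validate the benchmark name and print the
# build command, using SDK_VER and UBUNTU_VER from the environment or
# the documented defaults. Nothing is built.
build_cmd() {
    local benchmark=$1
    case "$benchmark" in
        bert|resnet50|retinanet|sdxl) ;;
        *) echo "unknown benchmark: $benchmark" >&2; return 1 ;;
    esac
    echo "SDK_VER=${SDK_VER:-1.14.2.0} UBUNTU_VER=${UBUNTU_VER:-20.04} ./build.mlperf.sh $benchmark"
}

# Example: preview a ResNet50 build with SDK 1.12.2.0 on Ubuntu 22.04.
# SDK_VER=1.12.2.0 UBUNTU_VER=22.04 build_cmd resnet50
```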

Launching Docker containers

Define the image name

BERT

export BENCHMARK=bert
export DOCKER_OS=deb
export SDK_VER=1.14.2.0
export IMAGE_NAME=krai/mlperf.${BENCHMARK}:${DOCKER_OS}_${SDK_VER}

ResNet50

export BENCHMARK=resnet50.full
export DOCKER_OS=deb
export SDK_VER=1.14.2.0
export IMAGE_NAME=krai/mlperf.${BENCHMARK}:${DOCKER_OS}_${SDK_VER}

RetinaNet

export BENCHMARK=retinanet
export DOCKER_OS=deb
export SDK_VER=1.14.2.0
export IMAGE_NAME=krai/mlperf.${BENCHMARK}:${DOCKER_OS}_${SDK_VER}

SDXL

export BENCHMARK=sdxl
export DOCKER_OS=deb
export SDK_VER=1.14.2.0
export IMAGE_NAME=krai/mlperf.${BENCHMARK}:${DOCKER_OS}_${SDK_VER}
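
The four definitions above differ only in the benchmark name, so the image name could also be derived with a small function. This is a sketch; the helper name `image_name` is hypothetical, and its defaults mirror the values used above.

```shell
# Hypothetical helper: compose the image name from the benchmark ($1),
# the Docker OS flavour ($2, default "deb") and the SDK version
# ($3, default "1.14.2.0").
image_name() {
    local benchmark=$1 docker_os=${2:-deb} sdk_ver=${3:-1.14.2.0}
    echo "krai/mlperf.${benchmark}:${docker_os}_${sdk_ver}"
}

# Example:
# export IMAGE_NAME=$(image_name resnet50.full)
```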

Launch a Docker container

We define three ways of launching a Docker container:

  • At the Beginner level, the container uses snapshots of repositories at the time of building the image. Experimental entries cannot be accessed outside the container. ("What happens in Vegas, stays in Vegas.")

  • At the Intermediate level, the container uses snapshots of repositories at the time of building the image. Experimental entries can be accessed outside the container.

  • At the Advanced level, the container maps repositories outside the container onto repositories inside the container. Experimental entries can be accessed outside the container.

Beginner

Launch a container

Define the environment variables as above, then launch:

docker run -it --name ${USER}_${BENCHMARK} --ipc=host --net=host \
--privileged --group-add $(getent group qaic | cut -d: -f3) \
${IMAGE_NAME}
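
The launch command relies on the qaic group existing on the host: if `getent group qaic` returns nothing, `--group-add` receives an empty argument. A minimal sketch of a pre-flight check follows; the helper name `group_gid` is hypothetical, and it assumes `getent` is available, as on Ubuntu.

```shell
# Hypothetical helper: print the numeric ID of group $1,
# or return 1 if the group does not exist.
group_gid() {
    local entry
    entry=$(getent group "$1") || return 1
    echo "$entry" | cut -d: -f3
}

# Example: stop early with a clear message when the group is missing.
# QAIC_GID=$(group_gid qaic) || echo "Group 'qaic' not found" >&2
```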

Intermediate

To preserve experimental entries outside of (transient) Docker containers, we create a "work collection" that can be accessed from within and from outside the containers. Steps marked with [1] should only be performed once.

[1] Install axs under ${WORKSPACE_DIR}/$USER
git clone --branch mlperf_4.0 https://github.com/krai/axs ${WORKSPACE_DIR}/$USER/axs
[1] Define environment variables in your ~/.bashrc
echo "

# AXS.
export WORKSPACE_DIR=${WORKSPACE_DIR:-/local/mnt/workspace}
export AXS_WORK_COLLECTION=${WORKSPACE_DIR}/${USER}/work_collection
export PATH=${WORKSPACE_DIR}/${USER}/axs:\${PATH}

" >> ~/.bashrc
source ~/.bashrc
echo "AXS_WORK_COLLECTION=${AXS_WORK_COLLECTION}"
axs version
[1] Import the axs2mlperf repo into your work collection
axs byquery git_repo,collection,repo_name=axs2mlperf
[1] Create an empty collection for experiments
axs byquery collection,collection_name=experiments --parent_recursion+
export AXS_EXPERIMENTS_DIR=$(axs byquery collection,collection_name=experiments --parent_recursion+ , get_path)
sudo chgrp -R qaic ${AXS_EXPERIMENTS_DIR}
sudo chmod -R g+ws ${AXS_EXPERIMENTS_DIR}
sudo setfacl -R -d -m group:qaic:rwx ${AXS_EXPERIMENTS_DIR}
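
After the permission commands above, it may be worth confirming that the experiments directory has the expected group and setgid bit. This is a sketch assuming GNU `stat` (as on Ubuntu); the helper name `check_shared_dir` is hypothetical, and it does not inspect the default ACL.

```shell
# Hypothetical helper: succeed if $1 is a directory owned by group $2
# with the setgid bit set, so that new entries inherit the group.
check_shared_dir() {
    local dir=$1 group=$2
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    [ -g "$dir" ] || { echo "setgid bit not set: $dir" >&2; return 1; }
    [ "$(stat -c '%G' "$dir")" = "$group" ] \
        || { echo "group is not $group: $dir" >&2; return 1; }
}

# Example:
# check_shared_dir "${AXS_EXPERIMENTS_DIR}" qaic
```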
Launch a container

Define the environment variables as above, then launch:

docker run -it --name ${USER}_${BENCHMARK} --ipc=host --net=host \
--privileged --group-add $(getent group qaic | cut -d: -f3) \
-v ${WORKSPACE_DIR}/${USER}/work_collection/experiments:/home/krai/work_collection/experiments \
${IMAGE_NAME}

Advanced TODO

Benchmarking

Once inside a running Docker container, follow the links below for benchmarking instructions for the individual MLPerf benchmarks.

License

Unless explicitly stated otherwise, the software in this repository is provided under the permissive MIT license.

Contact

Please contact [email protected] if you have any queries.