Automated KRAI X workflows for reproducing MLPerf Inference submissions on systems equipped with Qualcomm Cloud AI 100 accelerators.
Instructions below have been tested on Ubuntu systems (20.04, 22.04).
Depending on the benchmarks to be used, allocate between 50G and 500G of space
on fast local storage, e.g. an NVMe SSD. Define an environment variable
WORKSPACE_DIR to point to this space, e.g.:
export WORKSPACE_DIR=/local/mnt/workspace
mkdir -p ${WORKSPACE_DIR}/${USER}
mkdir -p ${WORKSPACE_DIR}/sdks
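Since the space requirement ranges from 50G to 500G, it may help to check how much is actually free before going further. The following is a minimal sketch, not part of the repository: the helper names `free_gib` and `has_space` are hypothetical, and the threshold in the usage comment is only an example.

```shell
# Print the free space (in whole GiB) on the filesystem holding the given path.
free_gib() {
    # "df -P" prints one POSIX-format line per filesystem; column 4 is the available KiB.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# has_space <path> <required_gib>: succeed only if at least that many GiB are free.
has_space() {
    avail=$(free_gib "$1")
    [ -n "${avail}" ] && [ "${avail}" -ge "$2" ]
}

# Example (the 50G threshold is an assumption; pick one that matches your benchmarks):
#   has_space "${WORKSPACE_DIR}" 50 || echo "Less than 50G free under ${WORKSPACE_DIR}" >&2
```

The check is read-only, so it can be rerun at any point, for instance before downloading a large dataset.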
git clone --branch mlperf_4.0 \
https://github.com/krai/axs2qaic-docker ${WORKSPACE_DIR}/axs2qaic-docker
cd ${WORKSPACE_DIR}/axs2qaic-docker
./setup.ubuntu.sh
Download the latest Platform SDK and Apps SDK archives (requires registration
and authorization), and extract them under ${WORKSPACE_DIR}/sdks.
1.14.2.0
$ cd ${WORKSPACE_DIR}/sdks && md5sum *1.14.2.0*.zip
c58be58fd71d5a224075cd477a0e2794  qaic-apps-1.14.2.0.zip
b5c583d702e75fbe94b82087bfe0e778  qaic-platform-sdk-x86_64-deb-1.14.2.0.zip
1.12.2.0
$ cd ${WORKSPACE_DIR}/sdks && md5sum *1.12.2.0*.zip
43f51903ea8954564c00270d88b0f044  qaic-apps-1.12.2.0.zip
dc725ef6f99302aa733adf640b5b1da2  qaic-platform-sdk-x86_64-deb-1.12.2.0.zip
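Comparing the digests by eye is error-prone, so the comparison can be scripted. A minimal sketch, assuming GNU `md5sum` is available; the helper name `verify_md5` is hypothetical and not part of the repository:

```shell
# verify_md5 <file> <expected_md5>: succeed only if the file's MD5 digest matches.
verify_md5() {
    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$1" 2>/dev/null | cut -d' ' -f1)
    # An empty digest means the file could not be read, which counts as a mismatch.
    [ -n "${actual}" ] && [ "${actual}" = "$2" ]
}

# Example, using a digest listed above for SDK 1.14.2.0:
#   verify_md5 "${WORKSPACE_DIR}/sdks/qaic-apps-1.14.2.0.zip" \
#       c58be58fd71d5a224075cd477a0e2794 || echo "Checksum mismatch" >&2
```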
Define an environment variable SDK_VER, e.g.:
export SDK_VER=1.14.2.0
cd ${WORKSPACE_DIR}/sdks
unzip qaic-apps-${SDK_VER}.zip
cd qaic-apps-${SDK_VER}/
./install.sh
cd ${WORKSPACE_DIR}/sdks
unzip qaic-platform-sdk*${SDK_VER}.zip
cd qaic-platform-sdk-${SDK_VER}/x86_64/deb/
./install.sh --auto_upgrade_sbl --ecc enable
cd ${WORKSPACE_DIR}/sdks
unzip qaic-platform-sdk*${SDK_VER}.zip
cd qaic-platform-sdk-${SDK_VER}/x86_64-deb/
./install.sh --auto_upgrade_sbl --ecc disable
echo "
[Unit]
Description=Run QMonitor-server
DefaultDependencies=no
After=network-online.target remote-fs.target
[Service]
Type=simple
ExecStart=/opt/qti-aic/tools/qaic-monitor-grpc-server
Restart=always
[Install]
WantedBy=default.target
" | sudo tee /etc/systemd/system/qaic-monitor-proxy.service
sudo systemctl daemon-reload
sudo systemctl enable qaic-monitor-proxy.service
sudo systemctl start qaic-monitor-proxy.service
systemctl status qaic-monitor-proxy.service
● qaic-monitor-proxy.service - Run QMonitor-server
     Loaded: loaded (/etc/systemd/system/qaic-monitor-proxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2024-04-03 06:26:19 CDT; 3s ago
   Main PID: 1588362 (qaic-monitor-gr)
      Tasks: 79 (limit: 629145)
     Memory: 16.4M
        CPU: 5.861s
     CGroup: /system.slice/qaic-monitor-proxy.service
             └─1588362 /opt/qti-aic/tools/qaic-monitor-grpc-server
sudo systemctl stop qaic-monitor-proxy.service
○ qaic-monitor-proxy.service - Run QMonitor-server
     Loaded: loaded (/etc/systemd/system/qaic-monitor-proxy.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Wed 2024-04-03 06:27:56 CDT; 2s ago
    Process: 1588362 ExecStart=/opt/qti-aic/tools/qaic-monitor-grpc-server (code=killed, signal=TERM)
   Main PID: 1588362 (code=killed, signal=TERM)
        CPU: 6.155s
sudo systemctl restart qaic-monitor-proxy.service
export DEVICE_ID=0 VC_VAL=0x1
/opt/qti-aic/tools/qaic-diag -d $DEVICE_ID -m 0x4B 0x66 0x05 0x1 $VC_VAL
Diag Request: 0x4b 0x66 0x5 0x1 0x1
Diag Response: 0x4b 0x66 0x5 0x1 0x0
Obtain a copy of the ImageNet 2012 validation dataset (50,000 images), and
place it under ${WORKSPACE_DIR}/datasets/imagenet.
$ cd ${WORKSPACE_DIR}/datasets/imagenet && md5sum ILSVRC2012_val_00000001.JPEG
af5e456f0eca2ecabb1d1c4e69964e67  ILSVRC2012_val_00000001.JPEG
$ cd ${WORKSPACE_DIR}/datasets/imagenet && du -hs .
6.4G	.
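Besides the checksum of the first image and the total size, the number of images can be checked against the expected 50,000. A minimal sketch, assuming the images sit directly under the dataset directory and follow the ILSVRC2012_val_*.JPEG naming shown above; the helper name `count_val_images` is hypothetical:

```shell
# count_val_images <dir>: count ImageNet validation JPEGs directly under <dir>.
count_val_images() {
    # -maxdepth 1 avoids descending into subdirectories; tr strips wc's padding.
    find "$1" -maxdepth 1 -name 'ILSVRC2012_val_*.JPEG' | wc -l | tr -d ' '
}

# Example:
#   [ "$(count_val_images "${WORKSPACE_DIR}/datasets/imagenet")" -eq 50000 ] \
#       || echo "Unexpected number of validation images" >&2
```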
Define BENCHMARK as one of: bert, resnet50, retinanet or sdxl, e.g.:
export BENCHMARK=bert
The build.mlperf.sh script builds all required Docker images for the given
benchmark: benchmark-independent (base, qaic, axs.common) and
benchmark-dependent (SDK-independent axs.${BENCHMARK} and SDK-dependent
mlperf.${BENCHMARK}).
cd ${WORKSPACE_DIR}/axs2qaic-docker
SDK_VER=${SDK_VER} ./build.mlperf.sh ${BENCHMARK}
Default build options:
- SDK_VER=1.14.2.0: SDK version.
- SDK_DIR=/local/mnt/workspace/sdks: path to Apps/Platform SDK archives.
- DOCKER_OS=deb: to use Ubuntu/Debian images; rpm support has been deprecated.
- UBUNTU_VER=20.04: Ubuntu version for the base image; 22.04 is also supported.
- TIMESTAMP=no: to record the current date in the image tag or not.
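The options above appear to be passed as environment variables, in the same way as the `SDK_VER=${SDK_VER}` prefix in the build command. The sketch below only prints the command that would result from the documented defaults plus any overrides already set in the environment; it does not run the build. The helper name `show_build_command` is hypothetical, and the assumption that every option is read from the environment is not confirmed by the script itself.

```shell
# show_build_command <benchmark>: print the build command with each option
# resolved to its documented default unless it is already set in the environment.
show_build_command() {
    printf '%s %s %s %s %s ./build.mlperf.sh %s\n' \
        "SDK_VER=${SDK_VER:-1.14.2.0}" \
        "SDK_DIR=${SDK_DIR:-/local/mnt/workspace/sdks}" \
        "DOCKER_OS=${DOCKER_OS:-deb}" \
        "UBUNTU_VER=${UBUNTU_VER:-20.04}" \
        "TIMESTAMP=${TIMESTAMP:-no}" \
        "$1"
}

# Example: preview a build on top of an Ubuntu 22.04 base image.
#   (export UBUNTU_VER=22.04; show_build_command bert)
```

Printing the command first makes it easy to confirm which SDK and base image a build would pick up before committing to a long-running build.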
export BENCHMARK=bert
export DOCKER_OS=deb
export SDK_VER=1.14.2.0
export IMAGE_NAME=krai/mlperf.${BENCHMARK}:${DOCKER_OS}_${SDK_VER}
export BENCHMARK=resnet50.full
export DOCKER_OS=deb
export SDK_VER=1.14.2.0
export IMAGE_NAME=krai/mlperf.${BENCHMARK}:${DOCKER_OS}_${SDK_VER}
export BENCHMARK=retinanet
export DOCKER_OS=deb
export SDK_VER=1.14.2.0
export IMAGE_NAME=krai/mlperf.${BENCHMARK}:${DOCKER_OS}_${SDK_VER}
export BENCHMARK=sdxl
export DOCKER_OS=deb
export SDK_VER=1.14.2.0
export IMAGE_NAME=krai/mlperf.${BENCHMARK}:${DOCKER_OS}_${SDK_VER}
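The four groups of exports above differ only in the value of BENCHMARK, so the image name can be derived by a small helper. A minimal sketch; the function name `image_name` is hypothetical, and the defaults simply mirror the values used above:

```shell
# image_name <benchmark> [docker_os] [sdk_ver]: compose the Docker image name
# following the krai/mlperf.<benchmark>:<docker_os>_<sdk_ver> pattern used above.
image_name() {
    echo "krai/mlperf.${1}:${2:-deb}_${3:-1.14.2.0}"
}

# Example:
#   export BENCHMARK=retinanet
#   export IMAGE_NAME=$(image_name "${BENCHMARK}" "${DOCKER_OS}" "${SDK_VER}")
```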
We define three ways of launching a Docker container:
- At the Beginner level, the container uses snapshots of repositories at the time of building the image. Experimental entries cannot be accessed outside the container. ("What happens in Vegas, stays in Vegas.")
- At the Intermediate level, the container uses snapshots of repositories at the time of building the image. Experimental entries can be accessed outside the container.
- At the Advanced level, the container maps repositories outside the container onto repositories inside the container. Experimental entries can be accessed outside the container.
Define the environment variables as above, then launch:
docker run -it --name ${USER}_${BENCHMARK} --ipc=host --net=host \
--privileged --group-add $(getent group qaic | cut -d: -f3) \
${IMAGE_NAME}
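The `--group-add` argument resolves the numeric ID of the qaic group via `getent`. If that group does not exist on the host, the command substitution is empty and the launch is likely to fail in a confusing way. A minimal sketch of a pre-flight check; the helper name `group_gid` is hypothetical:

```shell
# group_gid <group>: print the numeric ID of a group, or fail if it does not exist.
group_gid() {
    # getent exits non-zero when the group is unknown.
    entry=$(getent group "$1") || return 1
    # Group entries have the form name:password:gid:members.
    echo "${entry}" | cut -d: -f3
}

# Example:
#   QAIC_GID=$(group_gid qaic) || { echo "Group 'qaic' not found" >&2; exit 1; }
#   docker run ... --group-add "${QAIC_GID}" ...
```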
To preserve experimental entries outside of (transient) Docker containers, we
create a "work collection" that can be accessed from within and from outside
the containers. Steps marked with [1] should only be performed once.
git clone --branch mlperf_4.0 https://github.com/krai/axs ${WORKSPACE_DIR}/$USER/axs
echo "
# AXS.
export WORKSPACE_DIR=${WORKSPACE_DIR:-/local/mnt/workspace}
export AXS_WORK_COLLECTION=${WORKSPACE_DIR}/${USER}/work_collection
export PATH=${WORKSPACE_DIR}/${USER}/axs:${PATH}
" >> ~/.bashrc
source ~/.bashrc
echo "AXS_WORK_COLLECTION=${AXS_WORK_COLLECTION}"
axs version
axs byquery git_repo,collection,repo_name=axs2mlperf
axs byquery collection,collection_name=experiments --parent_recursion+
export AXS_EXPERIMENTS_DIR=$(axs byquery collection,collection_name=experiments --parent_recursion+ , get_path)
sudo chgrp -R qaic ${AXS_EXPERIMENTS_DIR}
sudo chmod -R g+ws ${AXS_EXPERIMENTS_DIR}
sudo setfacl -R -d -m group:qaic:rwx ${AXS_EXPERIMENTS_DIR}
Define the environment variables as above, then launch:
docker run -it --name ${USER}_${BENCHMARK} --ipc=host --net=host \
--privileged --group-add $(getent group qaic | cut -d: -f3) \
-v ${WORKSPACE_DIR}/${USER}/work_collection/experiments:/home/krai/work_collection/experiments \
${IMAGE_NAME}
Once inside a running Docker container, follow the links below for benchmarking instructions for the individual MLPerf benchmarks.
Unless explicitly stated otherwise, the software in this repository is provided under the permissive MIT license.
Please contact [email protected] if you have any queries.