The following conditions must be met before Vega is deployed in a local cluster:
- Ubuntu 16.04 or later (not tested on other Linux distributions or versions)
- CUDA 10.0
- Python 3.7
- pip
- Before deploying a cluster, install the mandatory software packages: download the script install_dependencies.sh and run it:
bash install_dependencies.sh
- MPI. For details, see Appendix Install MPI.
- MMDetection (optional, required by the object detection algorithm). For details, see Appendix Install MMDetection.
- SSH mutual trust configured between all nodes.
- NFS shared storage.
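Before running the deployment, the software prerequisites can be checked quickly. The following is only a sketch; it assumes the standard Ubuntu command names python3, pip3, and nvcc, which may differ in your environment:

```shell
# Quick pre-flight check for the prerequisites listed above.
# Assumes the standard command names python3, pip3, and nvcc.
py_ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
echo "Python: ${py_ver}"
if command -v pip3 >/dev/null 2>&1; then echo "pip: ok"; else echo "pip: missing"; fi
if command -v nvcc >/dev/null 2>&1; then echo "CUDA: ok"; else echo "CUDA: missing (CUDA 10.0 required)"; fi
```

A missing entry in the output points at the corresponding prerequisite above.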
After the preceding operations are complete, download the Vega deployment package from the Vega repository. The deployment package contains the following scripts:
- Deployment script: deploy_local_cluster.py
- Commissioning script: verify_local_cluster.py
- Start script on the slave node: start_slave_worker.py
- Configure the deployment information in the deploy.yml file. The file format is as follows:
master: n.n.n.n # IP address of the master node
listen_port: 8786 # listening port number
slaves: ["n.n.n.n", "n.n.n.n", "n.n.n.n"] # slave node addresses
- Run the deployment script.
Place deploy_local_cluster.py, verify_local_cluster.py, vega-1.0.0.whl, deploy.yml, and install_dependencies.sh in the same folder on the master node of the cluster. Run the following command to deploy Vega to the master and slave nodes:
python deploy_local_cluster.py
After the execution is complete, each node is verified automatically and the following message is displayed:
success
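As an illustration of the deploy.yml format, the slave address list can be pulled out with standard shell tools. This is only a sketch against a sample file; the IP addresses below are placeholders, not real cluster nodes:

```shell
# Write a sample deploy.yml in the format shown above (placeholder addresses).
cat > deploy.yml <<'EOF'
master: 192.168.0.1
listen_port: 8786
slaves: ["192.168.0.2", "192.168.0.3"]
EOF

# Extract the slave IP addresses from the slaves: line, one per line.
slaves=$(grep '^slaves:' deploy.yml | grep -oE '[0-9]+(\.[0-9]+){3}')
echo "$slaves"
```

A loop over this list is a convenient basis for per-node checks, such as pinging each slave before deployment.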
- Download the MMDetection source code.
Download the latest version of MMDetection from https://github.com/open-mmlab/mmdetection.
- Install MMDetection.
Switch to the mmdetection directory and run the following command to compile and install it:
sudo python3 setup.py develop
Install MPI:
- Use the apt tool to install MPI directly:
sudo apt-get install mpich
- Run the following command to check that MPI is working:
mpirun --version
Install Apex:
- Download the source code from https://github.com/NVIDIA/apex.
- Decompress the package, switch to the resulting directory, and run:
pip3 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Any two hosts on the network must support SSH mutual trust (passwordless login). The configuration method is as follows:
- Install the SSH server.
sudo apt-get install openssh-server
- Generate the key pair.
ssh-keygen -t rsa
Two files, id_rsa and id_rsa.pub, are created in the ~/.ssh/ directory; id_rsa.pub is the public key.
- Check whether the ~/.ssh/authorized_keys file exists. If it does not, create it and run chmod 600 ~/.ssh/authorized_keys to set its permissions.
- Copy the public key id_rsa.pub into the authorized_keys file on the other servers.
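The key-generation step can be scripted non-interactively. A minimal sketch, writing to a scratch directory so nothing in ~/.ssh is touched (-N "" sets an empty passphrase, -f sets the output path):

```shell
# Generate an RSA key pair non-interactively into a scratch directory.
keydir=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$keydir/id_rsa" -q
ls "$keydir"   # id_rsa (private key) and id_rsa.pub (public key)
```

On a real node the public key is what gets appended to ~/.ssh/authorized_keys on the other servers; the ssh-copy-id tool automates that copy.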
On the server:
- Install the NFS server.
sudo apt install nfs-kernel-server
- Write the shared path to the configuration file.
echo "/data *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
- Create the shared directory.
sudo mkdir -p /data
- Restart the NFS server.
sudo service nfs-kernel-server restart
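For reference, each field of the exports entry above has a specific meaning (these are standard NFS export options):

```text
/data *(rw,sync,no_subtree_check,no_root_squash)
```

/data is the exported path; * allows any client host to mount it; rw permits read-write access; sync forces writes to disk before the server replies; no_subtree_check disables per-request subtree verification; no_root_squash lets the client's root user keep root privileges on the share.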
On the client:
- Install the client tool.
sudo apt install nfs-common
- Create a local mount directory.
sudo mkdir -p /data
- Mount the shared directory.
sudo mount -t nfs <server ip>:/data /data
Note: the shared directory (/data) can be arbitrary, but make sure it is the same on the server and the client.
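To make the mount persist across reboots, an /etc/fstab entry can be added; a sketch using the same <server ip> placeholder as above:

```text
<server ip>:/data  /data  nfs  defaults  0  0
```

After adding the line, sudo mount -a applies it without a reboot.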
CUDA Installation in Ubuntu
- Download the installation package cuda_10.0.130_410.48_linux.run from the NVIDIA official website.
- Run the installation command:
sudo sh cuda_10.0.130_410.48_linux.run
During the installation, a series of prompts is displayed. Keep the default settings, but select no for "Install NVIDIA Accelerated Graphics Driver for Linux-x86_64?" when the installer asks whether to install the NVIDIA Accelerated Graphics Driver.
- Run the following command to configure the environment variables:
sudo gedit /etc/profile
Add the following lines to the end of the profile file:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Save the profile file and run the following command to make the environment variables take effect immediately:
source /etc/profile
- To verify the installation with the CUDA samples, go to /usr/local/cuda/samples and run the following command to build them:
sudo make all -j8
After the compilation is complete, go to /usr/local/cuda/samples/1_Utilities/deviceQuery and run the following command:
./deviceQuery
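As a final sanity check of the environment-variable step above, you can confirm that the CUDA bin directory ends up first on PATH. A minimal sketch; the paths are the CUDA defaults from the profile snippet, and guarding with ${LD_LIBRARY_PATH:-} avoids an unbound-variable error if the variable was not previously set:

```shell
# Prepend the CUDA directories, as in the /etc/profile lines above.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}

# The CUDA bin directory should now be the first PATH entry.
first_entry=$(echo "$PATH" | cut -d: -f1)
echo "$first_entry"
```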