The following conditions must be met before Vega is deployed in a local cluster:
- Ubuntu 16.04 or later (not tested on other Linux distributions or versions)
- CUDA 10.0
- Python 3.7
- pip
- Before deploying a cluster, install the mandatory software packages: download the script install_dependencies.sh and run it:
bash install_dependencies.sh
- MPI. For details, see Appendix Install MPI.
- MMDetection (optional, required by the object detection algorithm). For details, see Appendix Install MMDetection.
- SSH mutual trust configured between all nodes.
- NFS shared storage.
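Before running the deployment, the software prerequisites can be checked quickly. The following is only a sketch; it assumes the standard Ubuntu command names python3, pip3, and nvcc, which may differ in your environment:

```shell
# Quick pre-flight check for the prerequisites listed above.
# Assumes the standard command names python3, pip3, and nvcc.
py_ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
echo "Python: ${py_ver}"
if command -v pip3 >/dev/null 2>&1; then echo "pip: ok"; else echo "pip: missing"; fi
if command -v nvcc >/dev/null 2>&1; then echo "CUDA: ok"; else echo "CUDA: missing (CUDA 10.0 required)"; fi
```

A missing entry in the output points at the corresponding prerequisite above.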
After the preceding operations are complete, download the Vega deployment package from the Vega repository. The deployment package contains the following scripts:
- Deployment script: deploy_local_cluster.py
- Commissioning script: verify_local_cluster.py
- Start script on the slave node: start_slave_worker.py
- Configure the deployment information in the deploy.yml file. The file format is as follows:
master: n.n.n.n # IP address of the master node
listen_port: 8786 # listening port number
slaves: ["n.n.n.n", "n.n.n.n", "n.n.n.n"] # slave node addresses
- Run the deployment script.
Place deploy_local_cluster.py, verify_local_cluster.py, vega-1.0.0.whl, deploy.yml, and install_dependencies.sh in the same folder on the master node of the cluster. Run the following command to deploy Vega to the master and slave nodes:
python deploy_local_cluster.py
After the execution is complete, each node is verified automatically and the following message is displayed:
success
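As an illustration of the deploy.yml format, the slave address list can be pulled out with standard shell tools. This is only a sketch against a sample file; the IP addresses below are placeholders, not real cluster nodes:

```shell
# Write a sample deploy.yml in the format shown above (placeholder addresses).
cat > deploy.yml <<'EOF'
master: 192.168.0.1
listen_port: 8786
slaves: ["192.168.0.2", "192.168.0.3"]
EOF

# Extract the slave IP addresses from the slaves: line, one per line.
slaves=$(grep '^slaves:' deploy.yml | grep -oE '[0-9]+(\.[0-9]+){3}')
echo "$slaves"
```

A loop over this list is a convenient basis for per-node checks, such as pinging each slave before deployment.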
- Download the MMDetection source code.
Download the latest version of MMDetection from https://github.com/open-mmlab/mmdetection.
- Install MMDetection.
Switch to the mmdetection directory and run the following command to compile and install it:
sudo python3 setup.py develop
Install MPI:
- Use the apt tool to install MPI directly:
sudo apt-get install mpich
- Run the following command to check that MPI is working:
mpirun --version
Install Apex:
- Download the source code from https://github.com/NVIDIA/apex.
- Decompress the package, switch to the resulting directory, and run:
pip3 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Any two hosts on the network must support SSH mutual trust (passwordless login). The configuration method is as follows:
- Install the SSH server.
sudo apt-get install openssh-server
- Generate the key pair.
ssh-keygen -t rsa
Two files, id_rsa and id_rsa.pub, are created in the ~/.ssh/ directory; id_rsa.pub is the public key.
- Check whether the ~/.ssh/authorized_keys file exists. If it does not, create it and run chmod 600 ~/.ssh/authorized_keys to set its permissions.
- Copy the public key id_rsa.pub into the authorized_keys file on the other servers.
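The key-generation step can be scripted non-interactively. A minimal sketch, writing to a scratch directory so nothing in ~/.ssh is touched (-N "" sets an empty passphrase, -f sets the output path):

```shell
# Generate an RSA key pair non-interactively into a scratch directory.
keydir=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$keydir/id_rsa" -q
ls "$keydir"   # id_rsa (private key) and id_rsa.pub (public key)
```

On a real node the public key is what gets appended to ~/.ssh/authorized_keys on the other servers; the ssh-copy-id tool automates that copy.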
On the server:
- Install the NFS server.
sudo apt install nfs-kernel-server
- Write the shared path to the configuration file.
echo "/data *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
- Create the shared directory.
sudo mkdir -p /data
- Restart the NFS server.
sudo service nfs-kernel-server restart
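For reference, each field of the exports entry above has a specific meaning (these are standard NFS export options):

```text
/data *(rw,sync,no_subtree_check,no_root_squash)
```

/data is the exported path; * allows any client host to mount it; rw permits read-write access; sync forces writes to disk before the server replies; no_subtree_check disables per-request subtree verification; no_root_squash lets the client's root user keep root privileges on the share.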
On the client:
- Install the client tool.
sudo apt install nfs-common
- Create a local mount directory.
sudo mkdir -p /data
- Mount the shared directory.
sudo mount -t nfs <server ip>:/data /data
Note: the shared directory (/data) can be arbitrary, but make sure it is the same on the server and the client.
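To make the mount persist across reboots, an /etc/fstab entry can be added; a sketch using the same <server ip> placeholder as above:

```text
<server ip>:/data  /data  nfs  defaults  0  0
```

After adding the line, sudo mount -a applies it without a reboot.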
CUDA Installation in Ubuntu
- Download the installation package cuda_10.0.130_410.48_linux.run from the NVIDIA official website.
- Run the installation command:
sudo sh cuda_10.0.130_410.48_linux.run
During the installation, a series of prompts is displayed. Keep the default settings, but select no for "Install NVIDIA Accelerated Graphics Driver for Linux-x86_64?" when the installer asks whether to install the NVIDIA Accelerated Graphics Driver.
- Run the following command to configure the environment variables:
sudo gedit /etc/profile
Add the following lines to the end of the profile file:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Save the profile file and run the following command to make the environment variables take effect immediately:
source /etc/profile
- To verify the installation with the CUDA samples, go to /usr/local/cuda/samples and run the following command to build them:
sudo make all -j8
After the compilation is complete, go to /usr/local/cuda/samples/1_Utilities/deviceQuery and run the following command:
./deviceQuery
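As a final sanity check of the environment-variable step above, you can confirm that the CUDA bin directory ends up first on PATH. A minimal sketch; the paths are the CUDA defaults from the profile snippet, and guarding with ${LD_LIBRARY_PATH:-} avoids an unbound-variable error if the variable was not previously set:

```shell
# Prepend the CUDA directories, as in the /etc/profile lines above.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}

# The CUDA bin directory should now be the first PATH entry.
first_entry=$(echo "$PATH" | cut -d: -f1)
echo "$first_entry"
```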