PT docker path fix and setuptools advisory
omrialmog committed Jan 27, 2022
1 parent 1552d08 commit 091e866
Showing 4 changed files with 38 additions and 9 deletions.
38 changes: 29 additions & 9 deletions README.md
@@ -19,7 +19,7 @@
- [Check TF/Horovod Habana packages](#check-tfhorovod-habana-packages)
- [Install TF/Horovod Habana packages](#install-tfhorovod-habana-packages)
- [Check PT Habana packages](#check-pt-habana-packages)
- [Install PT Habana packages](#install-pt-habana-packages)
- [Install PT Habana packages](#install-pytorch-habana-packages)
- Docker
- [Do you want to use prebuilt docker or build docker yourself?](#do-you-want-to-use-prebuilt-docker-or-build-docker-yourself)
- [How to Build Docker Images from Habana Dockerfiles](#how-to-build-docker-images-from-habana-dockerfiles)
@@ -1157,7 +1157,18 @@ export PYTHON=/usr/bin/python<VER> # i.e. for U18 it's PYTHON=/usr/bin/python3.7
```
2. Before installing habana-tensorflow, install a supported TensorFlow version. See [Support Matrix](#SynapseAi-Support-Matrix). If no TensorFlow package is installed, pip will fetch it automatically.
#### <a id="tf-setuptools-issue"></a>Note:
Setuptools release 60.x (and later) exposed an issue in the Habana-TensorFlow package.
The TensorFlow package automatically fetches the latest available version of Setuptools from PyPI because its dependency TensorBoard requires version ``>=41.0.0``.
To avoid problems, choose one of two options:
* explicitly install Setuptools with a version **greater than or equal to** 41.0.0 and **less than** 60.0.0
or
* when running any script that uses the Habana-TensorFlow package, set the environment variable ``SETUPTOOLS_USE_DISTUTILS=stdlib``.
<br>
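The two options above can be sketched as a small pre-flight check. This is an illustrative sketch only, not part of the Habana scripts: the helper name `setuptools_in_range` is hypothetical, it assumes a `sort` that supports `-V` (for example GNU coreutils), and it falls back to `python3` when `PYTHON` is not set. It compares plain version strings and is not a full PEP 440 comparison.

```shell
# Hypothetical helper: succeeds when version $1 satisfies 41.0.0 <= $1 < 60.0.0.
# Assumes a sort(1) that supports -V (version sort), such as GNU coreutils.
setuptools_in_range() {
    ver="$1"
    lowest=$(printf '%s\n' "41.0.0" "$ver" | sort -V | head -n 1)
    highest=$(printf '%s\n' "60.0.0" "$ver" | sort -V | tail -n 1)
    [ "$lowest" = "41.0.0" ] && [ "$highest" = "60.0.0" ] && [ "$ver" != "60.0.0" ]
}

# Read the installed Setuptools version; "none" if it cannot be imported.
py="${PYTHON:-python3}"
current=$("$py" -c 'import setuptools; print(setuptools.__version__)' 2>/dev/null || echo "none")

if setuptools_in_range "$current"; then
    echo "Setuptools $current is inside the advised range"
else
    # Second option from the note: use the standard-library distutils.
    export SETUPTOOLS_USE_DISTUTILS=stdlib
    echo "Setuptools $current is outside the advised range; SETUPTOOLS_USE_DISTUTILS=stdlib exported"
fi
```

To take the first option instead, the range can be pinned directly with `${PYTHON} -m pip install --user "setuptools>=41.0.0,<60.0.0"`.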
```
# install TensorFlow
${PYTHON} -m pip install --user tensorflow-cpu==<supported_tf_version>
```
@@ -1249,6 +1260,14 @@ Install the habana-horovod package to get multi-node support. The following list
${PYTHON} -m pip install --user habana-horovod==1.2.0.585 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
```
#### Note:
The habana-horovod installation is also affected by the [issue](#tf-setuptools-issue) found in habana-tensorflow. To install habana-horovod successfully, choose one of two options:
* explicitly install Setuptools with a version **greater than or equal to** 41.0.0 and **less than** 60.0.0
or
* set the environment variable ``SETUPTOOLS_USE_DISTUTILS=stdlib`` before installing habana-horovod.
<br>
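As a rough illustration of the environment-variable option, the variable can be scoped to the install command alone rather than exported for the whole shell. The wrapper name `install_habana_horovod` and the `RUN_INSTALL` switch are hypothetical and exist only for this sketch; the package version and index URL are the ones shown in the install command above. By default the sketch only prints the command it would run.

```shell
# Hypothetical wrapper: installs habana-horovod with the stdlib distutils workaround.
# Prints the command by default; set RUN_INSTALL=1 (an illustrative switch) to execute it.
install_habana_horovod() {
    py="${PYTHON:-python3}"
    # env(1) applies the variable to this one pip invocation only.
    set -- env SETUPTOOLS_USE_DISTUTILS=stdlib "$py" -m pip install --user \
        habana-horovod==1.2.0.585 \
        --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
    if [ "${RUN_INSTALL:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

install_habana_horovod
```

Scoping the variable with `env` keeps the workaround from leaking into unrelated commands run later in the same shell.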
#### See also:
To learn more about TensorFlow distributed training on Gaudi, see [Distributed Training with TensorFlow](https://docs.habana.ai/en/v1.2.0/Tensorflow_Scaling_Guide/TensorFlow_Gaudi_Scaling_Guide.html#distributed-training-with-tensorflow).
<br>
@@ -1401,7 +1420,7 @@ Check for the packages listed above
<center>
### Are the required python packages installed on your system?
[Yes](#Setup-Complete) • [No](#install-pt-habana-packages)
[Yes](#Setup-Complete) • [No](#install-pytorch-habana-packages)
</center>
@@ -1416,6 +1435,7 @@ Check for the packages listed above
---
<br />
Download and execute the bash script [pytorch_installation.sh](https://github.com/HabanaAI/Setup_and_Install/blob/r1.2.0/installation_scripts/pytorch/pytorch_installation.sh) to set up the Habana PyTorch environment.
It completes the steps below:
- Autodetect OS type and supported python version for which Habana PyTorch wheel packages are present in Vault
@@ -1723,7 +1743,7 @@ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_va
```
PT:
```
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.2.0/${OS}/habanalabs/pytorch-installer:1.2.0-585
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.2.0/${OS}/habanalabs/pytorch-installer-1.10.0:1.2.0-585
```
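The `${OS}` placeholder in the PT command can be filled in as sketched below. This is an illustrative helper only: the function name `pt_image` is hypothetical, and the accepted values are limited to the three OS names that appear in the docker commands of this commit (`ubuntu20.04`, `ubuntu18.04`, `amzn2`).

```shell
# Hypothetical helper: print the PyTorch image name used in this commit for a given OS.
pt_image() {
    case "$1" in
        ubuntu20.04|ubuntu18.04|amzn2)
            echo "vault.habana.ai/gaudi-docker/1.2.0/$1/habanalabs/pytorch-installer-1.10.0:1.2.0-585"
            ;;
        *)
            echo "unsupported OS: $1" >&2
            return 1
            ;;
    esac
}

OS=ubuntu20.04
# Print the pull command instead of running it, so the sketch has no side effects.
echo "docker pull $(pt_image "$OS")"
```

Keeping the image path in one place means a repository rename such as the one in this commit only needs to be applied once.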
**OPTIONAL:** Add the following flag to mount a local host share folder to the docker in order to be able to transfer files out of docker:
@@ -2094,11 +2114,11 @@ You might need to merge the new argument with your existing configuration.
### Pull docker
```
docker pull vault.habana.ai/gaudi-docker/1.2.0/ubuntu20.04/habanalabs/pytorch-installer:1.2.0-585
docker pull vault.habana.ai/gaudi-docker/1.2.0/ubuntu20.04/habanalabs/pytorch-installer-1.10.0:1.2.0-585
```
### Run docker
```
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.2.0/ubuntu20.04/habanalabs/pytorch-installer:1.2.0-585
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.2.0/ubuntu20.04/habanalabs/pytorch-installer-1.10.0:1.2.0-585
```
</details>
@@ -2146,11 +2166,11 @@ You might need to merge the new argument with your existing configuration.
### Pull docker
```
docker pull vault.habana.ai/gaudi-docker/1.2.0/ubuntu18.04/habanalabs/pytorch-installer:1.2.0-585
docker pull vault.habana.ai/gaudi-docker/1.2.0/ubuntu18.04/habanalabs/pytorch-installer-1.10.0:1.2.0-585
```
### Run docker
```
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.2.0/ubuntu18.04/habanalabs/pytorch-installer:1.2.0-585
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.2.0/ubuntu18.04/habanalabs/pytorch-installer-1.10.0:1.2.0-585
```
</details>
@@ -2198,11 +2218,11 @@ You might need to merge the new argument with your existing configuration.
### Pull docker
```
docker pull vault.habana.ai/gaudi-docker/1.2.0/amzn2/habanalabs/pytorch-installer:1.2.0-585
docker pull vault.habana.ai/gaudi-docker/1.2.0/amzn2/habanalabs/pytorch-installer-1.10.0:1.2.0-585
```
### Run docker
```
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.2.0/amzn2/habanalabs/pytorch-installer:1.2.0-585
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.2.0/amzn2/habanalabs/pytorch-installer-1.10.0:1.2.0-585
```
</details>
1 change: 1 addition & 0 deletions dockerfiles/Dockerfile_amzn2_tensorflow_installer
@@ -55,6 +55,7 @@ COPY requirements-no-deps-tensorflow-cpu-"$TF_VERSION".txt requirements-no-deps-
RUN wget https://bootstrap.pypa.io/get-pip.py && \
python3 get-pip.py pip==21.0.1 && \
rm -rf get-pip.py && \
pip3 install setuptools==41.0.0 && \
pip3 install tensorflow-cpu==${TF_VERSION} \
tensorflow-model-optimization==0.5.0 && \
pip3 install --no-deps -r requirements-no-deps-tensorflow-cpu-"$TF_VERSION".txt && \
4 changes: 4 additions & 0 deletions installation_scripts/al2_tensorflow_installation.sh
@@ -51,6 +51,10 @@ else
/sbin/ldconfig
fi

# Due to changes in Setuptools 60.x, install an older version of the package
# (version chosen to be consistent with the dockerfiles)
${PYTHON} -m pip install --user setuptools==41.0.0

export MPICC=${MPI_ROOT}/bin/mpicc
${PYTHON} -m pip install --user mpi4py==3.0.3

4 changes: 4 additions & 0 deletions installation_scripts/u18_tensorflow_installation.sh
@@ -53,6 +53,10 @@ else
sudo /sbin/ldconfig
fi

# Due to changes in Setuptools 60.x, install an older version of the package
# (version chosen to be consistent with the dockerfiles)
${PYTHON} -m pip install --user setuptools==41.0.0

${PYTHON} -m pip install --user mpi4py==3.0.3

# uninstall any existing versions of packages