adding dss deployment on EC2 #302

Merged: 6 commits, Dec 5, 2024
5 changes: 5 additions & 0 deletions .wordlist.txt
@@ -52,3 +52,8 @@ UI
UUID
VM
YAML
DSS
Jupyter
MLflow
PuTTy
WSL
2 changes: 1 addition & 1 deletion aws/aws-how-to/index.rst
@@ -20,7 +20,7 @@ While using Ubuntu on AWS, you'll need to perform tasks such as launching an ins
* :doc:`Upgrade from Focal to Jammy <instances/upgrade-from-focal-to-jammy>`
* :doc:`Configure automated updates <instances/automatically-update-ubuntu-instances>`
* :doc:`Deploy Charmed Kubernetes <instances/deploy-charmed-kubernetes-on-ubuntu-pro>`

* :doc:`Deploy Canonical Data Science Stack <instances/data-science-stack-on-ec2>`


EKS - Using Ubuntu Pro and GPUs on EKS
145 changes: 145 additions & 0 deletions aws/aws-how-to/instances/data-science-stack-on-ec2.rst
@@ -0,0 +1,145 @@
Deploy Canonical Data Science Stack on EC2 using a GPU-enabled instance type
============================================================================


Canonical Data Science Stack (DSS) is a command-line tool that bundles Jupyter Notebooks, MLflow and frameworks such as PyTorch and TensorFlow on top of an orchestration layer, making it an excellent fit for rapid testing, prototyping and machine learning at a small scale.

While the product is aimed at desktop machine learning users, you can also deploy it on EC2 by following these instructions.

We are using a G4DN instance type to leverage its GPU, which is required for machine learning training and inference.

Learn more about Canonical Data Science Stack in our `press release post`_ and our `official documentation`_.

Launch a GPU-enabled EC2 instance (G4DN family)
-----------------------------------------------


Navigate to the EC2 web console, select :guilabel:`Launch instance` and make sure you choose either Ubuntu 22.04 or 24.04 LTS (free or Pro) and an instance type from the G4DN family.

For this example, we are using 22.04 on ``g4dn.xlarge``, which has 4 vCPUs and 16 GB of RAM and is powered by an NVIDIA T4 GPU.

Make a note of the machine IP and the key pair used. You'll need them to connect to the machine.

Log in and install GPU drivers
------------------------------

Connect to your machine. If you are using Linux (including WSL on Windows) or macOS, open a terminal window and connect to your machine using:

.. code::

ssh -i <<YOUR_KEYPAIR>> ubuntu@<<YOUR_MACHINE_IP>>


If you are connecting from Windows, you can use PuTTY.

Once you have connected, run a full upgrade:

.. code::

sudo apt update && sudo apt upgrade -y


If a new kernel was installed, it is advisable to restart the machine before proceeding.
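
Whether a restart is actually pending can be checked before rebooting: Ubuntu writes a flag file when an update requires one. A minimal sketch (the helper name is ours; the path argument exists only to make the helper easy to exercise):

```shell
# Ubuntu creates /var/run/reboot-required when an update (e.g. a new
# kernel) needs a restart. By default the helper checks that real flag file.
reboot_needed() {
    [ -f "${1:-/var/run/reboot-required}" ]
}

if reboot_needed; then
    echo "Reboot required -- run: sudo reboot"
else
    echo "No reboot needed"
fi
```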

Now install the GPU drivers:

.. code::

sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
sudo reboot


After the reboot, check that the drivers and CUDA have been installed properly and that the GPU is detected:

.. code::

nvidia-smi


The output should be similar to:

.. code-block:: none

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 26C P8 9W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+

Note that the GPU is properly detected and CUDA has also been installed.
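
For unattended setups, the same check can be scripted. A hedged sketch (the helper name is ours; it treats any ``nvidia-smi`` failure as "driver not ready"):

```shell
# The exit status of nvidia-smi is a simple driver-health probe: it is
# non-zero when the driver is missing or the GPU is not visible.
# The optional argument lets tests substitute the probed command.
gpu_ready() {
    probe="${1:-nvidia-smi}"
    command -v "$probe" >/dev/null 2>&1 && "$probe" >/dev/null 2>&1
}

if gpu_ready; then
    echo "GPU driver detected"
else
    echo "GPU driver not detected -- re-run the driver installation"
fi
```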

Install MicroK8s and DSS
------------------------

Install MicroK8s:


.. code::

sudo snap install microk8s --channel 1.28/stable --classic
sudo microk8s enable storage dns rbac
sudo microk8s enable gpu
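
MicroK8s can take a little while to come up, so before initializing DSS it helps to wait until the cluster reports ready. A sketch using a small generic retry helper (the helper is ours; ``microk8s status --wait-ready`` itself also blocks until the node is ready):

```shell
# retry <attempts> <delay-seconds> <command...>
# Re-runs the command until it succeeds or the attempts are exhausted.
retry() {
    attempts=$1; delay=$2; shift 2
    while ! "$@"; do
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] || return 1
        sleep "$delay"
    done
}

# Example: wait up to ~5 minutes for the cluster before moving on:
#   retry 30 10 sudo microk8s status --wait-ready
```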

Install DSS:

.. code::

sudo snap install data-science-stack --channel latest/stable
dss initialize --kubeconfig="$(sudo microk8s config)"
Comment on lines +86 to +103

Contributor: i would prefer to reference the official DSS docs for this

Contributor (Author): TLDR; I'll leave that to @k-dimple.
In a conversation with another team (Anbox), we discussed how referencing other guides sometimes hurts the experience: 1/ the official guides tend to be more generic and can confuse the customer with other things, and 2/ the how-to stops being a straightforward recipe.

The documentation person there told me that they are now trying to have self-sufficient guides for that reason.

Collaborator: Yeah, I agree with the argument. If the whole how-to guide was just about installing DSS, we could have pointed directly to the DSS docs. Here, since we are talking about specific EC2 instances and including specific drivers, it is better to give a complete set of instructions and to point readers to the DSS docs for further reference. (Carlos has already done this by adding a link in the introduction section.) So I think we can keep it in its current state.

Create your first Jupyter Notebook:

.. code::

dss create my-tensorflow-notebook --image=kubeflownotebookswg/jupyter-tensorflow-cuda:v1.8.0


DSS will deploy a Jupyter Notebook with TensorFlow and CUDA enabled. It will use a ``ClusterIP`` service in the MicroK8s cluster, which is only accessible from inside the machine for the moment.

To allow outside access, change the service type from ``ClusterIP`` to ``NodePort`` and reconnect using an SSH tunnel:


.. code::

sudo microk8s kubectl patch svc my-tensorflow-notebook --type='json' -p '[{"op":"replace","path":"/spec/type","value":"NodePort"}]' --namespace dss


Wait a few seconds for the service to be updated.
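
To confirm the patch took effect and to read the node port Kubernetes actually assigned (it is only guaranteed to be 30633 if you set it explicitly; otherwise it can be any port in the 30000-32767 range), you can query the service. A sketch; the validation helper is ours:

```shell
# Show the service type and the assigned node port:
#   sudo microk8s kubectl get svc my-tensorflow-notebook --namespace dss \
#     -o jsonpath='{.spec.type} {.spec.ports[0].nodePort}{"\n"}'

# Sanity-check that a value is in the default NodePort range (30000-32767):
valid_nodeport() {
    [ "$1" -ge 30000 ] 2>/dev/null && [ "$1" -le 32767 ]
}

valid_nodeport 30633 && echo "30633 is in the NodePort range"
```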


Create an SSH tunnel for accessing the deployment
--------------------------------------------------

Open a new connection to create the tunnel to the deployment port. You can close the previous connection, as it is no longer needed.


.. code::

ssh -i <<YOUR_KEYPAIR>> ubuntu@<<YOUR_MACHINE_IP>> -L 30633:localhost:30633


Open your browser at the address ``localhost:30633`` and start using your freshly deployed Jupyter Notebook with CUDA enabled.

.. note::
If you want to create more Jupyter Notebook deployments, you'll have to create additional tunnels on new ports.
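
The note above can be made concrete: a single SSH session can forward several notebook ports at once. The helper below is ours, and 31750 is a made-up example port; substitute the node ports assigned to your own services:

```shell
# Build one "-L port:localhost:port" forwarding flag per node port.
forward_flags() {
    for port in "$@"; do
        printf -- '-L %s:localhost:%s ' "$port" "$port"
    done
}

# Example: print the full tunnel command for two notebooks.
echo "ssh -i <<YOUR_KEYPAIR>> ubuntu@<<YOUR_MACHINE_IP>> $(forward_flags 30633 31750)"
```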



.. _`press release post`: https://canonical.com/blog/data-science-stack-release
.. _`official documentation`: https://documentation.ubuntu.com/data-science-stack/en/latest/

4 changes: 3 additions & 1 deletion aws/aws-how-to/instances/index.rst
@@ -27,9 +27,10 @@ Performing upgrades and automating them:
* :doc:`Upgrade to Ubuntu Pro at scale using tokens with SSM <upgrade-to-ubuntu-pro-at-scale-using-tokens-with-ssm>`
* :doc:`Configure automated updates <automatically-update-ubuntu-instances>`

Deploying Charmed Kubernetes:
Deploying Canonical Products:

* :doc:`Deploy Charmed Kubernetes <deploy-charmed-kubernetes-on-ubuntu-pro>`
* :doc:`Deploy Canonical Data Science Stack <data-science-stack-on-ec2>`


.. toctree::
@@ -49,4 +50,5 @@ Deploying Charmed Kubernetes:
Upgrade to Ubuntu Pro at scale using tokens with SSM <upgrade-to-ubuntu-pro-at-scale-using-tokens-with-ssm>
Configure automated updates <automatically-update-ubuntu-instances>
Deploy Charmed Kubernetes <deploy-charmed-kubernetes-on-ubuntu-pro>
Deploy Data Science Stack <data-science-stack-on-ec2>