Open OnDemand Tutorial

Jupyter Installation Tutorial

This tutorial will be using the hpcadmin credentials listed in Accessing the Applications.

Getting Started

Login

Now you should log in to Open OnDemand through https://localhost:3443. Note that you'll have to accept the self-signed certificates from both Open OnDemand and the identity provider.

Get a shell session

At some points during this tutorial you'll need to execute commands in a shell session. You can use the shell app to get an ssh session in the web browser for this purpose.

Create the jupyter application

Click on "My Sandbox Apps (Development)" from the dropdown menu "Develop" in the navigation bar to navigate to the sandbox app workspace.

Now create a new app from the button labeled "New App".

This will bring you to a page where you'll click "Clone Existing App", which presents a form to fill out.

Fill in jupyter as the directory name, /var/git/bc_example_jupyter as the Git Remote, and check "Create a new Git Project from this?". Then click "Submit" to create a new development application.

This copied what was in /var/git/bc_example_jupyter to /home/hpcadmin/ondemand/dev/jupyter. You can navigate to these files through the Files app with this link, or simply press the "Files" button in Jupyter's row of the sandbox applications table.
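
If you'd like, you can also confirm the copy from a shell session; listing the new directory should show the example's files (the path is the same one mentioned above):

ls ~/ondemand/dev/jupyter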

create sandbox app

You'll also need to set up git for the hpcadmin user at this point.

Configure your user email and name so that you can commit changes to the jupyter app.

git config --global user.email hpcadmin@localhost
git config --global user.name "HPC Admin"

Get Jupyter Working

Configure the correct cluster

The example application we've created does not use the correct cluster configuration, so we've got to modify it.

If you try to submit it as is, the job submission will fail with an error.

We need to edit the form.yml in the application's folder. We can navigate to the folder through the files app. The URL is https://localhost:3443/pun/sys/files/fs/home/hpcadmin/ondemand/dev/jupyter/.

Here you'll see the form.yml file. We can edit it by clicking on the file and pressing the "Edit" button. This will take us to the file editor app with this file open.

In the file editor, specify hpc as the cluster attribute on line 11 like so: cluster: "hpc". Save this file by clicking the "Save" button at the top left.
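
After the edit, the uncommented lines near the top of form.yml should look roughly like this (the full file, with comments removed, appears later in this tutorial):

cluster: "hpc"
attributes:
  modules: "python"
  extra_jupyter_args: ""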

Launch the Jupyter Application

Now when we navigate back to our interactive sessions, you'll see the "Interactive Apps [Sandbox]" menu with an item labeled "Jupyter Notebook".

Follow this link and we'll be presented with this form for specifying different attributes about the job we want to submit to the SLURM scheduler.

We don't need to change anything in this form, so simply press "Launch" at the bottom of the form. After pressing "Launch", the job should submit successfully and we'll be redirected back to the interactive sessions page, where we'll see a panel showing our job.

fix cluster and submit

Debug failure

This job is going to run and fail during startup. But don't worry! We're going to debug and fix it.

When the job completes, the panel still exists, so you can follow the link in the panel to the log directory of the job.

Follow the link and we'll be redirected to the job's working directory where an output.log file is. Let's open that file with the "View" button.

When you open the log file, you'll see something like this, where it says jupyter: command not found. As you can see, we have PATH issues.

TIMING - Starting jupyter at: Fri Jul 17 18:06:34 UTC 2020
+ jupyter notebook --config=/home/hpcadmin/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/e16b9a77-1a4f-4c9e-95f3-d3c23e5e8d76/config.py
/home/hpcadmin/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/e16b9a77-1a4f-4c9e-95f3-d3c23e5e8d76/script.sh: line 27: jupyter: command not found
Timed out waiting for Jupyter Notebook server to open port 16970!

Configure jupyter PATH

So we know what the issue is: the job's script can't find the jupyter executable in the PATH.

Jupyter was installed in these containers through a Python virtual environment, and that's why it's not directly in the shell script's PATH.

We need to add this line to our job's shell script to activate that environment.

source /usr/local/jupyter/2.1.4/bin/activate

So let's open the template/script.sh.erb in the file editor and add this to line 27 of the shell script, just before we start jupyter.

Lines 24 - 31 of template/script.sh.erb should now look like this.

# Benchmark info
echo "TIMING - Starting jupyter at: $(date)"

source /usr/local/jupyter/2.1.4/bin/activate

# Launch the Jupyter Notebook Server
set -x
jupyter notebook --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>
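
If you'd like to check the fix before resubmitting, you can source the same environment in a shell session and confirm that jupyter now resolves (optional; the activate path is the one used in the script above):

source /usr/local/jupyter/2.1.4/bin/activate
which jupyter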

Correctly launch

Now we can launch the application again and it should work.

When it is up, running, and available to use, the panel will show a "Connect to Jupyter" button. Click this button and OnDemand will redirect us to Jupyter.

fix path and launch

Congratulations! We've now started development on the Jupyter Notebook batch connect application and successfully connected to it.

You may want to delete this job now by using the "Delete" button on the panels as we'll be iterating through developing the application and starting new jobs.

Save your spot

Now it's probably a good idea to save the modifications. They're small, but it'll still help if you ever get into trouble and need to revert. A simplified version of the form.yml is in the very next section, and you may want to use and save it instead so that any git diff you do will be much smaller and easier to read.

You can use the shell app to log in and navigate to this directory.
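
For example, change into the app's directory (the path comes from earlier in this tutorial):

cd ~/ondemand/dev/jupyter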

In this shell, save your work in git with these commands:

git add .
git commit -m 'initial commit that correctly submits to the hpc cluster'

git save initial

A closer look at the form.yml

The items in the form.yml directly create what's shown to the users in the form they interact with. Let's take a closer look at the form.yml that created the form you just submitted to get an understanding of how they relate to what's shown in the UI.

This is the form.yml you have at this point, without all the comments.

cluster: "hpc"
attributes:
  modules: "python"
  extra_jupyter_args: ""
form:
  - modules
  - extra_jupyter_args
  - bc_account
  - bc_queue
  - bc_num_hours
  - bc_num_slots
  - bc_email_on_started

All fields prepended with bc_ are special fields OnDemand provides for convenience. They are commonly used fields that create corresponding script attributes. We'll talk more about script attributes later. (A small sketch after the list below shows how a bc_ field can be customized in the attributes section.)

  • modules Specifies the modules loaded. Since it's hard coded to "python" (in the attributes) we didn't see it in the form.
  • extra_jupyter_args Specifies the extra jupyter arguments, but since it's hard coded to "" we didn't see it in the form.
  • bc_account Creates the "Account" text field and submits the job with the given account.
  • bc_queue Creates the "Partition" text field and submits the job to the given partition.
  • bc_num_hours Creates the "Number of hours" integer field and submits the job with the given walltime.
  • bc_num_slots Creates the "Number of nodes" integer field and submits the job with the requested cores.
  • bc_email_on_started Creates the "I would like to receive an email when the session starts" checkbox and submits the job with a request to email when the job starts.
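
Although we don't need to here, these bc_ fields can also be customized in the attributes section just like any other field. A minimal sketch (purely illustrative; bc_num_hours and the values shown aren't required by this tutorial):

# form.yml (illustrative sketch only)
attributes:
  bc_num_hours:
    value: 1
    max: 8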

Updating the Jupyter App

Change bc_queue to a select field

We have 2 partitions enabled in the SLURM containers (SLURM calls queues partitions, so we'll be switching back and forth between the two terms in this tutorial). We've started with a field bc_queue that is a text field, but it's likely much easier for users to simply choose the partition out of a select dropdown menu instead.

So let's remove the bc_queue field in the form and replace it with a new field we'll call custom_queue. Add custom_queue in the form section and remove bc_queue.

We'll also add custom_queue to the attributes section. Adding a field to the form section adds it to the form in the UI. By default, this field will be a text field. If you want this field to be a different type of widget (as we do), you'll configure the field in the attributes section. Also by default, the label in the UI is just the name of the field; in our case custom_queue would turn into "Custom Queue". That's not quite what we want, so we'll specify the label as "Partition" because that's what it is in SLURM parlance.

Here you can see that we specify custom_queue in the attributes as a select widget with two options and a new label. The first element in each options array is what will be shown to the user (the capitalized version), while the second element is the value that's actually used in the sbatch command.

# form.yml, with only this addition for brevity
attributes:
  custom_queue:
    widget: "select"
    label: "Partition"
    options:
      - ["Compute", "compute"]
      - ["Debug", "debug"]
form:
  - custom_queue
#   - bc_queue

Refresh the new session form and you should now see your updates.

But before we submit to test them out, we'll need to reconfigure the submit.yml.erb to use this new field. You can edit it in the file editor app.

You'll need to specify the script's queue_name as the partition, like so. The script is the logical "script" we're submitting to the scheduler, and queue_name is the field of the script that specifies the queue. (OnDemand knows how to translate queue_name into a partition for SLURM.)

script:
  queue_name: "<%= custom_queue %>"

The .erb file extension indicates this is an embedded Ruby (ERB) file. <%= and %> are embedded Ruby tags. A variable for each attribute defined in the form.yml can be used in this ERB file, containing the string representation of the value submitted by the user.

If you're not super comfortable with the terminology, just remember this: custom_queue is defined in the form.yml (the file that defines what the UI form looks like), so it can be used in the submit.yml.erb (the file that configures the job being submitted) as <%= custom_queue %>.
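
To make the templating concrete, here's roughly what happens when the form is submitted (an illustrative sketch; the rendered YAML is what actually configures the job):

# submit.yml.erb, before rendering
script:
  queue_name: "<%= custom_queue %>"

# if the user picked "Debug" in the form, this renders to
script:
  queue_name: "debug"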

When you launch the application again, you can log in to a shell and confirm you chose a different queue with this command.

[hpcadmin@ondemand ~]$ squeue -o "%j %P"
NAME PARTITION
sys/dashboard/dev/jupyter debug

make custom queue

At this point, this should be the entirety of the submit.yml.erb and form.yml (without comments). They're given here in full if you want to copy/paste them. And remember to save your spot!

# submit.yml.erb
script:
  queue_name: "<%= custom_queue %>"
# form.yml
cluster: "hpc"
attributes:
  modules: "python"
  extra_jupyter_args: ""
  custom_queue:
    widget: "select"
    label: "Partition"
    options:
      - ["Compute", "compute"]
      - ["Debug", "debug"]
form:
  - modules
  - extra_jupyter_args
  - bc_account
  - custom_queue
  - bc_num_hours
  - bc_num_slots
  - bc_email_on_started

Limit bc_num_slots

SLURM is configured with only 2 cores total. If you were now to submit this app with, say, 3 or more for bc_num_slots, it would sit in the queue forever because SLURM cannot find a suitable host to run it on.

So, let's limit this field to a max of 2.

# form.yml
attributes:
  bc_num_slots:
    max: 2

That's it! Again, because bc_num_slots is a convenience field, it already has a minimum of 1 that you can't override, because it doesn't make sense to specify 0 or fewer cores.

Using script native attributes

script.native attributes are a way for us to specify any arguments to the scheduler that we can't pre-define or that don't have a good generic definition (like queue_name above).

In this section we're going to make OnDemand request memory through sbatch's --mem argument.

First, let's add it to the form like so.

Here are descriptions of all the fields we applied to it. Note that if the label was not defined, the default 'Memory' would have been OK. Also, we don't really need the help message here; it's just for illustration.

  • widget specifies the type of widget, in this case a number field
  • max the maximum value, ~1 GB in this case
  • min the minimum value, 200 MB
  • step the step size when users increase or decrease the value
  • value the default value of 500 MB
  • label the label shown in the UI
  • help a help message
# form.yml, with only this addition for brevity
attributes:
  memory:
    widget: "number_field"
    max: 1000
    min: 200
    step: 200
    value: 500
    label: "Memory (MB)"
    help: "RSS Memory"
form:
  - memory

Again, to actually use the value we populate in the form, we need to use it in the submit.yml.erb. This is where script.native attributes come in.

# submit.yml.erb
script:
  native:
    - "--mem"
    - "<%= memory %>M"

Native attributes are an array, and they're passed to the scheduler just as they're defined here.

This would translate into a command much like: sbatch --mem 800M. As you can see native allows us to pass anything we wish into the scheduler command.
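
Any other sbatch flag can be appended to native in the same way. For example (an optional sketch, not needed for this tutorial; --constraint and the "intel" feature name are only placeholders):

# submit.yml.erb (illustrative only)
script:
  native:
    - "--mem"
    - "<%= memory %>M"
    - "--constraint"
    - "intel"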

To confirm your job is running with the correct memory parameters, simply log in to a shell and run the command below. You should see output like this.

[hpcadmin@ondemand /]$ squeue -o "%j %m"
NAME MIN_MEMORY
sys/dashboard/dev/jupyter 800M

At this point, this should be the entirety of the submit.yml.erb and form.yml (without comments). They're given here in full if you want to copy/paste them. And remember to save your spot!

# submit.yml.erb
---
script:
  queue_name: "<%= custom_queue %>"
  native:
    - "--mem"
    - "<%= memory %>M"
# form.yml
cluster: "hpc"
attributes:
  modules: "python"
  extra_jupyter_args: ""
  custom_queue:
    widget: "select"
    label: "Partition"
    options:
      - ["Compute", "compute"]
      - ["Debug", "debug"]
  bc_num_slots:
    max: 2
  memory:
    widget: "number_field"
    max: 1000
    min: 200
    step: 200
    value: 500
    label: "Memory (MB)"
    help: "RSS Memory"
form:
  - modules
  - extra_jupyter_args
  - bc_account
  - custom_queue
  - bc_num_hours
  - bc_num_slots
  - bc_email_on_started
  - memory

Adding a jupyterlab checkbox

Jupyter ships with both Notebooks and JupyterLab. Some users may want to use JupyterLab instead of Notebooks, so let's give them that option.

First, let's add the checkbox to the form.

# form.yml, with only this addition for brevity
attributes:
  jupyterlab_switch:
    widget: "check_box"
    label: "Use JupyterLab instead of Jupyter Notebook?"
    help: |
      JupyterLab is the next generation of Jupyter, and is completely compatible with existing Jupyter Notebooks.
form:
  - jupyterlab_switch

Refresh the new session form and you should now see your updates.

For this change, there's no need to edit the submit.yml.erb. This toggle happens in the actual script that's run during the job, so we have to edit template/script.sh.erb. Note that this is also an ERB script, so it gets templated in Ruby before being submitted to the scheduler.

Line 31 is as follows:

jupyter notebook --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>

Replace the notebook parameter with this new toggle.

jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>

If you're unfamiliar with Ruby ternary statements, you can read them like this: condition ? do this : otherwise do that. So this reads: if context.jupyterlab_switch is "1", use lab; otherwise use notebook.

Also note the use of context here where we didn't have to use that in the submit.yml.erb. This is an important difference. To reference variables from the form in the template/*.sh.erb files you must reference them through the context object.
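
As a quick side-by-side (both lines appear elsewhere in this tutorial):

# submit.yml.erb: form values are referenced directly
#   queue_name: "<%= custom_queue %>"
#
# template/script.sh.erb: the same values go through the context object
#   jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> --config="${CONFIG_FILE}"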

Now you can submit the job with the box checked to use JupyterLab instead of Notebook, and you'll see the Jupyter UI is significantly different.

At this point, this should be the entirety of the form.yml (without comments). It's given here in full if you want to copy/paste it. And remember to save your spot!

# form.yml
cluster: "hpc"
attributes:
  modules: "python"
  extra_jupyter_args: ""
  custom_queue:
    widget: "select"
    label: "Partition"
    options:
      - ["Compute", "compute"]
      - ["Debug", "debug"]
  bc_num_slots:
    max: 2
  memory:
    widget: "number_field"
    max: 1000
    min: 200
    step: 200
    value: 500
    label: "Memory (MB)"
    help: "RSS Memory"
  jupyterlab_switch:
    widget: "check_box"
    label: "Use JupyterLab instead of Jupyter Notebook?"
    help: |
      JupyterLab is the next generation of Jupyter, and is completely compatible with existing Jupyter Notebooks.
form:
  - modules
  - extra_jupyter_args
  - bc_account
  - custom_queue
  - bc_num_hours
  - bc_num_slots
  - bc_email_on_started
  - memory
  - jupyterlab_switch

Promoting to production

Cleaning up the form

Now that we're ready to deploy to production, let's clean up the form a little bit.

We want to remove some items because, while they make sense in the example for a real site, they just don't apply to these containers.

Let's remove these items from the form. Note that you'll also have to remove modules and extra_jupyter_args from the attributes section.

  • modules because modules don't exist on these compute nodes
  • extra_jupyter_args because we're not passing any
  • bc_account because only 1 account is applied to each user, so there's no need to change it.
  • bc_email_on_started because containers can't email these fake users

Since we got rid of extra_jupyter_args and modules, we'll also need to remove them from the template/script.sh.erb.

Remove lines 13-22 to get rid of modules, and remove the extra_jupyter_args reference from the jupyter command in template/script.sh.erb.

# remove this block from the 'unless' on line 13 to the 'end' at line 22.
<%- unless context.modules.blank? -%>
# Purge the module environment to avoid conflicts
module purge

# Load the require modules
module load <%= context.modules %>

# List loaded modules
module list
<%- end -%>

# ...

# and remove the last parameter given to jupyter on line 31
jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>

Now it should look like this:

jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> --config="${CONFIG_FILE}"

At this point, this should be the entirety of the template/script.sh.erb and form.yml (without comments). They're given here in full if you want to copy/paste them. And remember to save your spot!

#!/usr/bin/env bash

# Benchmark info
echo "TIMING - Starting main script at: $(date)"

# Set working directory to home directory
cd "${HOME}"

#
# Start Jupyter Notebook Server
#

# Benchmark info
echo "TIMING - Starting jupyter at: $(date)"

source /usr/local/jupyter/2.1.4/bin/activate

# Launch the Jupyter Notebook Server
set -x
jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> --config="${CONFIG_FILE}"
# form.yml
cluster: "hpc"
attributes:
  custom_queue:
    widget: "select"
    label: "Partition"
    options:
      - ["Compute", "compute"]
      - ["Debug", "debug"]
  bc_num_slots:
    max: 2
  memory:
    widget: "number_field"
    max: 1000
    min: 200
    step: 200
    value: 500
    label: "Memory (MB)"
    help: "RSS Memory"
  jupyterlab_switch:
    widget: "check_box"
    label: "Use JupyterLab instead of Jupyter Notebook?"
    help: |
      JupyterLab is the next generation of Jupyter, and is completely compatible with existing Jupyter Notebooks.
form:
  - custom_queue
  - bc_num_hours
  - bc_num_slots
  - memory
  - jupyterlab_switch

Edit the manifest

The OnDemand UI pulls things from the manifest.yml like the title of the application and where to put it in the column of interactive applications.

Let's change these fields. You can change any field except for role, and you can change them to something different than what's given here (have fun with it!). All fields besides role are purely descriptive or relate to UI groupings, so we can freely change them without any behavior change. Conversely, role needs to be batch_connect, so don't change it!

---
# change the name, this is what shows up in the menu
name: HPC Tutorial Jupyter
# change the category just to differentiate from the system installed
# desktop application
category: Tutorial Apps
# change the subcategory
subcategory: Machine Learning
role: batch_connect
# change the description, this shows up when you hover over the menu item
description: |
  This app will launch a Jupyter Lab or Notebook on one or more nodes.

If you want to change category and subcategory, you can freely do so. These attributes create groupings for applications. Since we will only have two applications (the system installed desktop application and this app), these groupings mainly help differentiate this tutorial app from the system installed one.

Now save your spot because the next thing we're going to do is deploy this development application to production.

Deploying to production

Deploying to production is as easy as copying the files from your dev directory to the system's app directory.

If you don't already have a shell session, get one now and execute these commands.

ssh ondemand
cd ~/ondemand/dev
sudo cp -R jupyter/ /var/www/ood/apps/sys/

And that's it! All you have to do now is refresh the page and you should see your Jupyter system app in the menu along with your sandbox development app.
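
If you want to double-check from the shell that the copy landed in the right place, listing the system apps directory should now include jupyter:

ls /var/www/ood/apps/sys/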

deploy to production

Tutorial Navigation

  • Next - Acknowledgments
  • Previous Step - Open XDMoD
  • Back to Start