This tutorial will be using the the hpcadmin
credentials listed in
Accessing the Applications.
Now you should login to Open OnDemand through https://localhost:3443
. Note that you'll have to
accept the self-signed certificates from both Open OnDemand and the identity provider.
At some points during this tutorial you'll need to execute commands in a shell session. You can use the shell app to get an ssh session in the web browser for this purpose.
Click on "My Sandbox Apps (Development)" from the dropdown menu "Develop" in the navigation bar to navigate to the sandbox app workspace.
Now create a new app from the button labeled "New App".
This will bring you to a page where you'll click "Clone Existing App" which will bring you to this form to fill out.
Fill in jupyter
as the directory name. /var/git/bc_example_jupyter
as the Git Remote and
check "Create a new Git Project from this?". Then click "Submit" to create a new development
application.
This copied what was in /var/git/bc_example_jupyter
to /home/hpcadmin/ondemand/dev/jupyer
.
You can navigate to these files through the Files app with this link
or simply Press the "Files" button in Jupyter's row of the sandbox applications table.
You'll also need to setup git
for the hpcadmin user at this point.
Configure your user email and name so that you can commit changes to the jupyter app.
git config --global user.email hpcadmin@localhost
git config --global user.name "HPC Admin"
The example application we've created does not use the correct cluster configuration, so we've got to modify it.
If you try to submit it as is, you'll get this error:
We need to edit the form.yml
in the appication's folder. We can navigate to the folder through the
files app. The URL is https://localhost:3443/pun/sys/files/fs/home/hpcadmin/ondemand/dev/jupyter/
.
Here you'll see the form.yml
file. We can edit it by clicking on the file and pressing the "Edit"
button. This will take us to the file editor app, with this file open
In the file Editor, specify hpc
as the cluster attribute on line 11 like so: cluster: "hpc"
. Save this file by clicking
the "Save" button at the top left.
Now when we navigate back to our interactive sessions, you'll see the "Interactive Apps [Sandbox]" menu with an item labeled "Jupyter Notebook".
Follow this link and we'll be presented with this form for specifying different attributes about the job we want to submit to the SLURM scheduler.
We don't need to change anything in this form, so simply press "Launch" at the bottom of the form. After pressing launch the job should have successfully launched the job and redirected us back the interactive sessions page where we'll see a panel showing our job.
This job is going to run and fail during startup. But don't worry! We're going to debug and fix it.
When the job completes, the panel still exists, so you can follow the link in the panel to the log directory of the job.
Follow the link and we'll be redirected to the job's working directory where an output.log
file is.
Let's open that file with the "View" button.
When you open the log file, you'll see the something like this where it says jupyter: command not found.
So you can see, we have PATH
issues.
TIMING - Starting jupyter at: Fri Jul 17 18:06:34 UTC 2020
+ jupyter notebook --config=/home/hpcadmin/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/e16b9a77-1a4f-4c9e-95f3-d3c23e5e8d76/config.py
/home/hpcadmin/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/e16b9a77-1a4f-4c9e-95f3-d3c23e5e8d76/script.sh: line 27: jupyter: command not found
Timed out waiting for Jupyter Notebook server to open port 16970!
So we know what the issue is, the job's script can't find the jupyter
executable in the PATH
.
Jupyter was installed in these containers through Python's virtual environment and that's
why it's not directly in the shell scripts PATH
.
We need to add this line to our job's shell script to enable it.
source /usr/local/jupyter/2.1.4/bin/activate
So let's open the template/script.sh.erb in the file editor and add this to line 27 of the shell script, just before we start jupyter.
Lines 24 - 31 of template/script.sh.erb
should now look like this.
# Benchmark info
echo "TIMING - Starting jupyter at: $(date)"
source /usr/local/jupyter/2.1.4/bin/activate
# Launch the Jupyter Notebook Server
set -x
jupyter notebook --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>
Now we can launch the application again and it should work.
When it is up and running and available to use the panel will show a "Connect to Jupyter" button. Click this button and OnDemand will redirect us to Jupyter.
Congratulations! We've now started development on the Jupyter Notebook batch connect application and successfully connected to it.
You may want to delete this job now by using the "Delete" button on the panels as we'll be iterating through developing the application and starting new jobs.
Now it's probably a good idea to save the modifications. They're small, but it'll still help if you
ever get into trouble and need to revert. A simplified version of the form.yml
is in the very next
section, and you may want to use and save it instead so that any git diff
you do will be much
smaller and easier to read.
You can use the shell app to login to this directory
In this shell you'll save in git with these commands:
git add .
git commit -m 'initial commit that correctly submits to the hpc cluster'
The items in the form.yml directly create what's shown to the users in the form they interact with.
Let's take a closer look at the form.yml
that created the form you just submitted to get an
understanding of how they relate to what's shown in the UI.
This is the form.yml
you at this point without all the comments.
cluster: "hpc"
attributes:
modules: "python"
extra_jupyter_args: ""
form:
- modules
- extra_jupyter_args
- bc_account
- bc_queue
- bc_num_hours
- bc_num_slots
- bc_email_on_started
All fields pre-pended with bc_
are special fields OnDemand provides for convenience. They are commonly
used fields that create corresponding script attribute. We'll talk more about script attributes later.
modules
Specifies the modules loaded. Since it's hard coded to "python" (in the attributes) we didn't see it in the form.extra_jupyter_args
Specifies the extra jupyter arguments but since it's hard coded to "" we didn't didn't see it in the form.bc_account
Creates the "Account" text field and submits the job with the given account.bc_queue
Creates the "Partition" text field and submits the job to the given partition.bc_num_hours
Creates the "Number of hours" integer field and submits the job with the given walltime.bc_num_slots
Creates the "Number of nodes" integer field and submits the job with the requested cores.bc_email_on_started
Creates the "I would like to receive an email when the session starts" checkbox and submits the job with a request to email when the job starts.
We have 2 partitions enabled in the SLURM containers (SLURM calls queues partitions, so we'll be switching
back and forth between the two terms in this tutorial). We've started with a field bc_queue
that
is a text field, but it's likely much easier for users to simply choose the partition out of a
select dropdown menu instead.
So let's remove the bc_queue
field in the form and replace it with a new field we'll call custom_queue
.
Add custom_queue
in the form section and remove bc_queue.
We'll also add custom_queue
to the attributes section. Adding a field to the form section adds it to the form
in the UI. By default, this field will be a text field. If you want this field to be a different type of widget
(as we do) you'll configure the field in the attributes section. Also by default the label in the UI is the
same just the name of the field. In our case custom_queue
would turn into "Custom Queue". This is only
slightly correct, so we want to specify the label as "Partition" because that's what it is in SLURM parlance.
Here you can see that we specify the custom_queue
in the attributes as a select widget with two options.
and a new label. The first element in options arrays is what will be shown to the user (the capitalized version)
where the second element is the value what's actually used in the sbatch command.
# form.yml, with only this addition for brevity
attributes:
custom_queue:
widget: "select"
label: "Partition"
options:
- ["Compute", "compute"]
- ["Debug", "debug"]
form:
- custom_queue
# - bc_queue
Refresh the new session form and you should now see your updates.
But before we submit to test them out, we'll need to reconfigure the submit.yml.erb
to use this
new field. You can edit it in the
file editor app.
You'll need to specify the script's queue_name as the partition like so. The script
is the logical
"script" we're submitting to the scheduler. And the queue_name
is the field of the script that will
specify the queue. (OnDemand knows how to translate it from queue_name into partition for SLURM).
script:
queue_name: "<%= custom_queue %>"
The .erb file extension indicates this is embedded ruby file. <%=
and %>
are embedded ruby tags. A variable
for each attribute defined in the form.yml can be used in this ERB file, containing the string representation
of the value submitted by the user.
If you're not super comfortable with the terminology just remember this: custom_queue
is defined in the form.yml
(the file that defines what the UI form looks like) so it can be used in the submit.yml.erb
(the file
that is used to configure the job that is being submitted) as <%= custom_queue =>
.
When launch the application again you can login to a shell and confirm you chose a different queue with this command.
[hpcadmin@ondemand ~]$ squeue -o "%j %P"
NAME PARTITION
sys/dashboard/dev/jupyter debug
At this point, this should be the entirety of the script.yml.erb
and form.yml
(without comments).
They're given here in full if you want to copy/paste them. And remember to save your spot!
# script.erb.yml
script:
queue_name: "<%= custom_queue %>"
# form.yml
cluster: "hpc"
attributes:
modules: "python"
extra_jupyter_args: ""
custom_queue:
widget: "select"
label: "Partition"
options:
- ["Compute", "compute"]
- ["Debug", "debug"]
form:
- modules
- extra_jupyter_args
- bc_account
- custom_queue
- bc_num_hours
- bc_num_slots
- bc_email_on_started
SLURM is configured with only 2 cores total. If you were now to submit this app
with say 3 or more bc_num_slots
it would sit in the queue forever because SLURM
cannot find a suitable host to run it on.
So, let's limit this field to a max of 2.
# form.yml
attributes:
bc_num_slots:
max: 2
That's it! Again, because bc_num_slots
is convenience field, it already has a minimum of 1
that you can't override, because it doesn't make sense to specify 0 or less cores.
script.native
attributes are way for us to specify any arguments to the schedulers that
we can't pre-define or have a good generic defining (like queue_name
above).
In this section we're going to put make OnDemand request memory through the sbatch's
--mem
argument.
First, let's add it to the form like so.
Here are descriptions of all the fields we applied to it. Note if the label was not not defined the default 'Memory' would have been OK. Also we don't really need the the help message here, it was really just for illustration.
widget
specifies the type of widget to be a numbermax
the maximum value, ~1 GB in this casemin
the minimum value, 200 MBstep
the step size when users increase or decrease the valuevalue
the default value of 500 MBlabel
the for UIs labelhelp
a help message
# form.yml, with only this addition for brevity
attributes:
memory:
widget: "number_field"
max: 1000
min: 200
step: 200
value: 500
label: "Memory (MB)"
help: "RSS Memory"
form:
- memory
Again, now to actually use the value we populate in the form, we need to use
it in the submit.yml.erb
. This is where script.native
attributes come in.
# script.erb.yml
script:
native:
- "--mem"
- "<%= memory %>M"
Native attributes are an array and they're passed to the schedule just as they're defined here.
This would translate into a command much like: sbatch --mem 800M
. As you can see
native allows us to pass anything we wish into the scheduler command.
To confirm your job is running with the correct memory parameters, simply login to a shell and run the command below. You should see output like this.
[hpcadmin@ondemand /]$ squeue -o "%j %m"
NAME MIN_MEMORY
sys/dashboard/dev/jupyter 800M
At this point, this should be the entirety of the script.yml.erb
and form.yml
(without comments).
They're given here in full if you want to copy/paste them. And remember to save your spot!
# script.erb.yml
---
script:
queue_name: "<%= custom_queue %>"
native:
- "--mem"
- "<%= memory %>M"
# form.yml
cluster: "hpc"
attributes:
modules: "python"
extra_jupyter_args: ""
custom_queue:
widget: "select"
label: "Partition"
options:
- ["Compute", "compute"]
- ["Debug", "debug"]
bc_num_slots:
max: 2
memory:
widget: "number_field"
max: 1000
min: 200
step: 200
value: 500
label: "Memory (MB)"
help: "RSS Memory"
form:
- modules
- extra_jupyter_args
- bc_account
- custom_queue
- bc_num_hours
- bc_num_slots
- bc_email_on_started
- memory
Jupyter ships with both Notebooks and JupyterLab. Some users may want to use JuypterLab instead of Notebooks, so let's give them that option.
First, let's add the checkbox to the form.
# form.yml, with only this addition for brevity
attributes:
jupyterlab_switch:
widget: "check_box"
label: "Use JupyterLab instead of Jupyter Notebook?"
help: |
JupyterLab is the next generation of Jupyter, and is completely compatible with existing Jupyter Notebooks.
form:
- jupyterlab_switch
Refresh the new session form and you should now see your updates.
For this change, there's no need to edit the submit.yml.erb
. This toggle happens in the
actual script that's ran during the job, so we have to edit template.sh.erb
. Note that
this is also an ERB script, so it gets templated in Ruby before being submitted to the
scheduler.
Line 31 is as follows:
jupyter notebook --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>
Replace the notebook
parameter with this new toggle.
jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>
If you're unfamiliar with Ruby ternary statements, you can read it them like
this: if true ? do this : else do that
. So this reads, if context.jupyterlab_switch is 1 use lab, else use notebook
.
Also note the use of context
here where we didn't have to use that in the submit.yml.erb
.
This is an important difference. To reference variables from the form in the template/*.sh.erb
files
you must reference them through the context
object.
Now you can submit the job with the checked box to use JupyterLab instead of Notebook and you can see the Jupyter UI is significantly different.
At this point, this should be the entirety of the form.yml
(without comments).
They're given here in full if you want to copy/paste them. And remember to save your spot!
# form.yml
cluster: "hpc"
attributes:
modules: "python"
extra_jupyter_args: ""
custom_queue:
widget: "select"
label: "Partition"
options:
- ["Compute", "compute"]
- ["Debug", "debug"]
bc_num_slots:
max: 2
memory:
widget: "number_field"
max: 1000
min: 200
step: 200
value: 500
label: "Memory (MB)"
help: "RSS Memory"
jupyterlab_switch:
widget: "check_box"
label: "Use JupyterLab instead of Jupyter Notebook?"
help: |
JupyterLab is the next generation of Jupyter, and is completely compatible with existing Jupyter Notebooks.
form:
form:
- modules
- extra_jupyter_args
- bc_account
- custom_queue
- bc_num_hours
- bc_num_slots
- bc_email_on_started
- memory
- jupyterlab_switch
Now we're ready to deploy to production, let's clean up the form a little bit.
We want to remove some items because they're in the example for a real site, but for containers, they just don't apply.
Let's remove these items from the form. Note you'll also have to remove modules
and
extra_jupyter_args
from the attributes section too.
modules
becuase modules don't exist on these compute nodesextra_jupyter_args
because we're not passing anybc_account
because only 1 account is applied to each user, so there's no need to change it.bc_email_on_started
because containers can't email these fake users
Since we got rid of extra_jupyter_args
and modules
, we'll also have them remove it from the
template/script.sh.erb
as well.
Remove lines 13-22 to get rid of modules. And extra_jupyter_args is on line 29 of template/script.sh.erb
.
# remove this block from the 'unless' on line 13 to the 'end' at line 22.
<%- unless context.modules.blank? -%>
# Purge the module environment to avoid conflicts
module purge
# Load the require modules
module load <%= context.modules %>
# List loaded modules
module list
<%- end -%>
# ...
# and remove the last parameter given to jupyter on line 31
jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> --config="${CONFIG_FILE}" <%= context.extra_jupyter_args %>
Now it should look like this:
jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> --config="${CONFIG_FILE}"
At this point, this should be the entirety of the template/script.sh.erb
and form.yml
(without comments).
They're given here in full if you want to copy/paste them. And remember to save your spot!
#!/usr/bin/env bash
# Benchmark info
echo "TIMING - Starting main script at: $(date)"
# Set working directory to home directory
cd "${HOME}"
#
# Start Jupyter Notebook Server
#
# Benchmark info
echo "TIMING - Starting jupyter at: $(date)"
source /usr/local/jupyter/2.1.4/bin/activate
# Launch the Jupyter Notebook Server
set -x
jupyter <%= context.jupyterlab_switch == "1" ? "lab" : "notebook" %> --config="${CONFIG_FILE}"
# form.yml
cluster: "hpc"
attributes:
custom_queue:
widget: "select"
label: "Partition"
options:
- ["Compute", "compute"]
- ["Debug", "debug"]
bc_num_slots:
max: 2
memory:
widget: "number_field"
max: 1000
min: 200
step: 200
value: 500
label: "Memory (MB)"
help: "RSS Memory"
jupyterlab_switch:
widget: "check_box"
label: "Use JupyterLab instead of Jupyter Notebook?"
help: |
JupyterLab is the next generation of Jupyter, and is completely compatible with existing Jupyter Notebooks.
form:
- custom_queue
- bc_num_hours
- bc_num_slots
- memory
- jupyterlab_switch
The OnDemand UI pulls things from the manifest.yml
like the title of the application and where to
put it in the column of interactive applications.
Let's change the these fields. You can change any field except for role
. And you can change
them to something different than what's given here (have fun with it!). All fields besides role
are purely descriptive or relate to UI groups so we can freely change them without any behavior change.
Conversely, role
needs to be batch_connect
so don't change this!
---
# change the name, this is what shows up in the menu
name: HPC Tutorial Jupyter
# change the category just to differentiate from the system installed
# deskop application
category: Tutorial Apps
# change the subcategory
subcategory: Machine Learning
role: batch_connect
# change the description, this shows up when you hover over the menu item
description: |
This app will launch a Jupyter Lab or Notebootk on one or more nodes.
If you want to change category
and subcategory
you can freely do so.
These attributes create groupings for applications. Since we will only have two
applications (the system installed "Interactive Apps/Desktops" and this app)
Now save your spot because the next thing we're going to do is deploy this development application to production.
Deploying to production is as easy as copying the files from your dev directory to the system's app directory.
If you don't already have a shell session get a shell session now and execute these commands.
ssh ondemand
cd ~/ondemand/dev
sudo cp -R jupyter/ /var/www/ood/apps/sys/
And that's it! All you have to do now is refresh the page and you should see your Jupyter system app in the menu along with your sandbox development app.
Next - Acknowledgments Previous Step - Open XDMoD Back to Start