Merge branch 'main' into dev
chrisdjscott committed Apr 28, 2024
2 parents f7049ce + f384f3d commit a721306
Showing 5 changed files with 78 additions and 3 deletions.
6 changes: 3 additions & 3 deletions deployment-checklist.md
```diff
@@ -5,9 +5,9 @@ In *vars/ondemand-config.yml.example*:
 - adjust `num_users_create` and `num_trainers_create`
 - adjust `ood_apps`
 - check `version` and `k8s_container`
-- enable required apps
-- set which images to pre-pull (don't choose too many, we have limited space currently on the worker nodes and pre-pulling will fail if you exhaust it)
-- set `enable_pod_prepull` if desired (should default to on probably, sometimes we have experience really slow pulls, this helps with that)
+- enable required apps (usually just leave them all enabled, except for containers)
+- set which images to pre-pull (just choose the one you will be using, we have limited space currently on the worker nodes and pre-pulling will fail if you exhaust it)
+- set `enable_pod_prepull` to "true" (sometimes we have experienced really slow image pulls, this helps with that)
 - set `control_plane_flavor`, usually to `balanced1.4cpu8ram` for production
 - set `cluster_worker_count` and `worker_flavor` to have enough capacity for the number of users
```
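Taken together, the checklist above corresponds to a *vars/ondemand-config.yml.example* sketch along these lines (the variable names come from the checklist, but every value here is an illustrative assumption):

```yaml
num_users_create: 30              # assumed class size
num_trainers_create: 2            # assumed number of trainers
enable_pod_prepull: "true"        # helps with occasional very slow image pulls
control_plane_flavor: "balanced1.4cpu8ram"    # usual production value
worker_flavor: "balanced1.4cpu8ram"           # hypothetical flavor
cluster_worker_count: 4           # hypothetical; size for the expected number of users
```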

12 changes: 12 additions & 0 deletions docs/known-issues.md
@@ -0,0 +1,12 @@
# Known issues and limitations

## Timeouts on login page

If you leave the login page open for a while without logging in, you may encounter a timeout error when you do eventually log in.
Opening a new tab and navigating to the webnode URL should fix this. Until we have fixed this timeout issue, it is a good idea to log in immediately after opening the training environment URL.

## Sharing the link to the training environment

Make sure you only share the webnode URL, which will look like "https://*name*-ood-**webnode**.data.nesi.org.nz".

**Do not** share the services URL (the URL you see on the login screen), which looks like "https://*name*-ood-**services**.data.nesi.org.nz". If you share that link, it will almost certainly not work for others.
4 changes: 4 additions & 0 deletions docs/notes-for-trainers.md
@@ -55,3 +55,7 @@ nesi-get-pods | grep user-
```

will show both training and trainer pods, including those in all states (not just running).

## Known issues and limitations

See [known issues](known-issues.md).
@@ -143,6 +143,64 @@ description: |
## submit.yml.erb
Most of the configuration for the app happens in this file. It will look quite different depending on the type of app, e.g. JupyterLab vs RStudio. Note the *.erb* extension, which means this file will be run through the ERB template engine.

The top section of the file (above the "---") should not need to be changed. In this section we set some Ruby variables that are used in the templates later on, e.g. a reference to the current user, the IP address of the services node, etc.
```yaml
<%
pwd_cfg = "c.ServerApp.password=u\'sha1:${SALT}:${PASSWORD_SHA1}\'"
host_port_cfg = "c.ServerApp.base_url=\'/node/${HOST_CFG}/${PORT_CFG}/\'"
configmap_filename = "ondemand_config.py"
configmap_data = "c.NotebookApp.port = 8080"
utility_img = "ghcr.io/nesi/training-environment-k8s-utils:v0.1.0"
user = OodSupport::User.new
services_node = Resolv.getaddress("servicesnode")
%>
---
```

The config happens in the `script:` entry; usually the `accounting_id` and `wall_time` should not need to change:

```yaml
script:
  accounting_id: "<%= account %>"
  wall_time: "<%= wall_time.to_i * 3600 %>"
```
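As a concrete example of how this renders: assuming a user selects an 8-hour session and `account` resolves to a hypothetical "training" project, the generated YAML would be:

```yaml
script:
  accounting_id: "training"   # hypothetical value rendered from <%= account %>
  wall_time: "28800"          # 8 hours * 3600 seconds
```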
Inside the `native:` entry is where we configure the pod that will run the app on the Kubernetes cluster. For example, we define the container with:

```yaml
native:
  container:
    name: "intermshell"
    image: "ghcr.io/nesi/training-environment-jupyter-intermediate-shell-app:v0.3.3"
    command: ["/bin/bash","-l","<%= staged_root %>/job_script_content.sh"]
    working_dir: "<%= Etc.getpwnam(ENV['USER']).dir %>"
    restart_policy: 'OnFailure'
```

- Using a unique `name` is a good idea, so you can tell the running apps apart
- `image` should point to the Docker image that will be used
- `command` should not be changed
- `working_dir` is usually left as the home directory but could be changed
- `restart_policy` is usually left as is
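To illustrate the unique-`name` advice, a second, hypothetical app (say, the RStudio based RNA-Seq example) might define its container like this; the `name`, `image` and tag below are made up, while `command`, `working_dir` and `restart_policy` are kept as above:

```yaml
native:
  container:
    name: "rnaseq-rstudio"    # hypothetical unique name, distinct from "intermshell"
    image: "ghcr.io/nesi/training-environment-rstudio-rna-seq-app:v0.1.0"   # hypothetical image and tag
    command: ["/bin/bash","-l","<%= staged_root %>/job_script_content.sh"]
    working_dir: "<%= Etc.getpwnam(ENV['USER']).dir %>"
    restart_policy: 'OnFailure'
```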

!!! info "Note about developing apps"

    When developing an app it can be useful to set the image tag to point to a branch name and to set the `image_pull_policy` to `Always`, for example:

    ```yaml
    image: "ghcr.io/nesi/training-environment-jupyter-intermediate-shell-app:dev"
    image_pull_policy: "Always"
    ```

    This way, whenever you push a change to the *dev* branch in your app repo, the Docker image with the *dev* tag will be rebuilt, and you can simply restart your app in the training environment to pick up the changes. Do not do this in production though, as pulling images can be slow.

UNFINISHED

## view.html.erb

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -21,6 +21,7 @@ nav:
- "Example: RStudio based app for RNA-Seq": 'apps/example-rstudio-rna-seq.md'
- Deployment: 'deployment.md'
- Notes for trainers: 'notes-for-trainers.md'
- Known issues: 'known-issues.md'
- Troubleshooting: 'troubleshooting.md'

theme:
