Merge branch 'main' into dev
chrisdjscott committed Apr 28, 2024
2 parents f7049ce + f384f3d commit a721306
Showing 5 changed files with 78 additions and 3 deletions.
6 changes: 3 additions & 3 deletions deployment-checklist.md
```diff
@@ -5,9 +5,9 @@ In *vars/ondemand-config.yml.example*:
 - adjust `num_users_create` and `num_trainers_create`
 - adjust `ood_apps`
 - check `version` and `k8s_container`
-- enable required apps
-- set which images to pre-pull (don't choose too many, we have limited space currently on the worker nodes and pre-pulling will fail if you exhaust it)
-- set `enable_pod_prepull` if desired (should default to on probably, sometimes we have experience really slow pulls, this helps with that)
+- enable required apps (usually just leave them all enabled, except for containers)
+- set which images to pre-pull (just choose the one you will be using, we have limited space currently on the worker nodes and pre-pulling will fail if you exhaust it)
+- set `enable_pod_prepull` to "true" (sometimes we have experienced really slow image pulls, this helps with that)
 - set `control_plane_flavor`, usually to `balanced1.4cpu8ram` for production
 - set `cluster_worker_count` and `worker_flavor` to have enough capacity for the number of users
```
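Taken together, the checklist above corresponds to a *vars/ondemand-config.yml.example* sketch along these lines (the variable names come from the checklist, but every value here is an illustrative assumption):

```yaml
num_users_create: 30              # assumed class size
num_trainers_create: 2            # assumed number of trainers
enable_pod_prepull: "true"        # helps with occasional very slow image pulls
control_plane_flavor: "balanced1.4cpu8ram"    # usual production value
worker_flavor: "balanced1.4cpu8ram"           # hypothetical flavor
cluster_worker_count: 4           # hypothetical; size for the expected number of users
```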

12 changes: 12 additions & 0 deletions docs/known-issues.md
@@ -0,0 +1,12 @@
# Known issues and limitations

## Timeouts on login page

If you leave the login page open for a while without logging in, you may encounter a timeout error when you do eventually log in.
Opening a new tab and navigating to the webnode URL should fix this. Until we have fixed this timeout issue, it is a good idea to log in immediately after opening the training environment URL.

## Sharing the link to the training environment

Make sure you only share the webnode URL, which will look like "https://*name*-ood-**webnode**.data.nesi.org.nz".

**Do not** share the services URL (the URL you see on the login screen), which looks like "https://*name*-ood-**services**.data.nesi.org.nz". If you share that link, it will almost certainly not work for others.
4 changes: 4 additions & 0 deletions docs/notes-for-trainers.md
@@ -55,3 +55,7 @@ nesi-get-pods | grep user-
```

will show both training and trainer pods, including those in all states (not just running).

## Known issues and limitations

See [known issues](known-issues.md).
@@ -143,6 +143,64 @@ description: |
## submit.yml.erb
Most of the configuration for the app happens in this file. It will look quite different depending on the type of app, e.g. JupyterLab vs RStudio. Note the *.erb* extension, which means this file will be run through the ERB template engine.

The top section of the file (above the "---") should not need to be changed. In this section we set some Ruby variables that are used in the templates later on, e.g. a reference to the current user, the IP address of the services node, etc.
```yaml
<%
pwd_cfg = "c.ServerApp.password=u\'sha1:${SALT}:${PASSWORD_SHA1}\'"
host_port_cfg = "c.ServerApp.base_url=\'/node/${HOST_CFG}/${PORT_CFG}/\'"
configmap_filename = "ondemand_config.py"
configmap_data = "c.NotebookApp.port = 8080"
utility_img = "ghcr.io/nesi/training-environment-k8s-utils:v0.1.0"
user = OodSupport::User.new
services_node = Resolv.getaddress("servicesnode")
%>
---
```

The config happens in the `script:` entry; usually the `accounting_id` and `wall_time` should not need to change:

```yaml
script:
  accounting_id: "<%= account %>"
  wall_time: "<%= wall_time.to_i * 3600 %>"
```
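As a concrete example of how this renders: assuming a user selects an 8-hour session and `account` resolves to a hypothetical "training" project, the generated YAML would be:

```yaml
script:
  accounting_id: "training"   # hypothetical value rendered from <%= account %>
  wall_time: "28800"          # 8 hours * 3600 seconds
```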
Inside the `native:` entry is where we configure the pod that will run the app on the Kubernetes cluster. For example, we define the container with:

```yaml
native:
  container:
    name: "intermshell"
    image: "ghcr.io/nesi/training-environment-jupyter-intermediate-shell-app:v0.3.3"
    command: ["/bin/bash","-l","<%= staged_root %>/job_script_content.sh"]
    working_dir: "<%= Etc.getpwnam(ENV['USER']).dir %>"
    restart_policy: 'OnFailure'
```

- Using a unique `name` is a good idea, so you can tell the running apps apart
- `image` should point to the Docker image that will be used
- `command` should not be changed
- `working_dir` is usually left as the home directory but could be changed
- `restart_policy` is usually left as is
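To illustrate the unique-`name` advice, a second, hypothetical app (say, the RStudio based RNA-Seq example) might define its container like this; the `name`, `image` and tag below are made up, while `command`, `working_dir` and `restart_policy` are kept as above:

```yaml
native:
  container:
    name: "rnaseq-rstudio"    # hypothetical unique name, distinct from "intermshell"
    image: "ghcr.io/nesi/training-environment-rstudio-rna-seq-app:v0.1.0"   # hypothetical image and tag
    command: ["/bin/bash","-l","<%= staged_root %>/job_script_content.sh"]
    working_dir: "<%= Etc.getpwnam(ENV['USER']).dir %>"
    restart_policy: 'OnFailure'
```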

!!! info "Note about developing apps"

    When developing an app it can be useful to set the image tag to point to a branch name and to set the `image_pull_policy` to `Always`, for example:

    ```yaml
    image: "ghcr.io/nesi/training-environment-jupyter-intermediate-shell-app:dev"
    image_pull_policy: "Always"
    ```

    This way, whenever you push a change to the *dev* branch in your app repo, the Docker image with the *dev* tag will be rebuilt, and you can simply restart your app in the training environment to pick up the changes. Do not do this in production though, as pulling images can be slow.

UNFINISHED

## view.html.erb

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -21,6 +21,7 @@ nav:
- "Example: RStudio based app for RNA-Seq": 'apps/example-rstudio-rna-seq.md'
- Deployment: 'deployment.md'
- Notes for trainers: 'notes-for-trainers.md'
- Known issues: 'known-issues.md'
- Troubleshooting: 'troubleshooting.md'

theme:
