Call Program before Jupyter Hub Launch #29

Open
AndiH opened this issue Nov 4, 2019 · 10 comments

@AndiH

AndiH commented Nov 4, 2019

Disclaimer: I'm an absolute OpenShift newbie, but I want to use the JupyterHub Quickstart for an HPC tutorial soon.

Is it possible to execute a program (or Bash script) before JupyterHub is launched? I need to set up environment variables and move things around before the Notebook is started.

jupyterhub_config.sh seems to be intended for Shell commands, but I don't know how to use the file. There's also a corresponding entry in the configmap, but I don't know how to use it (and am not sure if this is really intended to be used for this kind of thing).

@GrahamDumpleton
Contributor

Sorry for the delay on this. It slipped out of the top of my inbox very quickly, as it has been a busy week.

Can you clarify whether you want these steps to run inside the pod for JupyterHub, or in the pod for each user's Jupyter notebook instance?

@AndiH
Author

AndiH commented Nov 13, 2019

My use case is to target users to certain directories (via environment variables) and run some setup procedures.

I think the pod for each Jupyter Notebook instance would be the correct place! (If there were a similar hook for the JupyterHub pod, it would be a good addition as well, I think.)

@GrahamDumpleton
Contributor

Are the directories for storage?

One way is to mount a sub directory for each user from a shared persistent volume, rather than mounting the whole persistent volume and then placing them in a specific directory. If you were to do the latter, they could see and modify other people's files.

For an example of this scheme, if you only want to use a single persistent volume for all users, as opposed to a persistent volume per user, see:

In particular the JupyterHub config at:

@AndiH
Author

AndiH commented Nov 13, 2019

Thanks for the hints!

It's a good idea to mount user-specific sub-directories! I'll have a look into it!

In general our setup is even a bit more complicated: We are running HPC jobs through a batch submission system launched from Notebooks. The shared filesystem is mounted into the pod – and only this file system can be accessed from the submitted job. So, users would be able to escape their sub-directory (via the backend), if they really wanted to – but still, I consider mounting sub-directories a good idea.

@GrahamDumpleton
Contributor

GrahamDumpleton commented Nov 13, 2019

The Jupyter notebook images in this GitHub org also support an environment variable, JUPYTER_WORKSPACE_NAME, which, if set, will cause the file browser to start in a sub directory. It only works for the classic notebook interface though, not the JupyterLab interface.

The changes I made a few hours back related to jupyter-on-openshift/jupyter-notebooks#16 would allow you to supply a shell script which is run during the start-up sequence. Theoretically it could read an environment variable and change the working directory before starting the notebook. That shell script needs to be stored at .jupyter/jupyter_notebook_config.sh in any custom notebook image. You need to be using version 2.4.1 or later of the notebook images as the base.
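
A minimal sketch of what such a script might look like, assuming the start-up sequence sources it (so that `cd` and `export` carry over into the notebook process; if it is executed as a separate process instead, they would not). `TUTORIAL_WORKDIR` and `TUTORIAL_DATA_DIR` are hypothetical names chosen for illustration; the notebook images do not define them.

```shell
# Hypothetical contents of .jupyter/jupyter_notebook_config.sh.
# TUTORIAL_WORKDIR and TUTORIAL_DATA_DIR are illustrative names only.

setup_workspace() {
    # Do nothing unless the (hypothetical) variable is set and non-empty.
    if [ -z "${TUTORIAL_WORKDIR:-}" ]; then
        return 0
    fi

    # Create the directory on first start, then switch to it.
    mkdir -p "$TUTORIAL_WORKDIR" || return 1
    cd "$TUTORIAL_WORKDIR" || return 1

    # Export anything the notebook process should be able to see.
    export TUTORIAL_DATA_DIR="$TUTORIAL_WORKDIR/data"
    mkdir -p "$TUTORIAL_DATA_DIR"
}

setup_workspace
```

With the variable unset the script is a no-op, so the same image can still be started without any extra configuration.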

@AndiH
Author

AndiH commented Nov 13, 2019

Thank you!

I'll need some time to digest this and try it out!

@GrahamDumpleton
Contributor

Keep me in the loop on what you are trying to do. There are all sorts of ways you can adapt JupyterHub, and I am working on some new built-in configuration options. One will provide playpens for users, where they have authentication and cluster access from the notebook to deploy extra stuff. Another will be a test lab environment where, when a user requests a selected notebook, additional workloads can be deployed on demand into a linked project for whatever the notebook requires. So you could, for example, deploy a Dask or Spark cluster automatically the first time a session starts.

@AndiH
Author

AndiH commented Nov 13, 2019

Both things sound really good. But especially the sample project from Singapore NTU looks very interesting.

We set up our Notebooks as follows: log in to the HPC system via SSH (with a forwarded port), load the environment you need, start JupyterLab, and connect to the forwarded port. That is what we wanted to re-create with OpenShift, so that users don't come in contact with SSH and port forwarding (error-prone…).
Unfortunately our tutorial is next Monday, so I fear we won't have everything in place until then.

@GrahamDumpleton
Contributor

If you want to hop on a video chat session to discuss options and try to speed things up, let me know. I've been doing various Jupyter stuff this last week, so I am in the right frame of mind to help out if I can.

@AndiH
Author

AndiH commented Nov 13, 2019

I'd love to! Can I contact you somewhere privately? I've just added you on Twitter.
