In Replica mode, the files are transferred by Rucio to a storage system mounted on the JupyterLab server. To use the extension in this mode, you need the following set up:
- A JupyterLab version 2 installation.
- At least one Rucio instance.
- A storage system that is attached to the JupyterLab installation via FUSE.
- The storage system should be compatible with Rucio and added as a Rucio Storage Element.
- The storage element will be shared among multiple users, so be sure to allow all users who will be using the extension to have read permission to the path.
- It's recommended that quotas be disabled, since the extension does not handle replication failures caused by quota errors.
In Download mode, the extension downloads the files to the user directory. Use this mode when your JupyterLab installation does not have a storage system mounted as an RSE. To use the extension in this mode, you need the following set up:
- A JupyterLab version 2 installation.
- At least one Rucio instance.
- Rucio Storage Elements (RSE) that are accessible from the notebook server, with no authentication scheme.
The extension can be configured locally or remotely.
In your Jupyter configuration (could be ~/.jupyter/jupyter_notebook_config.json), add the following snippet:
{
    "RucioConfig": {
        "instances": [
            {
                "name": "experiment.cern.ch",
                "display_name": "Experiment",
                "rucio_base_url": "https://rucio",
                "rucio_auth_url": "https://rucio",
                "rucio_ca_cert": "/path/to/rucio_ca.pem",
                "destination_rse": "SWAN-EOS",
                "rse_mount_path": "/eos/rucio",
                "path_begins_at": 4,
                "mode": "replica"
            }
        ]
    }
}
To use remote configuration, use the following snippet:
{
    "RucioConfig": {
        "instances": [
            {
                "name": "experiment.cern.ch",
                "display_name": "Experiment",
                "mode": "replica",
                "$url": "https://url-to-rucio-configuration/config.json"
            }
        ]
    }
}
In the JSON file pointed to by the value of $url, use the following snippet:
{
    "rucio_base_url": "https://rucio",
    "destination_rse": "SWAN-EOS",
    "rucio_auth_url": "https://rucio",
    "rucio_ca_cert": "/path/to/rucio_ca.pem",
    "rse_mount_path": "/eos/rucio",
    "path_begins_at": 4,
    "mode": "replica"
}
Attributes name, display_name, and mode must be defined locally, while the rest can be defined remotely. If an attribute is defined in both local and remote configuration, the local one is used.
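The precedence rule above can be illustrated with a minimal sketch. It assumes the remote JSON has already been fetched and parsed into a dictionary. The function name `merge_instance_config` is hypothetical, and this is not the extension's actual code.

```python
# Attributes that the text above says must be present in the local configuration.
REQUIRED_LOCAL_KEYS = ("name", "display_name", "mode")


def merge_instance_config(local, remote):
    """Combine a local instance entry with the remote configuration it points to."""
    missing = [key for key in REQUIRED_LOCAL_KEYS if key not in local]
    if missing:
        raise ValueError("must be defined locally: " + ", ".join(missing))
    # Remote values act as defaults; local values override them.
    merged = {**remote, **local}
    # Assumption: the "$url" pointer is not needed once the remote part is merged.
    merged.pop("$url", None)
    return merged


local = {
    "name": "experiment.cern.ch",
    "display_name": "Experiment",
    "mode": "replica",
    "rucio_base_url": "https://rucio-local",
    "$url": "https://url-to-rucio-configuration/config.json",
}
remote = {"rucio_base_url": "https://rucio", "destination_rse": "SWAN-EOS"}
config = merge_instance_config(local, remote)
```

Here `rucio_base_url` is defined in both places, so the local value is kept, while `destination_rse` is taken from the remote file.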
A unique, machine-readable name that identifies the Rucio instance. Using an FQDN is recommended. Must be declared locally.
Example: atlas.cern.ch, cms.cern.ch
A name that will be displayed to users in the interface. Must be declared locally.
Example: ATLAS, CMS
The mode in which the extension operates. Must be declared locally.
- Replica mode (replica)
- Download mode (download)
Base URL for the Rucio instance accessible from the JupyterLab server, without trailing slash.
Example: https://rucio
Base URL for the Rucio instance handling authentication (if separate) accessible from the JupyterLab server, without trailing slash.
Example: https://rucio-auth
Path to Rucio server certificate file, accessible via filesystem mount. Optional in Replica mode, mandatory in Download mode.
Example: /opt/rucio/rucio_ca.pem
Rucio App ID. Optional.
Example: swan
Site name of the JupyterLab instance, optional. It allows Rucio to know whether to serve a proxied PFN or not.
Example: ATLAS
VO of the instance. Optional, for use in multi-VO installations only. If VOMS is enabled, this value will be supplied as the --voms option when invoking voms-proxy-init.
Example: def
The extension uses voms-proxy-init to generate a proxy certificate when downloading a file from an authenticated RSE.
If this is set to true, the vo option is specified, and an X.509 user certificate is used, the extension will invoke voms-proxy-init with the --voms argument set to the extension's vo option.
Optional, with default: false
If VOMS is enabled and this configuration is set, the extension will set the --certdir option with this value. Refer to the voms-proxy-init documentation.
Example: /etc/grid-security/certificates
If VOMS is enabled and this configuration is set, the extension will set the --vomsdir option with this value. Refer to the voms-proxy-init documentation.
Example: /etc/grid-security/vomsdir
WARNING: In earlier versions, voms-proxy-init does not support the --vomsdir option. In that case, this option must be omitted.
If VOMS is enabled and this configuration is set, the extension will set the --vomses option with this value. Refer to the voms-proxy-init documentation.
Example: /etc/vomses
WARNING: In earlier versions, voms-proxy-init does not support the --vomsdir option. In that case, this option must be omitted.
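To make the interaction between the VOMS-related settings easier to follow, here is a minimal sketch of how the optional flags described above could be assembled into a command line. The function and parameter names are hypothetical, the flag spellings are copied from the descriptions above, and the certificate and key arguments that a real invocation would also need are left out. It assumes the caller has already established that an X.509 user certificate is in use.

```python
def build_voms_proxy_init_args(voms_enabled=False, vo=None,
                               certdir=None, vomsdir=None, vomses=None):
    """Return an argument list for voms-proxy-init (illustrative only)."""
    args = ["voms-proxy-init"]
    if not voms_enabled:
        # Without VOMS, none of the VOMS-specific options are passed.
        return args
    if vo:
        args += ["--voms", vo]
    # Each path option is added only when it is configured, so it can be
    # omitted for voms-proxy-init versions that do not support it.
    if certdir:
        args += ["--certdir", certdir]
    if vomsdir:
        args += ["--vomsdir", vomsdir]
    if vomses:
        args += ["--vomses", vomses]
    return args


example_args = build_voms_proxy_init_args(
    voms_enabled=True,
    vo="def",
    certdir="/etc/grid-security/certificates",
    vomses="/etc/vomses",
)
```

The list form can be passed to `subprocess.run` without shell quoting concerns; the sketch only builds the arguments and does not run anything.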
The name of the Rucio Storage Element that is mounted to the JupyterLab server. Mandatory, only applicable in Replica mode.
Example: SWAN-EOS
The base path in which the RSE is mounted to the server. Mandatory, only applicable in Replica mode.
Example: /eos/rucio
This configuration indicates which part of the PFN should be appended to the mount path. Only applicable in Replica mode. Defaults to 0.
Example: let us say that the PFN of a file is root://xrd1:1094//rucio/test/49/ad/f1.txt and the mount path is /eos/rucio. A starting index of 1 means that the path starting from the 2nd slash (index 1) in the PFN will be appended to the mount path. The resulting path would be /eos/rucio/test/49/ad/f1.txt.
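The example above can be reproduced with a short sketch. It shows one interpretation only: it assumes that empty path segments (such as the one produced by the double slash after the port) are ignored and that the starting index counts how many leading path segments are skipped. The function name `pfn_to_mounted_path` is hypothetical, and this is not the extension's actual implementation.

```python
import posixpath
from urllib.parse import urlparse


def pfn_to_mounted_path(pfn, rse_mount_path, path_begins_at=0):
    """Map a PFN to a path under the mount point (illustrative only)."""
    # Keep only the path part of the PFN and drop empty segments.
    segments = [part for part in urlparse(pfn).path.split("/") if part]
    # Skip the leading segments that are already covered by the mount path.
    return posixpath.join(rse_mount_path, *segments[path_begins_at:])


mounted = pfn_to_mounted_path(
    "root://xrd1:1094//rucio/test/49/ad/f1.txt", "/eos/rucio", path_begins_at=1
)
```

With a starting index of 1, the leading `rucio` segment is skipped and `mounted` is `/eos/rucio/test/49/ad/f1.txt`, matching the example.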
Replication rule lifetime in days. Optional, only applicable in Replica mode.
Example: 365
Whether or not wildcard DID search is allowed. Optional, defaults to false.
Specifies where the extension should get the OIDC token from. Optional; the value should be file or env.
Specifies an absolute path to a file containing the OIDC access token.
Specifies the environment variable name containing the OIDC access token.
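As an illustration of how these three OIDC settings relate, the following minimal sketch looks up a token according to the configured source. It assumes the parameter names oidc_auth, oidc_file_name, and oidc_env_name used elsewhere in this document. The function name `read_oidc_token` is hypothetical, and this is not the extension's actual code.

```python
import os


def read_oidc_token(instance_config, environ=os.environ):
    """Return the OIDC access token, or None if no source is configured."""
    source = instance_config.get("oidc_auth")
    if source == "file":
        # The file is expected to contain only the access token.
        with open(instance_config["oidc_file_name"]) as token_file:
            return token_file.read().strip()
    if source == "env":
        # Returns None when the variable is not set.
        return environ.get(instance_config["oidc_env_name"])
    return None


token = read_oidc_token(
    {"oidc_auth": "env", "oidc_env_name": "RUCIO_ACCESS_TOKEN"},
    environ={"RUCIO_ACCESS_TOKEN": "example-token"},
)
```

Because the token is read each time the function is called, an external refresh mechanism only needs to update the file or the environment.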
To allow users to access the paths from within the notebook, a kernel extension must be enabled. The kernel extension resides in the module rucio_jupyterlab.kernels.ipython.
To enable the kernel extension from inside a notebook, use the load_ext IPython magic:
%load_ext rucio_jupyterlab.kernels.ipython
Or, if you want to enable it by default, put the following snippet in your IPython configuration (could be ~/.ipython/profile_default/ipython_kernel_config.py).
c.IPKernelApp.extensions = ['rucio_jupyterlab.kernels.ipython']
Unlike the other authentication methods supported by the extension, which are configurable by users only, OIDC auth should be configured by the admins. Users won't see the "OpenID Connect" option if OIDC auth is not configured properly.
This extension does not provide a way for users to authenticate directly from the extension. Instead, the OIDC token must be obtained from an external mechanism.
In a multi-user setup with JupyterHub, admins must make the OIDC token accessible from the single-user container via either a file or an environment variable. Then, they need to configure the oidc_auth and oidc_env_name or oidc_file_name parameters (see above).
Furthermore, the JupyterHub installation must have a mechanism for periodically refreshing the OIDC token so that an expired token is not used.
See the following sections for more details.
The Docker image for the single-user container must include the Rucio extension and the defined OpenID Connect variables. To install the extension, refer to the provided Dockerfile. Additionally, the configure.py script takes care of writing the variables from the environment to the Jupyter configuration.
JupyterHub can be installed with the Helm chart provided by Zero to JupyterHub with Kubernetes. To enable the Rucio extension, add the following customisations to the chart values.
- Add the custom image in singleuser.image:
singleuser:
  image: <image-url>:<image-tag>
- Add a custom authentication script to hub.extraConfig. For instance, label it as token-exchange and append the script in this format:
hub:
  extraConfig:
    token-exchange: |
      import pprint
      import os
      import warnings
      import requests
      from oauthenticator.generic import GenericOAuthenticator

      # custom authenticator to enable auth_state and get access token to set as env var for rucio extension
      class RucioAuthenticator(GenericOAuthenticator):
          def __init__(self, **kwargs):
              super().__init__(**kwargs)
              self.enable_auth_state = True

          def exchange_token(self, token):
              params = {
                  'client_id': self.client_id,
                  'client_secret': self.client_secret,
                  'grant_type': 'urn:ietf:params:oauth:grant-type:token-exchange',
                  'subject_token': token,
                  'scope': 'openid profile',
                  'audience': 'rucio'
              }
              response = requests.post(self.token_url, data=params)
              rucio_token = response.json()['access_token']
              return rucio_token

          async def pre_spawn_start(self, user, spawner):
              auth_state = await user.get_auth_state()
              pprint.pprint(auth_state)
              if not auth_state:
                  # user has no auth state
                  return
              # define token environment variable from auth_state
              spawner.environment['RUCIO_ACCESS_TOKEN'] = self.exchange_token(auth_state['access_token'])
              spawner.environment['EOS_ACCESS_TOKEN'] = auth_state['access_token']

      # set the above authenticator as the default
      c.JupyterHub.authenticator_class = RucioAuthenticator
      # enable authentication state
      c.GenericOAuthenticator.enable_auth_state = True

      if 'JUPYTERHUB_CRYPT_KEY' not in os.environ:
          warnings.warn(
              "Need JUPYTERHUB_CRYPT_KEY env for persistent auth_state.\n"
              "    export JUPYTERHUB_CRYPT_KEY=$(openssl rand -hex 32)"
          )
          c.CryptKeeper.keys = [os.urandom(32)]
- Add the configuration parameters for the custom authenticator to hub.config:
hub:
  config:
    RucioAuthenticator:
      client_id: <your-client-id>
      client_secret: <your-client-secret>
      authorize_url: <your-auth-url>
      token_url: <your-token-url>
      userdata_url: <your-userinfo-url>
      username_key: preferred_username
      scope:
        - openid
        - profile
        - email
- Add the required extension parameters to singleuser.extraEnv:
singleuser:
  extraEnv:
    RUCIO_MODE: "replica"
    RUCIO_WILDCARD_ENABLED: "1"
    RUCIO_BASE_URL: "<your-rucio-url>"
    RUCIO_AUTH_URL: "<your-rucio-auth-url>"
    RUCIO_WEBUI_URL: "<your-rucio-ui-url>"
    RUCIO_DISPLAY_NAME: "<your-rucio-instance-display-name>"
    RUCIO_NAME: "<your-rucio-instance-name>"
    RUCIO_SITE_NAME: "<your-rucio-instance-site-name>"
    RUCIO_OIDC_AUTH: "env"
    RUCIO_OIDC_ENV_NAME: "RUCIO_ACCESS_TOKEN"
    RUCIO_DEFAULT_AUTH_TYPE: "oidc"
    RUCIO_OAUTH_ID: "<your-rucio-oauth-id>" # audience
    RUCIO_DEFAULT_INSTANCE: "<your-rucio-instance-name>"
    RUCIO_DESTINATION_RSE: "EOS RSE"
    RUCIO_RSE_MOUNT_PATH: "/eos/eos-rse"
    RUCIO_PATH_BEGINS_AT: "4"
    RUCIO_CA_CERT: "<your-rucio-ca-file-path>"
    OAUTH2_TOKEN: "FILE:/tmp/eos_oauth.token"
- Build the Docker image and install the Helm Chart with the specified values.
Note: This configuration works in replica mode and maps an EOS RSE as the target RSE, which is FUSE-mounted on the nodes where JupyterHub is running.
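To show how environment variables like the ones above relate to the JSON configuration described earlier, here is a minimal sketch that builds a RucioConfig dictionary from an environment mapping. It is not the project's configure.py. The mapping table and the function name `instance_config_from_env` are assumptions made for illustration, and only variables whose configuration key appears in this document are covered.

```python
import json
import os

# Assumed mapping from environment variable names to configuration keys.
ENV_TO_CONFIG_KEY = {
    "RUCIO_NAME": "name",
    "RUCIO_DISPLAY_NAME": "display_name",
    "RUCIO_MODE": "mode",
    "RUCIO_BASE_URL": "rucio_base_url",
    "RUCIO_AUTH_URL": "rucio_auth_url",
    "RUCIO_CA_CERT": "rucio_ca_cert",
    "RUCIO_DESTINATION_RSE": "destination_rse",
    "RUCIO_RSE_MOUNT_PATH": "rse_mount_path",
    "RUCIO_PATH_BEGINS_AT": "path_begins_at",
    "RUCIO_OIDC_AUTH": "oidc_auth",
    "RUCIO_OIDC_ENV_NAME": "oidc_env_name",
}


def instance_config_from_env(environ=os.environ):
    """Build a single-instance RucioConfig dictionary from environment variables."""
    instance = {
        key: environ[env_name]
        for env_name, key in ENV_TO_CONFIG_KEY.items()
        if env_name in environ
    }
    # Environment values are strings; path_begins_at is an integer in the JSON examples.
    if "path_begins_at" in instance:
        instance["path_begins_at"] = int(instance["path_begins_at"])
    return {"RucioConfig": {"instances": [instance]}}


example_env = {
    "RUCIO_NAME": "experiment.cern.ch",
    "RUCIO_MODE": "replica",
    "RUCIO_PATH_BEGINS_AT": "4",
    "UNRELATED_VARIABLE": "ignored",
}
generated = instance_config_from_env(example_env)
print(json.dumps(generated, indent=4))
```

Variables that are not set are simply left out of the result, so optional settings stay optional.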