Commit
Adds the supplement and updates the conclusions
Showing 2 changed files with 325 additions and 3 deletions.
312 changes: 312 additions & 0 deletions
2024-RIKEN-AWS/JupyterNotebook/tutorial/notebook/06_supplement.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,312 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Module 6: Supplement\n", | ||
"\n", | ||
"This extra module covers various other aspects of Flux that we did not get to in this tutorial. Feel free to try things out and play around!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Flux uptime\n", | ||
"Flux provides an `uptime` utility that summarizes the current Flux instance: its state, how long it has been running, the instance owner, its depth in the Flux hierarchy, its size (number of brokers), and whether scheduling is disabled or stopped." | ||
] | ||
}, | ||
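{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"As a minimal sketch, the same summary can be captured from Python by shelling out to `flux uptime`. The helper name `flux_uptime` is our own, and the sketch assumes only that the `flux` command is on `PATH`; it returns `None` when `flux` is not installed.\n", | ||
"\n", | ||
"```python\n", | ||
"import shutil\n", | ||
"import subprocess\n", | ||
"\n", | ||
"\n", | ||
"def flux_uptime():\n", | ||
"    # Hypothetical helper: return the text printed by `flux uptime`,\n", | ||
"    # or None when the flux command is not installed.\n", | ||
"    if shutil.which(\"flux\") is None:\n", | ||
"        return None\n", | ||
"    result = subprocess.run(\n", | ||
"        [\"flux\", \"uptime\"], capture_output=True, text=True, check=False\n", | ||
"    )\n", | ||
"    return result.stdout.strip()\n", | ||
"\n", | ||
"\n", | ||
"print(flux_uptime())\n", | ||
"```" | ||
] | ||
}, | ||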
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Flux Process and Job Utilities\n", | ||
"### Flux top\n", | ||
"Flux provides a full-featured version of `top` for nested Flux instances and jobs. In the JupyterLab terminal, invoke `flux top` to see the \"sleep\" jobs. If they have already completed, you can resubmit them. \n", | ||
"\n", | ||
"We recommend not running `flux top` in the notebook, because the notebook is not designed to display output from a command that runs continuously.\n", | ||
"\n", | ||
"### Flux pstree\n", | ||
"Analogous to the Linux `pstree` command, Flux provides `flux pstree`, which displays jobs as a tree. Try it out in the JupyterLab terminal or here in the notebook.\n", | ||
"\n", | ||
"### Flux proxy\n", | ||
"\n", | ||
"#### Interacting with a job hierarchy with `flux proxy`\n", | ||
"\n", | ||
"Flux proxy is used to route messages to and from a Flux instance. We can use `flux proxy` to connect to a running Flux instance and then submit more nested jobs inside it. You may want to edit `sleep_batch.sh` with the JupyterLab text editor (double-click the file in the window on the left) to sleep for `60` or `120` seconds. Then run the commands below from the JupyterLab terminal. Yes, we really want you to open a terminal in the Jupyter launcher <span style=\"color:red\">FILE -> NEW -> TERMINAL</span> and run them!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"```bash\n", | ||
"# The terminal starts at the root, so make sure you are in the right directory!\n", | ||
"# jovyan - that's you! \n", | ||
"cd /home/jovyan/flux-radiuss-tutorial-2023/notebook/\n", | ||
"\n", | ||
"# Outputs the JOBID\n", | ||
"flux batch --nslots=2 --cores-per-slot=1 --nodes=2 ./sleep_batch.sh\n", | ||
"\n", | ||
"# Put the JOBID into an environment variable\n", | ||
"JOBID=$(flux job last)\n", | ||
"\n", | ||
"# See the flux process tree\n", | ||
"flux pstree -a\n", | ||
"\n", | ||
"# Connect to the Flux instance corresponding to JOBID above\n", | ||
"flux proxy ${JOBID}\n", | ||
"\n", | ||
"# Note the depth is now 1 and the size is 2: we're one level deeper in a Flux hierarchy and we have only 2 brokers now.\n", | ||
"flux uptime\n", | ||
"\n", | ||
"# This instance has 2 \"nodes\" and 2 cores allocated to it\n", | ||
"flux resource list\n", | ||
"\n", | ||
"# Have you used the top command in your terminal? We have one for flux!\n", | ||
"flux top\n", | ||
"```\n", | ||
"\n", | ||
"`flux top` was pretty cool, right? 😎️" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Submission API\n", | ||
"Flux also provides first-class Python bindings that can be used to submit jobs programmatically. The following script shows this with the `flux.job.submit()` call:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"import json\n", | ||
"import flux\n", | ||
"from flux.job import JobspecV1\n", | ||
"from flux.job.JobID import JobID" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"f = flux.Flux() # connect to the running Flux instance\n", | ||
"compute_jobreq = JobspecV1.from_command(\n", | ||
" command=[\"./compute.py\", \"120\"], num_tasks=1, num_nodes=1, cores_per_task=1\n", | ||
") # construct a jobspec\n", | ||
"compute_jobreq.cwd = os.path.expanduser(\"~/flux-tutorial/flux-workflow-examples/job-submit-api/\") # set the CWD\n", | ||
"print(JobID(flux.job.submit(f,compute_jobreq)).f58) # submit and print out the jobid (in f58 format)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### `flux.job.get_job(handle, jobid)` to get job info" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# This is a new command to get info about your job from the id!\n", | ||
"fluxjob = flux.job.submit(f,compute_jobreq)\n", | ||
"fluxjobid = JobID(fluxjob.f58)\n", | ||
"print(f\"🎉️ Hooray, we just submitted {fluxjobid}!\")\n", | ||
"\n", | ||
"# Here is how to get your info. The first argument is the flux handle, then the jobid\n", | ||
"jobinfo = flux.job.get_job(f, fluxjobid)\n", | ||
"print(json.dumps(jobinfo, indent=4))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!flux jobs -a | grep compute" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Under the hood, the `Jobspec` class is creating a YAML document that ultimately gets serialized as JSON and sent to Flux for ingestion, validation, queueing, scheduling, and eventually execution. We can dump the raw JSON jobspec that is submitted, where we can see the exact resources requested and the task set to be executed on those resources." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"print(compute_jobreq.dumps(indent=2))" | ||
] | ||
}, | ||
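{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"For orientation, here is a hand-written approximation of what the dump above contains for the single-task job. The layout (a `node` containing a `slot` containing a `core`, plus a `tasks` entry that refers to the slot by its label) reflects our understanding of the V1 jobspec format. The real output also carries fields such as the working directory, so treat this as a sketch rather than the exact document Flux receives.\n", | ||
"\n", | ||
"```python\n", | ||
"import json\n", | ||
"\n", | ||
"# Hand-written approximation of a V1 jobspec for one task on one core of one node.\n", | ||
"# Field names are our assumption of the V1 layout, not output captured from Flux.\n", | ||
"jobspec_sketch = {\n", | ||
"    \"resources\": [\n", | ||
"        {\n", | ||
"            \"type\": \"node\",\n", | ||
"            \"count\": 1,\n", | ||
"            \"with\": [\n", | ||
"                {\n", | ||
"                    \"type\": \"slot\",\n", | ||
"                    \"count\": 1,\n", | ||
"                    \"label\": \"task\",\n", | ||
"                    \"with\": [{\"type\": \"core\", \"count\": 1}],\n", | ||
"                }\n", | ||
"            ],\n", | ||
"        }\n", | ||
"    ],\n", | ||
"    \"tasks\": [\n", | ||
"        {\"command\": [\"./compute.py\", \"120\"], \"slot\": \"task\", \"count\": {\"per_slot\": 1}}\n", | ||
"    ],\n", | ||
"    \"attributes\": {\"system\": {\"duration\": 0}},\n", | ||
"    \"version\": 1,\n", | ||
"}\n", | ||
"\n", | ||
"# The task refers to its slot by label, which ties the command to the resources.\n", | ||
"print(json.dumps(jobspec_sketch, indent=2))\n", | ||
"```" | ||
] | ||
}, | ||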
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can then replicate our previous example of submitting multiple heterogeneous jobs and testing that Flux co-schedules them." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"compute_jobreq = JobspecV1.from_command(\n", | ||
" command=[\"./compute.py\", \"120\"], num_tasks=4, num_nodes=2, cores_per_task=2\n", | ||
")\n", | ||
"compute_jobreq.cwd = os.path.expanduser(\"~/flux-tutorial/flux-workflow-examples/job-submit-api/\")\n", | ||
"print(JobID(flux.job.submit(f, compute_jobreq)))\n", | ||
"\n", | ||
"io_jobreq = JobspecV1.from_command(\n", | ||
" command=[\"./io-forwarding.py\", \"120\"], num_tasks=1, num_nodes=1, cores_per_task=1\n", | ||
")\n", | ||
"io_jobreq.cwd = os.path.expanduser(\"~/flux-tutorial/flux-workflow-examples/job-submit-api/\")\n", | ||
"print(JobID(flux.job.submit(f, io_jobreq)))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!flux jobs -a | grep compute" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can use the `FluxExecutor` class to submit large numbers of jobs to Flux. This method uses Python's `concurrent.futures` interface. Example snippet from `~/flux-workflow-examples/async-bulk-job-submit/bulksubmit_executor.py`:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"```python\n", | ||
"with FluxExecutor() as executor:\n", | ||
" compute_jobspec = JobspecV1.from_command(args.command)\n", | ||
" futures = [executor.submit(compute_jobspec) for _ in range(args.njobs)]\n", | ||
" # wait for the jobid for each job, as a proxy for the job being submitted\n", | ||
" for fut in futures:\n", | ||
" fut.jobid()\n", | ||
" # all jobs submitted - print timings\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Submit a FluxExecutor based script.\n", | ||
"%run ../flux-workflow-examples/async-bulk-job-submit/bulksubmit_executor.py -n200 /bin/sleep 0" | ||
] | ||
}, | ||
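{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"`FluxExecutor` follows the `concurrent.futures` pattern: submit work, hold on to the returned futures, then collect them. The sketch below shows that pattern with the standard library's `ThreadPoolExecutor` standing in for `FluxExecutor`, and a hypothetical `fake_job` function standing in for a jobspec, so it runs without Flux.\n", | ||
"\n", | ||
"```python\n", | ||
"from concurrent.futures import ThreadPoolExecutor, as_completed\n", | ||
"\n", | ||
"\n", | ||
"def fake_job(index):\n", | ||
"    # Hypothetical stand-in for a submitted job; it just returns its index.\n", | ||
"    return index\n", | ||
"\n", | ||
"\n", | ||
"with ThreadPoolExecutor(max_workers=4) as executor:\n", | ||
"    # Submit everything first, keeping one future per unit of work.\n", | ||
"    futures = [executor.submit(fake_job, i) for i in range(8)]\n", | ||
"    # Collect results as the futures complete, then sort for a stable order.\n", | ||
"    results = sorted(fut.result() for fut in as_completed(futures))\n", | ||
"\n", | ||
"print(results)\n", | ||
"```" | ||
] | ||
}, | ||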
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Diving Deeper Into Flux's Internals\n", | ||
"\n", | ||
"Flux uses [hwloc](https://github.com/open-mpi/hwloc) to detect the resources on each node and then to populate its resource graph.\n", | ||
"\n", | ||
"You can access the topology information that Flux collects with the `flux resource` subcommand:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!flux resource list" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Flux can also bootstrap its resource graph based on static input files, as in the case of a multi-user system instance set up by site administrators. See [more information on Flux's static resource configuration files](https://flux-framework.readthedocs.io/en/latest/adminguide.html#resource-configuration). Flux provides a standard interface for listing available resources that works regardless of the resource input source: `flux resource`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# To view status of resources\n", | ||
"!flux resource status" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Flux has a command for controlling the queue within the `job-manager`: `flux queue`. This includes disabling job submission, re-enabling it, waiting for the queue to become idle or empty, and checking the queue status:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!flux queue disable \"maintenance outage\"\n", | ||
"!flux queue enable\n", | ||
"!flux queue -h" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Each Flux instance has a set of attributes, set at startup, that affect the operation of Flux, such as `rank`, `size`, and `local-uri` (the Unix socket usable for communicating with Flux). Many of these attributes can be modified at runtime, such as `log-stderr-level` (1 logs only critical messages to stderr while 7 logs everything, including debug messages)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!flux getattr rank\n", | ||
"!flux getattr size\n", | ||
"!flux getattr local-uri\n", | ||
"!flux setattr log-stderr-level 3\n", | ||
"!flux lsattr -v" | ||
] | ||
}, | ||
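{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The numeric values for `log-stderr-level` appear to follow the standard syslog severities, where lower numbers are more severe. Assuming that mapping, and assuming a message is emitted when its severity number is at or below the configured level, this sketch lists what a given setting lets through. The names `SYSLOG_LEVELS` and `levels_logged` are ours.\n", | ||
"\n", | ||
"```python\n", | ||
"# Standard syslog severities: lower numbers are more severe.\n", | ||
"SYSLOG_LEVELS = {\n", | ||
"    0: \"emerg\",\n", | ||
"    1: \"alert\",\n", | ||
"    2: \"crit\",\n", | ||
"    3: \"err\",\n", | ||
"    4: \"warning\",\n", | ||
"    5: \"notice\",\n", | ||
"    6: \"info\",\n", | ||
"    7: \"debug\",\n", | ||
"}\n", | ||
"\n", | ||
"\n", | ||
"def levels_logged(threshold):\n", | ||
"    # Assumption: a message is emitted when its severity number <= threshold.\n", | ||
"    return [name for level, name in sorted(SYSLOG_LEVELS.items()) if level <= threshold]\n", | ||
"\n", | ||
"\n", | ||
"print(levels_logged(3))\n", | ||
"```" | ||
] | ||
}, | ||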
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |