Merge pull request #109 from cbegeman/reorg-for-shared-steps
New shared steps capability design doc
Showing 2 changed files with 381 additions and 0 deletions.
@@ -5,5 +5,6 @@
```{toctree}
:titlesonly: true
shared_steps
template
```
@@ -0,0 +1,380 @@
# Shared steps

date: 2023/08/18

Contributors: Carolyn Begeman, Xylar Asay-Davis

## Summary

The capability designed here is the ability to share steps across tasks.
In this design document, "shared steps" refers to any step that may be used by
multiple tasks available in Polaris.

The main motivation for this capability is the computational expense of
repeatedly running steps that could be shared across tasks. To make it clear
to users that steps are shared, we present a new design for the working
directory structure. The design is successful insofar as it guarantees that
shared steps are run once per Slurm job and that the role of shared steps is
clear to users.

## Requirements

### Requirement: Shared steps are run once.

Shared steps should be run once per invocation of `polaris serial` or
`polaris run`.

### Requirement: Shared steps are run before steps that depend on their output.

### Requirement: Shared steps are not daughters of a task

A shared step's class attributes do not include any task-related information
such as a task it belongs to.

### Requirement: Working directory structure is intuitive.

Shared step directories should be located at the highest level in the working
directory structure where all tasks that use that step are run at or below that
level.

### Requirement: Working directory step paths are easily discoverable by users.

There should be a way to list the paths within the work directory of all steps
in each task. There should also be a way for a user to find the steps
in a task from the task's work directory.

### Requirement: The output of shared steps may be used by multiple tasks.

A step may only be shared across multiple tasks if its output would be
identical for each task.

### Requirement: tasks do not rely on outputs from steps in other tasks

All tasks are self-contained and rely only on either shared steps or steps they
contain.

## Implementation

### Implementation: Shared steps are set up once.

As before, setup of either a list of tasks or a suite proceeds by iterating
through the tasks and then through the steps in each task. An attribute
`setup_complete` has been added to `Step` and is initialized to `False`.
In the `setup_task()` function, setup is skipped for any step where
`step.setup_complete` is `True`, and this attribute is set to `True` once a
step's setup has been completed.

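The skip logic for setup can be sketched as follows. This is a minimal illustration, not the Polaris source: the `Step` and `Task` classes are stripped-down stand-ins, and the `setup_count` attribute, the body of `setup()` and the signature of `setup_task()` are assumptions made for the example.

```python
class Step:
    """Stand-in for a step; only the attributes needed here."""

    def __init__(self, name):
        self.name = name
        # initialized to False, as described in the text
        self.setup_complete = False
        # illustration only: counts how often setup() is called
        self.setup_count = 0

    def setup(self):
        self.setup_count += 1


class Task:
    """Stand-in for a task holding a dictionary of its steps."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = {step.name: step for step in steps}


def setup_task(task):
    """Set up each step in the task, skipping steps already set up."""
    for step in task.steps.values():
        if step.setup_complete:
            # a shared step that was already set up through another task
            continue
        step.setup()
        step.setup_complete = True


base_mesh = Step('base_mesh')  # shared by both tasks
cosine_bell = Task('cosine_bell', [base_mesh, Step('analysis')])
with_viz = Task('with_viz', [base_mesh, Step('viz')])
for task in (cosine_bell, with_viz):
    setup_task(task)
```

Because both stand-in tasks hold the same `Step` object, the flag set while setting up the first task is visible when the second task is set up, so `base_mesh.setup()` is called only once.
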
### Implementation: Shared steps are run before steps that depend on their output.

This requirement is already satisfied as part of the task parallelism design,
which makes use of file dependencies. When running in task-serial mode, the
implementation will be to make sure shared steps are added to the dictionary of
steps before other steps that rely on them.

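One way to picture the task-serial ordering is a dictionary that preserves insertion order, with shared steps added first. The sketch below is illustrative only: the step names echo the `cosine_bell` example, while `add_step()` and its `depends_on` argument are hypothetical helpers and not part of the Polaris API.

```python
def add_step(steps, name, depends_on=()):
    """Add a step, requiring that its dependencies were added earlier."""
    missing = [dep for dep in depends_on if dep not in steps]
    if missing:
        raise ValueError(f'{name} added before its dependencies: {missing}')
    steps[name] = list(depends_on)


# dict preserves insertion order (Python 3.7+), so iterating over the
# steps visits each shared step before the steps that need its output
steps = {}
add_step(steps, 'base_mesh_60km')
add_step(steps, 'init_60km', depends_on=['base_mesh_60km'])
add_step(steps, 'forward_60km', depends_on=['init_60km'])
add_step(steps, 'viz_60km', depends_on=['forward_60km'])

run_order = list(steps)
```

The check in `add_step()` is an extra safeguard added for the example; the design itself relies only on developers adding steps in the right order.
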
### Implementation: Shared steps are not daughters of a task

The `task` attribute and constructor argument of the `Step` class have been
replaced by the `component` attribute. The step's `subdir` attribute is now
relative to the component's work directory, rather than a parent task's work
directory.

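As a rough sketch of this change, a step that knows only its component can still resolve its own work directory. The classes below are simplified stand-ins rather than the Polaris classes, and the `work_dir()` method and the `/scratch/work` base path are hypothetical; the step name and `subdir` are taken from the `cosine_bell` example.

```python
import os


class Component:
    """Stand-in for a component such as `ocean`."""

    def __init__(self, name):
        self.name = name


class Step:
    """Stand-in for a step that knows its component but not any task."""

    def __init__(self, component, name, subdir):
        self.component = component
        self.name = name
        # relative to the component's work directory, not a task's
        self.subdir = subdir

    def work_dir(self, base_work_dir):
        """Hypothetical helper: full path of the step's work directory."""
        return os.path.join(base_work_dir, self.component.name, self.subdir)


ocean = Component('ocean')
step = Step(ocean, 'qu_base_mesh_60km', 'spherical/qu/base_mesh/60km')
path = step.work_dir('/scratch/work')
```

Since nothing in the step refers to a task, the same object can be added to any number of tasks without its location changing.
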
### Implementation: Working directory structure is intuitive.

The only shared steps that reside inside a task's work directory are those in
situations where another task also lies within that task's work directory.
The only such tasks at the moment are the `cosine_bell/with_viz` tasks, which
reside inside the `cosine_bell` tasks. The `cosine_bell/with_viz` tasks share
all of the steps of the `cosine_bell` tasks (base mesh, init and forward for
each resolution, and a single analysis step) and also add remapping and
visualization steps that are not shared with any other tasks:

`cosine_bell`:

* ocean
  * spherical
    * qu
      * base_mesh
        * 60km
        * 90km
        * 120km
        * 150km
        * 180km
        * 210km
        * 240km
      * cosine_bell
        * init
          * 60km
          * 90km
          * 120km
          * 150km
          * 180km
          * 210km
          * 240km
        * forward
          * 60km
          * 90km
          * 120km
          * 150km
          * 180km
          * 210km
          * 240km
        * analysis

`cosine_bell/with_viz`:

* ocean
  * spherical
    * qu
      * base_mesh
        * 60km
        * 90km
        * 120km
        * 150km
        * 180km
        * 210km
        * 240km
      * cosine_bell
        * init
          * 60km
          * 90km
          * 120km
          * 150km
          * 180km
          * 210km
          * 240km
        * forward
          * 60km
          * 90km
          * 120km
          * 150km
          * 180km
          * 210km
          * 240km
        * analysis
        * with_viz
          * map
            * 60km
            * 90km
            * 120km
            * 150km
            * 180km
            * 210km
            * 240km
          * viz
            * 60km
            * 90km
            * 120km
            * 150km
            * 180km
            * 210km
            * 240km

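One reading of the placement rule is that a shared step lives under the deepest directory common to all tasks that use it. The sketch below illustrates that reading with the two `cosine_bell` task paths; `shared_step_parent()` is a hypothetical helper, and as noted under testing, the framework does not compute or enforce this placement.

```python
import posixpath


def shared_step_parent(task_paths):
    """Deepest directory that contains every task using the step."""
    return posixpath.commonpath(task_paths)


tasks = [
    'ocean/spherical/qu/cosine_bell',
    'ocean/spherical/qu/cosine_bell/with_viz',
]
parent = shared_step_parent(tasks)
# the shared init step for one resolution then sits below that directory
init_60km = posixpath.join(parent, 'init', '60km')
```

Here the common directory is the `cosine_bell` task directory itself, which is why the shared `init`, `forward` and `analysis` steps appear inside it in the listings.
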
### Implementation: Working directory step paths are easily discoverable by users.

This is implemented in two ways.

First, `polaris list --verbose` now lists the work-directory relative path of
steps, rather than their path relative to the task's work directory:

```
$ polaris list --verbose
...
10: path: ocean/spherical/qu/cosine_bell/with_viz
    name: cosine_bell
    component: ocean
    subdir: spherical/qu/cosine_bell/with_viz
    steps:
     - qu_base_mesh_60km: ocean/spherical/qu/base_mesh/60km
     - qu_init_60km: ocean/spherical/qu/cosine_bell/init/60km
     - qu_forward_60km: ocean/spherical/qu/cosine_bell/forward/60km
     - qu_map_60km: ocean/spherical/qu/cosine_bell/with_viz/map/60km
     - qu_viz_60km: ocean/spherical/qu/cosine_bell/with_viz/viz/60km
     - qu_base_mesh_90km: ocean/spherical/qu/base_mesh/90km
     - qu_init_90km: ocean/spherical/qu/cosine_bell/init/90km
     - qu_forward_90km: ocean/spherical/qu/cosine_bell/forward/90km
     - qu_map_90km: ocean/spherical/qu/cosine_bell/with_viz/map/90km
     - qu_viz_90km: ocean/spherical/qu/cosine_bell/with_viz/viz/90km
     - qu_base_mesh_120km: ocean/spherical/qu/base_mesh/120km
     - qu_init_120km: ocean/spherical/qu/cosine_bell/init/120km
     - qu_forward_120km: ocean/spherical/qu/cosine_bell/forward/120km
     - qu_map_120km: ocean/spherical/qu/cosine_bell/with_viz/map/120km
     - qu_viz_120km: ocean/spherical/qu/cosine_bell/with_viz/viz/120km
     - qu_base_mesh_150km: ocean/spherical/qu/base_mesh/150km
     - qu_init_150km: ocean/spherical/qu/cosine_bell/init/150km
     - qu_forward_150km: ocean/spherical/qu/cosine_bell/forward/150km
     - qu_map_150km: ocean/spherical/qu/cosine_bell/with_viz/map/150km
     - qu_viz_150km: ocean/spherical/qu/cosine_bell/with_viz/viz/150km
     - qu_base_mesh_180km: ocean/spherical/qu/base_mesh/180km
     - qu_init_180km: ocean/spherical/qu/cosine_bell/init/180km
     - qu_forward_180km: ocean/spherical/qu/cosine_bell/forward/180km
     - qu_map_180km: ocean/spherical/qu/cosine_bell/with_viz/map/180km
     - qu_viz_180km: ocean/spherical/qu/cosine_bell/with_viz/viz/180km
     - qu_base_mesh_210km: ocean/spherical/qu/base_mesh/210km
     - qu_init_210km: ocean/spherical/qu/cosine_bell/init/210km
     - qu_forward_210km: ocean/spherical/qu/cosine_bell/forward/210km
     - qu_map_210km: ocean/spherical/qu/cosine_bell/with_viz/map/210km
     - qu_viz_210km: ocean/spherical/qu/cosine_bell/with_viz/viz/210km
     - qu_base_mesh_240km: ocean/spherical/qu/base_mesh/240km
     - qu_init_240km: ocean/spherical/qu/cosine_bell/init/240km
     - qu_forward_240km: ocean/spherical/qu/cosine_bell/forward/240km
     - qu_map_240km: ocean/spherical/qu/cosine_bell/with_viz/map/240km
     - qu_viz_240km: ocean/spherical/qu/cosine_bell/with_viz/viz/240km
     - analysis: ocean/spherical/qu/cosine_bell/analysis
```

Second, we add symlinks within the task to the shared steps. In what follows,
the subdirectories in bold are shared steps that reside elsewhere up the
directory tree: each resolution in the `base_mesh`, `init` and `forward`, and
also `analysis`.

`cosine_bell/with_viz`:

* ocean
  * spherical
    * qu
      * cosine_bell
        * with_viz
          * base_mesh
            * **60km**
            * **90km**
            * **120km**
            * **150km**
            * **180km**
            * **210km**
            * **240km**
          * init
            * **60km**
            * **90km**
            * **120km**
            * **150km**
            * **180km**
            * **210km**
            * **240km**
          * forward
            * **60km**
            * **90km**
            * **120km**
            * **150km**
            * **180km**
            * **210km**
            * **240km**
          * map
            * 60km
            * 90km
            * 120km
            * 150km
            * 180km
            * 210km
            * 240km
          * viz
            * 60km
            * 90km
            * 120km
            * 150km
            * 180km
            * 210km
            * 240km
          * **analysis**

Thus, a structure similar to what we had before shared steps is maintained
locally, which should make debugging easier.

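The local symlinks to shared steps could be created along the following lines. This is a sketch under stated assumptions, not the Polaris implementation: `link_shared_step()` is a hypothetical helper, the work directory is a temporary directory, and the choice of relative rather than absolute links is an assumption, since the design does not say which is used.

```python
import os
import tempfile


def link_shared_step(work_dir, step_path, task_path, link_name):
    """Create a relative symlink to a shared step inside a task directory."""
    target = os.path.join(work_dir, step_path)
    link = os.path.join(work_dir, task_path, link_name)
    os.makedirs(target, exist_ok=True)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    # assumption: a relative link, which keeps working if the whole
    # work directory is moved
    os.symlink(os.path.relpath(target, os.path.dirname(link)), link)
    return link


work_dir = tempfile.mkdtemp()
link = link_shared_step(
    work_dir,
    step_path='ocean/spherical/qu/base_mesh/60km',
    task_path='ocean/spherical/qu/cosine_bell/with_viz',
    link_name='base_mesh/60km',
)
```

After this call, `base_mesh/60km` inside the `with_viz` task directory resolves to the shared step's directory under `ocean/spherical/qu/base_mesh/60km`.
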
### Implementation: The output of shared steps may be used by multiple tasks.

Task steps that use the output of shared steps will make use of symbolic
links as before.

### Implementation: tasks do not rely on outputs from steps in other tasks

There were not any Polaris tasks that relied on outputs from other tasks even
before the implementation of shared steps. There are tasks in Compass, though,
such as global ocean `mesh`, `init` and `dynamic_adjustment`, that do allow
outputs from one task to be inputs of another. As these are ported to Polaris,
we will make sure they use shared steps instead.

## Testing

### Testing And Validation: Shared steps are run once.

Output from running a series of tasks or a suite indicates when shared steps
are skipped because they already ran (`already completed`):

```
ocean/spherical/icos/cosine_bell
  * step: icos_base_mesh_60km
      execution: SUCCESS
      runtime: 0:01:00
  * step: icos_init_60km
      execution: SUCCESS
      runtime: 0:00:00
  * step: icos_forward_60km
      execution: SUCCESS
      runtime: 0:00:38
...
  * step: analysis
      execution: SUCCESS
      runtime: 0:00:02
  task execution: SUCCESS
  task runtime: 0:02:59
ocean/spherical/icos/cosine_bell/with_viz
  * step: icos_base_mesh_60km
      already completed
  * step: icos_init_60km
      already completed
  * step: icos_forward_60km
      already completed
  * step: icos_map_60km
      execution: SUCCESS
      runtime: 0:00:20
  * step: icos_viz_60km
      execution: SUCCESS
      runtime: 0:00:06
...
  * step: analysis
      already completed
  task execution: SUCCESS
  task runtime: 0:03:23
```

### Testing And Validation: Shared steps are run before steps that depend on their output.

As before, steps are added to tasks in the order they are to be run, ensuring
that shared steps run before steps that require their output when running in
task-serial mode (`polaris serial`). Task parallelism already has mechanisms
to prevent steps from running before their dependencies are available, and
this is not expected to be affected by shared steps. However, no testing with
task parallelism will be performed at this time.

### Testing And Validation: Shared steps are not daughters of a task

Steps run successfully even after we have removed the `task` attribute from
them, indicating that they no longer rely on information about a task they
formerly belonged to.

### Testing And Validation: Working directory structure is intuitive.

The intuitive work directory structure will need to be maintained by
developers as new tasks and steps are added, as this is not enforced by the
framework. The proposed implementation ensures that shared steps either reside
closer to the root of the directory structure than the tasks that use them or
live inside those tasks, which we have deemed an intuitive structure.

### Testing And Validation: Working directory step paths are easily discoverable by users.

Between `polaris list --verbose` and the local symlinks to shared steps within
each task, we think the shared steps will be discoverable by users and
developers.

### Testing And Validation: The output of shared steps may be used by multiple tasks.

We have implemented shared steps for base meshes, initial conditions and
forward runs, and shown that multiple tasks can make use of their output.

### Testing And Validation: tasks do not rely on outputs from steps in other tasks

This is not enforced; it will simply need to be maintained as the preferred
convention for future development. Currently, all tasks can be run
independently and do not rely on any other tasks.