Task parameter section upgrade.

cylc · Nov 17, 2021 · f922e05
1 parent 6c29d7b
commit f922e05
Showing 1 changed file with 110 additions and 76 deletions.
diff --git a/src/user-guide/writing-workflows/parameterized-tasks.rst b/src/user-guide/writing-workflows/parameterized-tasks.rst
@@ -7,19 +7,22 @@ Cylc can automatically generate tasks and dependencies by expanding
 :term:`parameterized <parameterisation>` task names over lists of parameter
 values. Uses for this include:
 
-- generating an ensemble of similar model runs
-- generating chains of tasks to process similar datasets
-- replicating an entire workflow, or part thereof, over several runs
-- splitting a long model run into smaller steps or ``chunks``
-  (parameterized cycling)
+- Generating an ensemble of similar model runs
+- Generating chains of tasks to process similar datasets
+- Replicating an entire workflow, or part thereof, over several runs
+- Splitting a long model run into smaller chunks
+- Parameterized cycling
 
 .. note::
 
-   This can be done with Jinja2 loops too (:ref:`User Guide Jinja2`)
-   but parameterization is much cleaner (nested loops can seriously reduce
-   the clarity of a workflow configuration).*
-
+   Cylc supports use of :ref:`Jinja2 <User Guide Jinja2>` and :ref:`Empy
+   <User Guide Empy>` templating for programmatic generation of workflow
+   configurations. The built-in parameterization system described here
+   is a cleaner and easier alternative *for generating tasks and families
+   over a range of parameters*, but unlike general templating it can only be
+   used for that specific purpose.
 
+
 Parameter Expansion
 -------------------
 
@@ -225,8 +228,8 @@ should be overridden to remove the initial underscore. For example:
            """
 
 
-Passing Parameter Values To Tasks
----------------------------------
+Passing Values To Tasks
+-----------------------
 
 Parameter values are passed as environment variables to tasks generated by
 parameter expansion. For example, if we have:
@@ -273,8 +276,8 @@ environment variables:
    export MYFILE=/path/to/run002/ship
 
 
-Selecting Specific Parameter Values
------------------------------------
+Selecting Specific Values
+-------------------------
 
 Specific parameter values can be singled out in the graph and under
 :cylc:conf:`[runtime]` with the notation ``<p=5>`` (for example).
@@ -299,8 +302,8 @@ set of model runs:
        #...
 
 
-Selecting Partial Parameter Ranges
-----------------------------------
+Selecting Partial Ranges
+------------------------
 
 The parameter notation does not currently support partial range selection such
 as ``foo<p=5..10>``, but you can achieve the same result by defining a
@@ -325,8 +328,8 @@ template as the full-range parameter. For example:
        #...
 
 
-Parameter Offsets In The Graph
-------------------------------
+Offsets in the Graph
+---------------------
 
 A negative offset notation ``<NAME-1>`` is interpreted as the previous
 value in the ordered list of parameter values, while a positive offset is
@@ -367,8 +370,8 @@ expands to:
    proc_small => proc_big => proc_huge
 
 
-Task Families And Parameterization
-----------------------------------
+Task Families and Parameters
+----------------------------
 
 Task family members can be generated by parameter expansion:
 
@@ -459,15 +462,11 @@ expands to:
 Parameterized Cycling
 ---------------------
 
-For most purposes use of
-a proper :term:`cycling` workflow is recommended, wherein Cylc incrementally
-generates the datetime sequence and extends the workflow, potentially
-indefinitely, at run time. For smaller systems of finite duration, however,
-parameter expansion can be used to generate a sequence of pre-defined tasks
-as a proxy for cycling.
+For smaller workflows of finite duration, parameter expansion can be used to
+generate a sequence of pre-defined tasks as a proxy for cycling.
 
-Here's a cycling workflow of two-monthly model runs for one year,
-with previous-instance model dependence (e.g. for model restart files):
+Here's a cycling workflow of two-monthly model runs for one year, with
+previous-instance model dependence:
 
 .. code-block:: cylc
 
@@ -483,12 +482,13 @@ with previous-instance model dependence (e.g. for model restart files):
        [[model]]
            script = "run-model $CYLC_TASK_CYCLE_POINT"
 
-And here's how to do the same thing with parameterized tasks:
+
+And here's how to do the same thing with parameterized tasks instead of cycling:
 
 .. code-block:: cylc
 
    [task parameters]
-           chunk = 1..6
+       chunk = 1..6
    [scheduling]
        [[graph]]
            R1 = """
@@ -499,19 +499,17 @@ And here's how to do the same thing with parameterized tasks:
    [runtime]
        [[model<chunk>]]
            script = """
-   # Compute start date from chunk index and interval, then run the model.
-   INITIAL_POINT=2020-01
-   INTERVAL_MONTHS=2
-   OFFSET_MONTHS=(( (CYLC_TASK_PARAM_chunk - 1)*INTERVAL_MONTHS ))
-   OFFSET=P${OFFSET_MONTHS}M  # e.g. P4M for chunk=3
-   run-model $(cylc cyclepoint --offset=$OFFSET $INITIAL_POINT)"""
-
-The two workflows are shown together below. They both achieve the same
-result, and both can include special tasks at the start, end, or
-anywhere in between. But as noted earlier the parameterized version has
-several disadvantages: it must be finite in extent and not too large; the
-datetime arithmetic has to be done by the user; and the full extent of the
-workflow will be visible at all times as the workflow runs.
+               # Compute start date from chunk index and interval.
+               INITIAL_POINT=2020-01
+               INTERVAL_MONTHS=2
+               OFFSET_MONTHS=$(( (CYLC_TASK_PARAM_chunk - 1)*INTERVAL_MONTHS ))
+               OFFSET=P${OFFSET_MONTHS}M  # e.g. P4M for chunk=3
+               # Run the model.
+               run-model $(cylc cyclepoint --offset=$OFFSET $INITIAL_POINT)
+           """
+
+The two workflows achieve the same result, and both can include special
+behaviour at the start, end, or anywhere in between.
 
 .. todo
    Create sub-figures if possible: for now hacked as separate figures with
@@ -527,15 +525,36 @@ workflow will be visible at all times as the workflow runs.
 
    Parameterized (top) and cycling (bottom) versions of the same
    workflow. The first three cycle points are shown in the
-   cycling case. The parameterized case does not have "cycle points".
+   cycling case. The parameterized case does not have cycle points (technically
+   all of its tasks have the cycle point 1).
+
+The parameterized version has some disadvantages, however:
+
+  - The workflow must be finite in extent and not too large because every
+    parameterized task generates a new task definition
 
-Here's a yearly-cycling workflow with four parameterized chunks in each cycle
-point:
+    - (In a cycling workflow a single task definition acts as a template for
+      all cycle point instances of a task)
+  - Datetime arithmetic has to be done manually
+
+    - (This doesn't apply if it's not a datetime sequence; parameterized
+      integer cycling is straightforward.)
+
+
+Parameterized Sub-Cycles
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+A workflow can have multiple main cycling sequences, but sub-cycles within each
+main cycle point have to be parameterized. A typical use case for this is
+incremental processing of files generated sequentially during a long model run.
+
+Here's a workflow that uses parameters to split a long model run in each
+datetime cycle point into four smaller runs:
 
 .. code-block:: cylc
 
    [task parameters]
-           chunk = 1..4
+       chunk = 1..4
    [scheduling]
        initial cycle point = 2020-01
        [[graph]]
@@ -544,31 +563,46 @@ point:
                model<chunk=4>[-P1Y] => model<chunk=1>
            """
 
-.. note::
+The inter-cycle trigger connects the first chunk in each cycle point to the
+last chunk in the previous cycle point. However, in this particular case it
+might be simpler to use a 3-monthly datetime cycle instead:
+
+.. code-block:: cylc
 
-   The inter-cycle trigger connects the first chunk in each cycle point
-   to the last chunk in the previous cycle point. Of course it would be simpler
-   to just use 3-monthly cycling:
+   [scheduling]
+       initial cycle point = 2020-01
+       [[graph]]
+           P3M = "model[-P3M] => model"
 
-   .. code-block:: cylc
 
-      [scheduling]
-          initial cycle point = 2020-01
-          [[graph]]
-              P3M = "model[-P3M] => model"
+For another example, here task ``model`` generates 10 files in sequence as it
+runs. Task ``proc_file0`` triggers when the model starts running, to wait for
+and process the first file; when that is done, ``proc_file1`` triggers to wait
+for the second file; and so on.
 
-Here's a possible valid use-case for mixed cycling: consider a portable
-datetime cycling workflow of model jobs that can each take too long to run on
-some supported platforms. This could be handled without changing the cycling
-structure of the workflow by splitting the run (at each cycle point) into a
-variable number of shorter steps, using more steps on less powerful hosts.
+.. code-block:: cylc
+
+   [task parameters]
+       file = 0..9
+   [scheduling]
+       initial cycle point = 2020-01
+       [[graph]]
+           P1Y = """
+               model:start => proc<file=0>
+               proc<file-1> => proc<file>
+               proc<file=9> => upload_products
+           """
+   [runtime]
+       [[model]]
+       [[proc<file>]]
+       [[upload_products]]
 
 
-Cycle Point And Parameter Offsets At Start-Up
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Offsets at Sequence Start
+^^^^^^^^^^^^^^^^^^^^^^^^^
 
-In cycling workflows cylc ignores anything earlier than the workflow initial
-cycle point. So this graph:
+In cycling workflows dependence on tasks prior to the start cycle point is
+ignored [1]_. So this graph:
 
 .. code-block:: cylc
 
@@ -580,26 +614,26 @@ simplifies at the initial cycle point to this:
 
    P1D = "model"
 
-Similarly, parameter offsets are ignored if they extend beyond the start
-of the parameter value list. So this graph:
+(Note that this is a convenient way to bootstrap into an infinite cycle, but
+special behaviour at the start point can be configured explicitly if desired.)
+
+Similarly, parameter offsets that go out of range are ignored. So this graph:
 
 .. code-block:: cylc
 
+   # for chunk = 1..4
    R1 = "model<chunk-1> => model<chunk>"
 
 simplifies for ``chunk=1`` to this:
 
 .. code-block:: cylc
 
-   R1 = "model_chunk1"
+   R1 = "model_chunk1"
 
-.. note::
 
-   The initial cut-off applies to every parameter list, but only
-   to cycle point sequences that start at the workflow initial cycle point.
-   Therefore it may be somewhat easier to use parameterized cycling if you
-   need multiple datetime sequences *with different start points* in the
-   same workflow. We plan to allow this sequence-start simplification for any
-   datetime sequence in the future, not just at the workflow initial point,
-   but it needs to be optional because delayed-start cycling tasks
-   sometimes need to trigger off earlier cycling tasks.
+.. [1] Currently this only applies to the unique workflow start cycle point, so
+       it may be easier to use parameterized cycling if you have multiple
+       (finite) sequences starting at different points. We plan to extend this
+       convenience to all sequences regardless of start point, but use will be
+       optional because delayed-start cycling tasks may need to trigger off of
+       earlier cycles.
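
Reviewer sketch (not part of the commit): the parameterized-cycling script above derives an ISO 8601 month offset from the chunk index before passing it to ``cylc cyclepoint --offset``. The arithmetic can be checked in isolation with a small shell function. The name ``chunk_offset`` and its optional second argument are hypothetical, introduced here only for illustration; the default interval of 2 months matches ``INTERVAL_MONTHS`` in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cylc): map a 1-based chunk index to an
# ISO 8601 month offset from the initial point.
#   $1 = chunk index (1, 2, 3, ...)
#   $2 = interval in months between chunks (assumed default: 2)
chunk_offset() {
    local chunk=$1
    local interval_months=${2:-2}
    # Shell arithmetic expansion needs the leading "$" to yield a value.
    echo "P$(( (chunk - 1) * interval_months ))M"
}

chunk_offset 1   # prints P0M
chunk_offset 3   # prints P4M
chunk_offset 6   # prints P10M
```

In a task script the chunk index would come from ``$CYLC_TASK_PARAM_chunk``, as in the example in this diff. Note the ``$(( ... ))`` form: a bare ``(( ... ))`` on the right-hand side of an assignment is not arithmetic expansion.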