Platformization of examples #370

Merged
merged 9 commits into from
Jan 5, 2022
13 changes: 12 additions & 1 deletion src/reference/config/writing-platform-configs.rst
@@ -13,6 +13,17 @@ Writing Platform Configurations
- :cylc:conf:`global.cylc[platforms]`
- :cylc:conf:`global.cylc[platforms][<platform name>]install target`

If you are working on an institutional network, platforms may already
have been configured for you.

.. TODO update the command below after implementing a platform
listing command.

To see available platforms::

cylc config -i '[platforms]'
cylc config -i '[platform groups]'

What Are Platforms?
-------------------

@@ -75,7 +86,7 @@ Simple Remote Platform

Users want to run background jobs on a single server,
which doesn't share a file system with the workflow host.

.. code-block:: cylc
:caption: part of a ``global.cylc`` config file

4 changes: 2 additions & 2 deletions src/tutorial/furthertopics/message-triggers.rst
@@ -60,8 +60,8 @@ triggers another task bar and when fully completed triggers another task, baz.
.. code-block:: cylc

[scheduling]
[[dependencies]]
graph = """
[[graph]]
R1 = """
foo:out1 => bar
foo => baz
"""
14 changes: 6 additions & 8 deletions src/tutorial/runtime/configuration-consolidation/parameters.rst
@@ -18,9 +18,8 @@ Cylc Parameters

.. code-block:: cylc

[scheduler]
[[parameters]]
world = Mercury, Venus, Earth
[task parameters]
world = Mercury, Venus, Earth


.. ifnotslides::
@@ -97,11 +96,10 @@ Parameters can be either words or integers:

.. code-block:: cylc

[scheduler]
[[parameters]]
foo = 1..5
bar = 1..5..2
baz = pub, qux, bol
[task parameters]
foo = 1..5
bar = 1..5..2
baz = pub, qux, bol

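The parameter definitions above can be modelled with a short Python sketch. It assumes that ``N..M`` is an inclusive range of non-negative integers, that a third component (``N..M..S``) is a step, and that anything else is a comma-separated list of words. ``expand_parameter`` is a hypothetical helper for illustration, not part of Cylc.

```python
def expand_parameter(value: str) -> list:
    """Expand a parameter definition such as "1..5", "1..5..2" or "a, b, c".

    Assumption: integer ranges include both end points and accept an
    optional step as a third component. Only non-negative integers are
    recognised by this sketch.
    """
    value = value.strip()
    bounds = [part.strip() for part in value.split("..")]
    if len(bounds) in (2, 3) and all(part.isdigit() for part in bounds):
        start, stop = int(bounds[0]), int(bounds[1])
        step = int(bounds[2]) if len(bounds) == 3 else 1
        # range() excludes its stop value, so add 1 to make it inclusive.
        return list(range(start, stop + 1, step))
    # Anything else is treated as a comma-separated list of words.
    return [word.strip() for word in value.split(",")]


parameters = {
    "foo": "1..5",
    "bar": "1..5..2",
    "baz": "pub, qux, bol",
}
for name, definition in parameters.items():
    print(name, expand_parameter(definition))
# foo [1, 2, 3, 4, 5]
# bar [1, 3, 5]
# baz ['pub', 'qux', 'bol']
```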
.. nextslide::

31 changes: 16 additions & 15 deletions src/tutorial/runtime/runtime-configuration.rst
@@ -159,10 +159,12 @@ Sometimes jobs fail. This can be caused by two factors:

.. ifnotslides::

In the event of failure Cylc can automatically re-submit (retry) jobs. We
configure retries using the ``execution retry delays`` and
``submission retry delays`` settings. These settings are both set to an
:term:`ISO8601 duration`, e.g. setting ``execution retry delays`` to ``PT10M``
In the event of failure, Cylc can automatically re-submit or retry jobs.

Configure retries using the ``submission retry delays`` and
``execution retry delays`` settings.
These settings are lists of :term:`ISO8601 durations <ISO8601 duration>`.
For example, setting ``execution retry delays`` to ``PT10M``
would cause the job to be retried 10 minutes after an execution
failure.

@@ -172,17 +174,16 @@ Sometimes jobs fail. This can be caused by two factors:
.. code-block:: cylc

[runtime]
[[some-task]]
script = some-script

# In the event of execution failure, retry a maximum
# of three times every 15 minutes.
execution retry delays = 3*PT15M

# In the event of submission failure, retry a maximum
# of two times every ten minutes and then every 30
# minutes thereafter.
submission retry delays = 2*PT10M, PT30M
[[some-task]]
script = some-script

# In the event of execution failure, retry a maximum
# of three times every 15 minutes.
execution retry delays = 3*PT15M

# In the event of a submission failure, retry up to two
# times at ten-minute intervals, then once more after
# 30 minutes.
submission retry delays = 2*PT10M, PT30M

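To make the multiplier shorthand concrete, here is a Python sketch that expands a delay list such as ``2*PT10M, PT30M`` into one duration per entry. The helper names are hypothetical, and only the simple ``PnDTnHnMnS`` duration forms used in this section are parsed; it is not Cylc's own parser.

```python
import re
from datetime import timedelta

# Handles only the PnDTnHnMnS subset of ISO 8601 durations
# (e.g. PT10M, PT1H, PT0S); it is not a complete ISO 8601 parser.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_duration(text: str) -> timedelta:
    """Convert a simple ISO 8601 duration such as PT15M to a timedelta."""
    match = _DURATION.fullmatch(text.strip())
    fields = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    if not fields:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**fields)


def expand_delays(setting: str) -> list[timedelta]:
    """Expand N*DURATION multipliers in a comma-separated delay list."""
    delays: list[timedelta] = []
    for item in setting.split(","):
        # "3*PT15M" -> ("3", "*", "PT15M"); "PT30M" -> ("", "", "PT30M")
        count, _, duration = item.strip().rpartition("*")
        repeat = int(count) if count else 1
        delays.extend([parse_duration(duration)] * repeat)
    return delays


print([str(d) for d in expand_delays("3*PT15M")])
# ['0:15:00', '0:15:00', '0:15:00']
print([str(d) for d in expand_delays("2*PT10M, PT30M")])
# ['0:10:00', '0:10:00', '0:30:00']
```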

Start, Stop, Restart
37 changes: 26 additions & 11 deletions src/user-guide/running-workflows/tracking-task-state.rst
@@ -123,33 +123,48 @@ that do not allow TCP or non-interactive SSH from job host to workflow host.
Be careful to avoid spamming task hosts with polling operations. Each poll
opens (and then closes) a new SSH connection.

Polling intervals are configurable under :cylc:conf:`[runtime]` because they
may depend on expected job execution time. You may want to poll a job
frequently at first, to check that it started running properly; frequently
near the expected end of its run time, to get a timely task finished update;
and infrequently between times. Configured intervals are used in sequence until
the last value, which is used repeatedly until the job is finished:
Polling intervals are configured by
:cylc:conf:`global.cylc[platforms][<platform name>]submission polling intervals`
and
:cylc:conf:`global.cylc[platforms][<platform name>]execution polling intervals`.

A common use case is to poll:

.. TODO - platformise this example
- frequently at first, to check that a job has started running properly;
- frequently near the expected end of its run time, to get a timely task finished update;
- infrequently in the intervening period.

Configured intervals are used in sequence until
the last value, which is used repeatedly until the job is finished:

.. code-block:: cylc
:caption: global.cylc

[runtime]
[[foo]]
[platforms]
[[my_platform]]
# poll every minute in the 'submitted' state:
submission polling intervals = PT1M

# poll one minute after the job starts running, then every 10
# minutes for 50 minutes, then every minute until finished:
execution polling intervals = PT1M, 5*PT10M, PT1M

.. cylc-scope:: global.cylc[platforms][<platform name>]
.. code-block:: cylc
:caption: flow.cylc

[runtime]
[[task]]
platform = my_platform


A list of intervals with optional multipliers can be used for both submission
and execution polling, although a single value is probably sufficient for
submission. If these items are not configured, default values from
site and user global config will be used for
:cylc:conf:`communication method = polling`.
:cylc:conf:`global.cylc[platforms][<platform name>]communication method = poll`.

Polling is not done by default under the other task communication methods,
but it can be configured for them as well.
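The rule that configured intervals are used in sequence, with the last value repeated until the job finishes, can be sketched as a Python generator. This is an illustration of the rule as described above, not Cylc's implementation; the function names are hypothetical and only simple ``PnDTnHnMnS`` durations are parsed.

```python
import itertools
import re
from datetime import timedelta
from typing import Iterator

# Handles only the simple PnDTnHnMnS subset of ISO 8601 durations
# (e.g. PT1M, PT10M); it is not a complete ISO 8601 parser.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_duration(text: str) -> timedelta:
    """Convert a simple ISO 8601 duration such as PT10M to a timedelta."""
    match = _DURATION.fullmatch(text.strip())
    fields = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    if not fields:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**fields)


def expand_intervals(setting: str) -> list[timedelta]:
    """Expand N*DURATION multipliers in a comma-separated interval list."""
    intervals: list[timedelta] = []
    for item in setting.split(","):
        count, _, duration = item.strip().rpartition("*")
        intervals.extend([parse_duration(duration)] * (int(count) if count else 1))
    return intervals


def polling_schedule(setting: str) -> Iterator[timedelta]:
    """Yield the configured intervals in order, then repeat the last one."""
    intervals = expand_intervals(setting)
    yield from intervals
    # The caller simply stops iterating once the job has finished.
    yield from itertools.repeat(intervals[-1])


first_eight = itertools.islice(polling_schedule("PT1M, 5*PT10M, PT1M"), 8)
minutes = [int(d.total_seconds() // 60) for d in first_eight]
print(minutes)                              # [1, 10, 10, 10, 10, 10, 1, 1]
print(list(itertools.accumulate(minutes)))  # [1, 11, 21, 31, 41, 51, 52, 53]
```

Reading the cumulative times, the job is polled 1, 11, 21, 31, 41 and 51 minutes after it starts running and every minute after that, which is what the comment in the ``global.cylc`` example describes.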
14 changes: 7 additions & 7 deletions src/user-guide/writing-workflows/parameterized-tasks.rst
@@ -22,7 +22,7 @@ values. Uses for this include:
over a range of parameters*, but unlike general templating it can only be
used for that specific purpose.


Parameter Expansion
-------------------

@@ -34,7 +34,7 @@ For example:

.. code-block:: cylc

[[task parameters]]
[task parameters]
# parameters: "ship", "buoy", "plane"
# default task suffixes: _ship, _buoy, _plane
obs = ship, buoy, plane
@@ -197,7 +197,7 @@ To get thicker padding and/or alternate suffixes, use a template. E.g.:

.. code-block:: cylc

[[task parameters]]
[task parameters]
i = 1..9
p = 3..14
[[templates]]
@@ -215,9 +215,9 @@ should be overridden to remove the initial underscore. For example:
.. code-block:: cylc

[task parameters]
i = 1..4
obs = ship, buoy, plane
[[parameter templates]]
i = 1..4
obs = ship, buoy, plane
[[templates]]
i = i%(i)d # task name must begin with a letter
obs = %(obs)s
[scheduling]
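The template values above look like Python %-style mapping format strings. Assuming Cylc substitutes parameter values into them in that way, the sketch below shows the suffixes they would produce. The ``p`` template is hypothetical, added only to show two-digit zero padding for a parameter such as ``p = 3..14``; ``suffix`` is likewise an illustrative helper.

```python
# Templates in Python %-style mapping format.
templates = {
    "i": "i%(i)d",     # as in the example above: no leading underscore
    "obs": "%(obs)s",  # as in the example above: the bare value
    "p": "_p%(p)02d",  # hypothetical: zero-pad integers to two digits
}


def suffix(name: str, value) -> str:
    """Render the suffix for one parameter value using its template."""
    return templates[name] % {name: value}


print([suffix("i", i) for i in range(1, 5)])               # ['i1', 'i2', 'i3', 'i4']
print([suffix("obs", o) for o in ("ship", "buoy", "plane")])  # ['ship', 'buoy', 'plane']
print([suffix("p", p) for p in (3, 9, 14)])                # ['_p03', '_p09', '_p14']
```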
@@ -539,7 +539,7 @@ The parameterized version has several disadvantages, however:

- (This doesn't apply if it's not a datetime sequence; parameterized
integer cycling is straightforward.)


Parameterized Sub-Cycles
^^^^^^^^^^^^^^^^^^^^^^^^
59 changes: 52 additions & 7 deletions src/user-guide/writing-workflows/runtime.rst
@@ -420,17 +420,62 @@ For this to work:
Platforms, like other runtime settings, can be declared globally in the root
family, or in other families, or for individual tasks.

.. note::

Dynamic Platform Selection
^^^^^^^^^^^^^^^^^^^^^^^^^^
The platform known as ``localhost`` is the platform where the scheduler
is running, in many cases a dedicated server and *not* your desktop.

Internal Platform and Host Selection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :cylc:conf:`[runtime][<namespace>]platform` item points to either a
:cylc:conf:`platform <global.cylc[platforms][<platform name>]>` or a
:cylc:conf:`platform group <global.cylc[platform groups][<group>]>`.

.. TODO - consider a re-write once dynamic platform selection done
:term:`Cylc platforms <platform>` allow you to configure compute platforms
you wish Cylc to run jobs on.

:term:`Platform groups <platform group>` allow you to group together platforms
any of which would be suitable for a given job.
Platform groups can improve robustness by allowing jobs to be submitted on
any platform in the group, as well as providing an interface for
:cylc:conf:`basic load balancing
<global.cylc[platform groups][<group>][selection]method>`.

:term:`Platforms <platform>` are selected from a :term:`platform group` once,
when a job is submitted.

Hosts within a :term:`platform` are re-selected each time the scheduler
needs to communicate with a job.

.. seealso::

:ref:`AdminGuide.PlatformConfigs`: For details of how Platforms and
Platform Groups are set up and in-depth examples.

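The two levels of selection described above (a platform picked once, at job submission; a host picked each time the scheduler communicates with the job) can be sketched in Python. Everything here is hypothetical: the ``Platform`` class, the availability callbacks and the function names are illustrations only. The method names ``random`` and ``definition order`` are assumed to correspond to the ``[selection]method`` settings, and the "try candidates until one is usable" behaviour is a simplifying assumption.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Platform:
    """Hypothetical stand-in for a platform definition."""
    name: str
    hosts: list[str]


def candidate_order(items: Sequence[T], method: str, rng: random.Random) -> list[T]:
    """Return the order in which candidates are tried."""
    if method == "definition order":
        return list(items)
    if method == "random":
        shuffled = list(items)
        rng.shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown selection method: {method!r}")


def select_platform(group: Sequence[Platform], method: str,
                    usable: Callable[[Platform], bool],
                    rng: random.Random) -> Platform:
    """Called once, when the job is submitted."""
    for platform in candidate_order(group, method, rng):
        if usable(platform):
            return platform
    raise RuntimeError("no usable platform in the group")


def select_host(platform: Platform, method: str,
                reachable: Callable[[str], bool],
                rng: random.Random) -> str:
    """Called every time the scheduler needs to contact the job."""
    for host in candidate_order(platform.hosts, method, rng):
        if reachable(host):
            return host
    raise RuntimeError(f"no reachable host on {platform.name}")


rng = random.Random(42)
group = [Platform("hpc_a", ["a1", "a2"]), Platform("hpc_b", ["b1", "b2"])]

# Submission: hpc_a is (hypothetically) unavailable, so hpc_b is chosen
# and the job stays on hpc_b from then on.
platform = select_platform(group, "definition order",
                           lambda p: p.name != "hpc_a", rng)
print(platform.name)  # hpc_b

# Polling, killing, etc.: a host is chosen afresh for each interaction.
for _ in range(3):
    print(select_host(platform, "random", lambda h: True, rng))
```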
External Platform Selection Scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. deprecated:: 8.0.0

Cylc 8 can select hosts from a group of suitable hosts listed in the
platform config, so in many cases this logic should no longer be necessary.

Instead of hardwiring platform names into the workflow configuration you can
give a command that prints a platform name, or an environment variable, as the
value of :cylc:conf:`[runtime][<namespace>]platform`.

Job hosts are always selected dynamically, for the chosen platform.
For example:

.. code-block:: cylc
:caption: flow.cylc

[runtime]
[[mytask]]
platform = $(script-which-returns-a-platform-name)

Job hosts are always selected dynamically, for the chosen platform or
platform group.
Member:

Worth adding a paragraph explaining what happens if the command exits with a non-zero return code or bump to a new issue.

@wxtim (Member Author), Jan 5, 2022:

I've bumped this to #374 because:

- I've not been well enough to deal with it today.
- This PR may be at risk of horrid conflicts with Mel's work.


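As an illustration of a command that could be used as the ``platform`` value in the example above, here is a hypothetical selection script written in Python. The script itself, the ``MY_WORKFLOW_PLATFORM`` variable and the default platform name ``my_platform`` are all invented for this sketch; the only requirement taken from the text is that the command prints a platform name.

```python
#!/usr/bin/env python3
"""Hypothetical platform-selection script.

It prints exactly one platform name on standard output.
"""
import os
from typing import Mapping

DEFAULT_PLATFORM = "my_platform"  # hypothetical platform name


def choose_platform(env: Mapping[str, str]) -> str:
    """Pick a platform name; the rule used here is purely illustrative."""
    # Hypothetical override variable, e.g. for switching to a backup system.
    override = env.get("MY_WORKFLOW_PLATFORM", "").strip()
    return override or DEFAULT_PLATFORM


if __name__ == "__main__":
    # Print only the name: any other output on stdout would also be
    # captured by a $(...) command substitution.
    print(choose_platform(os.environ))
```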
Remote Task Job Log Directories
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -649,8 +694,8 @@ task-specific) ways to configure event handlers:
script = test ${CYLC_TASK_TRY_NUMBER} -eq 2
execution retry delays = PT0S, PT30S
[[[events]]] # event-specific handlers:
retry handler = notify-retry.py
failed handler = notify-failed.py
retry handlers = notify-retry.py
failed handlers = notify-failed.py

.. code-block:: cylc

@@ -738,7 +783,7 @@ triggers at 30 minutes after cycle point, a late event could be configured like
script = run-model.sh
[[[events]]]
late offset = PT40M # allow a 10 minute delay
late handler = my-handler %(message)s
late handlers = my-handler %(message)s

.. warning::
Late offset intervals are not computed automatically so be careful to update
19 changes: 4 additions & 15 deletions src/workflow-design-guide/general-principles.rst
@@ -654,21 +654,10 @@ Job Submission Retries
^^^^^^^^^^^^^^^^^^^^^^

When submitting jobs to a remote host, use job submission retries to
automatically resubmit tasks in the event of network outages. Note this is
distinct from job retries for job execution failure (just below).

Job submission retries should normally be host (or host-group for
``rose host-select``) specific, not task-specific, so configure them in
a host (or host-group) specific family. The following :cylc:conf:`flow.cylc`
fragment configures all HPC jobs to retry on job submission failure up to 10
times at 1 minute intervals, then another 5 times at 1 hour intervals:

.. code-block:: cylc

[runtime]
[[HPC]] # Inherited by all jobs submitted to HPC.
submission retry delays = 10*PT1M, 5*PT1H
automatically resubmit tasks in the event of network outages.

Note that this is distinct from job retries for
job execution failure (just below).

Job Execution Retries
^^^^^^^^^^^^^^^^^^^^^
@@ -678,7 +667,7 @@ believe that a simple retry will usually succeed. This may be the case if the
job host is known to be flaky, or if the job only ever fails for one known
reason that can be fixed on a retry. For example, if a model fails occasionally
with a numerical instability that can be remedied with a short timestep rerun,
then an automatic retry may be appropriate:
then an automatic retry may be appropriate.

.. code-block:: cylc
