Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trigger now if workflow paused #6499

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Nov 28, 2024

Close #5963
Supesede #6192 (it was getting messy)
Close #6398


New behaviour for triggering when the workflow is paused

DEFAULT: run triggered tasks NOW even if the workflow is paused

  • supports the common use case "I want to pause the workflow and just trigger individual tasks"
  • (NIWA priority raised by both ops and research users in early Cylc 8 migration meetings)

OPTION: --on-resume - run triggered tasks only when the paused workflow is resumed

  • this supports the broadcast-to-past-tasks workaround until broadcast is fixed (prob. 8.5.0)
  • the option can be removed in the future, once broadcast is fixed, if there are no other use cases

Implementation:

  • separate start of job submission from queue release, in the scheduler
  • manual trigger adds tasks to tasks_to_trigger_(now|on_resume) lists, after state reset and queue wrangling
  • if these lists are populated, the scheduler will release tasks from them as appropriate
  • (queues are not processed if the scheduler is paused)

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at Update docs for trigger-when-paused. cylc-doc#778
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@hjoliver hjoliver added this to the 8.4.0 milestone Nov 28, 2024
@hjoliver hjoliver self-assigned this Nov 28, 2024
@hjoliver hjoliver force-pushed the trigger-when-paused_2 branch from 98d500e to 3dc032c Compare November 28, 2024 02:05
@hjoliver
Copy link
Member Author

hjoliver commented Nov 28, 2024

Status of this PR:

  • I can easily remove the --on-resume option, if we have a viable alternative for the broadcast problem at 8.4.0
  • if we keep the option, I just need to extend the tests to cover it

@hjoliver hjoliver changed the title Trigger --now for workflow pause and queues. Trigger now if workflow paused Nov 28, 2024
@hjoliver hjoliver force-pushed the trigger-when-paused_2 branch 3 times, most recently from 55c71a9 to 0372d91 Compare November 29, 2024 01:07
@MetRonnie MetRonnie self-requested a review November 29, 2024 17:35
@dwsutherland
Copy link
Member

dwsutherland commented Dec 2, 2024

(queues are not processed if the scheduler is paused)

Does this mean if I trigger when paused, and a queue limit is reached, that it will enter a queue but not be released when the queue "empties" of running tasks?

Will test.

@hjoliver
Copy link
Member Author

hjoliver commented Dec 2, 2024

(queues are not processed if the scheduler is paused)

Does this mean if I trigger when pause, and a queue limit is reached, that it will enter a queue but not be released when the queue "empties" of running tasks?

Yes. Pausing the scheduler halts all automatic job submission, so that means queued tasks won't automatically run. But if you want the queued task to run before resuming the workflow you can just trigger it again before resuming.

@hjoliver hjoliver force-pushed the trigger-when-paused_2 branch from 645707c to 02f1a22 Compare December 2, 2024 02:34
@hjoliver
Copy link
Member Author

hjoliver commented Dec 2, 2024

(I've just tweaked the pause and trigger command help in an attempt to make that clearer @dwsutherland )

@hjoliver hjoliver force-pushed the trigger-when-paused_2 branch from 02f1a22 to cb03ae0 Compare December 2, 2024 02:36
Copy link
Member

@dwsutherland dwsutherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, reacts as expect with queued/unqueued 👍

    [[queues]]
        [[[THIS]]]
            limit = 1
            members = baz, zip, toz
    [[graph]]
        R1 = "prep => foo"
        PT12H = "foo[-PT12H] => foo => baz & zip & toz => bar"

image

cylc/flow/network/schema.py Outdated Show resolved Hide resolved
cylc/flow/network/schema.py Outdated Show resolved Hide resolved
cylc/flow/network/schema.py Outdated Show resolved Hide resolved
changes.d/6499.feat.md Outdated Show resolved Hide resolved
cylc/flow/scripts/pause.py Outdated Show resolved Hide resolved
cylc/flow/scripts/pause.py Outdated Show resolved Hide resolved
cylc/flow/scripts/trigger.py Outdated Show resolved Hide resolved
cylc/flow/scripts/trigger.py Outdated Show resolved Hide resolved
cylc/flow/scripts/trigger.py Outdated Show resolved Hide resolved
cylc/flow/task_queues/independent.py Outdated Show resolved Hide resolved
cylc/flow/task_queues/independent.py Outdated Show resolved Hide resolved
@oliver-sanders
Copy link
Member

@hjoliver, by bypassing queue processing for manual trigger, this PR may close #6398

cylc/flow/scripts/pause.py Outdated Show resolved Hide resolved
cylc/flow/scripts/trigger.py Outdated Show resolved Hide resolved
tests/integration/conftest.py Show resolved Hide resolved
@MetRonnie MetRonnie added the schema change Change to the Cylc GraphQL schema label Dec 3, 2024
Comment on lines 167 to +173

# set --pre=all should not cause a task to submit in a paused workflow
assert len(submitted_tasks) == 0
# but triggering it should
schd.pool.force_trigger_tasks(['1/a'], [])
schd.release_tasks_to_run()
assert len(submitted_tasks) == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you put this new bit in a separate test and revert the splitting of trigger and set from this test; a key feature of this test was that is checks equity between trigger and set for the same flow nums

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I split that into separate tests is that this PR changes that equivalence.

Trigger and set (in a paused workflow) are more similar on master, because a triggered task does not become "active" until the workflow is resumed, so it can be triggered multiple times to absorb multiple flow numbers. On this branch, a triggered task becomes active immediately and so subsequent triggers get ignored.

In light of that, do you still want me to revert that? I suppose we could have a combined test for what is common, and separate tests for the behavior that differs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of that, do you still want me to revert that? I suppose we could have a combined test for what is common, and separate tests for the behavior that differs.

👆 That

@hjoliver
Copy link
Member Author

hjoliver commented Dec 3, 2024

@oliver-sanders -

@hjoliver, by bypassing queue processing for manual trigger, this PR may close #6398

That issue is confusing. I think the "bug" initially reported by Tom is just the expected double-trigger requirement to get past a queue. In his description, despite the tasks being held, they did run on the second trigger attempt - but he had expected that manual trigger would skip queues to run immediately after just one attempt.

So I presume you're referring to the presumably-actual-bug that you commented on part way through the discussion: #6398 (comment)

... I've commented there, as I'm not sure this PR is relevant: #6398 (comment)

@oliver-sanders
Copy link
Member

oliver-sanders commented Dec 4, 2024

I think the "bug" initially reported by Tom is just the expected double-trigger requirement to get past a queue.

No, there is definitely a bug lurking here.

I have added an example to the issue in which task doesn't run when triggered (no queue limiting in effect), however, on this branch the task does run (entering an infinite kill/trigger loop).

So it looks like this closes #6398 (congrats on fixing a bug you didn't think existed!).

@MetRonnie
Copy link
Member

Test added on master failing - need to merge master into this branch

@hjoliver
Copy link
Member Author

hjoliver commented Dec 5, 2024

So it looks like this closes #6398 (congrats on fixing a bug you didn't think existed!).

Great, well done me. I'll add a close directive to the description above...

@oliver-sanders
Copy link
Member

oliver-sanders commented Dec 5, 2024

Great, well done me. I'll add a close directive to the description above...

(will see if I can get a test for this, ..., hjoliver#61)

One recently added test is failing:

FAILED tests/integration/run_modes/test_skip.py::test_doesnt_release_held_tasks

cylc/flow/commands.py Show resolved Hide resolved
cylc/flow/scripts/trigger.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this approved pending addressing of comments (including #6499 (comment))

@oliver-sanders
Copy link
Member

(have manually run the auto-restart tests on this branch, they all pass)

Comment on lines +1450 to +1493
if self.pool.tasks_to_trigger_on_resume:
# and manually triggered tasks to run once workflow resumed
pre_prep_tasks.update(self.pool.tasks_to_trigger_on_resume)
self.pool.tasks_to_trigger_on_resume = set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think codecov is right, these lines are untested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I was waiting for agreement to add the option before adding tests for it, then failed to come back to the PR till now.

@hjoliver hjoliver force-pushed the trigger-when-paused_2 branch from d00915e to bdbcaa0 Compare December 10, 2024 21:48
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending resolution of Ronnie's comment above and, ideally, some kind of test for the --on-resume option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
schema change Change to the Cylc GraphQL schema
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Held tasks need multiple triggers to run? Allow trigger-now in a paused workflow?
4 participants