Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validation: make environment property mandatory in serial steps #415

Merged

Conversation

giuseppe-steduto
Copy link
Member

@codecov
Copy link

codecov bot commented Oct 26, 2023

Codecov Report

Merging #415 (120559a) into master (2ad95ce) will increase coverage by 0.16%.
The diff coverage is 88.88%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #415      +/-   ##
==========================================
+ Coverage   37.50%   37.66%   +0.16%     
==========================================
  Files          26       26              
  Lines        1557     1561       +4     
==========================================
+ Hits          584      588       +4     
  Misses        973      973              
Files Coverage Δ
reana_commons/validation/utils.py 79.31% <100.00%> (+1.53%) ⬆️
reana_commons/version.py 0.00% <0.00%> (ø)

Copy link
Member

@mdonadoni mdonadoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to break some non-serial workflows, such as the Snakemake version of helloworld. Can you reproduce this?

$ reana-client run -f reana-snakemake.yaml
==> Creating a workflow...
Job stats:
job           count    min threads    max threads
----------  -------  -------------  -------------
all               1              1              1
helloworld        1              1              1
total             2              1              1

[WARNING] Job stats:
job           count    min threads    max threads
----------  -------  -------------  -------------
all               1              1              1
helloworld        1              1              1
total             2              1              1

==> Verifying REANA specification file... .../reana-demo-helloworld/reana-snakemake.yaml
[ERROR] Invalid REANA specification: None is not of type 'string'
==> ERROR: Cannot create workflow : 
None is not of type 'string'

Failed validating 'type' in schema['properties']['workflow']['properties']['specification']['properties']['steps']['items']['properties']['kubernetes_memory_limit']:
    {'id': '/properties/workflow/properties/specification/properties/steps/properties/kubernetes_memory_limit',
     'title': 'Memory limit for the step container (e.g. 256Mi).',
     'type': 'string'}

On instance['workflow']['specification']['steps'][0]['kubernetes_memory_limit']:
    None

Also note that the CWL version has a very different workflow.specification (even though it still works fine):

"workflow": {
  "file": "workflow/cwl/helloworld.cwl",
  "specification": {
    "$graph": [
      {
        "class": "Workflow",
        "id": "#main",
        "inputs": [
          {
            "id": "#main/helloworld",
            "type": "File"
...

Maybe it's possible to check the schema of workflow.specification only if workflow.type is serial?

@giuseppe-steduto
Copy link
Member Author

giuseppe-steduto commented Oct 26, 2023

This seems to break some non-serial workflows, such as the Snakemake version of helloworld

Apparently this happens because the Snakemake specification is loaded using None as a value for all the keys that were not explicitely specified in the Snakefile. Thus a solution could be to allow for both strings and null values for this and other properties.

Maybe it's possible to check the schema of workflow.specification only if workflow.type is serial?

AFAIK in JSON Schema draft-04 (the one we are using), one can't express conditional requirements directly. JSON Schema draft-07 introduced the if, then, and else structure, so it would be possible to do something like this:

{
  ...
  "workflow": {
    ...
    "properties": {
      "type": {
        "type": "string",
        "enum": [
          "cwl",
          "serial",
          "yadage",
          "snakemake"
        ]
      }
    },
    "if": {
      "properties": {
        "type": {
          "const": "serial"
        }
      }
    },
    "then": {
      "properties": {
        "specification": {
          "properties": {
            "steps": {
              "items": {
                "required": ["environment"]
              }
            }
          }
        }
      }
    },
    "else": {
      "properties": {
        "specification": {
          "id": ...
        }
      }
    }
  }
}

But upgrading to draft-07 might not be worth it, especially if it's only for this reason (draft-06 introduced a few big backwards-incompatible changes).

Another valid option at this point would be to add only the "required": ["environment"]" clause and forget about the rest of the "specification" schema. In this way the effect would be:

  • Serial workflows: make environment mandatory in the steps, as it should be
  • Snakemake workflows: make environment mandatory in the steps, as it should be (since the steps are loaded in a very similar way to the serial workflows)
  • CWL workflows: the steps are inside specification.$graph.steps, so the required clause does not apply
  • Yadage workflows: the steps are inside specification.stages.scheduler..., so the required clause does not apply

In this way, also, all the specifications covered by reana-dev run-example -w cwl -w snakemake -w serial -w yadage -c DEMO -b kubernetes --submit-only seem to pass the validation stage successfully.

Yet another alternative, mentioned in the JSONSchema documentation, could be to use a small boolean algebra trick to rewrite the A -> B condition as !A || B (therefore not needing conditionals). This has the advantage of allowing to specify more constraints on serial workflows (such as no inputs or outputs under each workflow step). Even though somewhat clever, it would complicate the schema a bit, especially because the conditions is on workflow.type, whereas the inquired properties are under workflow.specifications.steps.items.{property}, and the pattern is not very intuitive.

Finally, the last option I see is enforcing different JSON schemas for different workflow types at the r-commons level (maybe with a general one for every workflow engine, extended by the workflow-engine-specific ones.

To recap:
Solution 1️⃣: allow for string an null values on kubernetes_memory_limit and other properties.
Solution 2️⃣: upgrade to JSON schema draft-07 and use the if/then construct.
Solution 3️⃣: only keep the "required": ["environment"]" clause (I would personally go for this one).
Solution 4️⃣: use !A || B instead of A -> B to define the condition
Solution 5️⃣: enforce different JSON schemas at the REANA logic level for different workflow engines.

What do you think?

@giuseppe-steduto giuseppe-steduto force-pushed the validate-fix-environment-misplaced branch from 5750fce to 3d5b5a6 Compare November 8, 2023 09:17
@tiborsimko
Copy link
Member

What do you think?

Thanks for the nice analysis. IMO the soluttion 3 will do for the problem at hand, but we should be ready to improve the behaviour during sandboxing sprint.

For example, I've heard a wish from Snakemake users who typically use the same container for all the steps that they did not really expect having to specify the container: clause for each rule in the Snakefile, because it's not really required when running Snakemake "locally". The users wished to specify a "base container" for the workflow that would be used for the execution of all the rules unless overridden. The current PR is actually very good for catching these must-have-container-clause situations early, but there are chances we may amend this later to allow specifying a base container for the workflow aiming at having 1:1 correspondence between Snakefile-outside-REANA and Snakefile-inside-REANA for these kind of usages. If we do that, then should we revisit the environment validation anyway.

Ideally, it would be nice to have different schemas for different workflow engines, or at least conditionals, so that we can express truly all the constraints, and cover other fields, not only environments. But this would be a bigger change that could probably go together with the sandboxing sprint...

@giuseppe-steduto giuseppe-steduto force-pushed the validate-fix-environment-misplaced branch 3 times, most recently from 269a6c5 to 4ea3b69 Compare November 14, 2023 23:48
@giuseppe-steduto giuseppe-steduto force-pushed the validate-fix-environment-misplaced branch 4 times, most recently from d804bcd to 827d309 Compare November 22, 2023 16:43
CHANGES.rst Outdated
@@ -4,7 +4,10 @@ Changes
Version 0.9.4 (UNRELEASED)
--------------------------

- Changes the REANA specification schema to use the ``draft 2020-12`` version of the JSON schema specification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Changes the REANA specification schema to use the ``draft 2020-12`` version of the JSON schema specification.
- Changes the REANA specification schema to use the ``draft-07`` version of the JSON schema specification.

"type": "object",
"type": [
"object",
"null"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases resources can be null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never - restored it to just allowing objects

Comment on lines 251 to 250
"kubernetes",
"htcondor",
"htcondorcern",
"slurm",
"slurmcern"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think slurm and htcondor are allowed here. Did you find any places where they are used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put them there to be less restrictive towards other REANA deployments that might want to integrate HTCondor or Slurm, but I don't know how the integration with those computational backends would work if external to CERN. Do you think it is better to remove them (and also avoid that CERN users mistakenly write "htcondor" instead of "htcondorcern")?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also avoid that CERN users mistakenly write "htcondor" instead of "htcondorcern"

Yes, exactly, I think we should avoid this so let's remove them as currently we only support kubernetes, htcondorcern and slurmcern. When we will add support for other Slurm/HTCondor instances we will update the schema accordingly!

for warning_value in value:
assert warning_value in warnings[key]
else:
assert operator.eq(value, warnings[key])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a curiosity, why we call operator.eq here and not ==?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relic of a previous version of the test in which I used map and passed operator.eq as a function. I've changed it to == to be more pythonic 😉

@giuseppe-steduto giuseppe-steduto force-pushed the validate-fix-environment-misplaced branch from 827d309 to bbb85a3 Compare November 28, 2023 11:10
Copy link
Member

@mdonadoni mdonadoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

The only thing that needs to be changed is the new alpha version.

@@ -14,4 +14,4 @@

from __future__ import absolute_import, print_function

__version__ = "0.9.4a2"
__version__ = "0.9.4a4"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's regenerate the release commit so that the new version is 0.9.4a3 and not 0.9.4a4

@giuseppe-steduto giuseppe-steduto force-pushed the validate-fix-environment-misplaced branch from bbb85a3 to 120559a Compare November 28, 2023 14:22
@mdonadoni mdonadoni merged commit 120559a into reanahub:master Nov 29, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

validate: detecting misplaced environment clause for Serial workflows
3 participants