
Idea for better (non-redundant, more compact) branching syntax for selectors? #1155

Open
h-vetinari opened this issue Nov 2, 2024 · 20 comments


@h-vetinari

h-vetinari commented Nov 2, 2024

While thinking about #1154, it occurred to me that it would be even nicer if we could do a match-style syntax that doesn't have to choose a single branch but could apply several!

        files:
          - match: linux
              - something_linux
            match: osx
              - something_osx
            match: unix         # e.g. on linux, this should be taken
              - something_unix  # _in addition_ to the linux-only case
            match: win
              - something_win
            match: unix or win
              - something_that_applies_to_everything
            # no more `else:`, only explicit matches!

I guess it might be a bit late for such deep syntactic changes, but IMO this would make it much nicer to write recipes, rather than having to constantly restart if: branches to avoid exhaustivity problems.

It would allow using a single match: block (or whatever you want to call it, like the old sel: before it turned into if:) for every section of the recipe, which would be much nicer IMO when translating things like

examples

this

    - cython                              # [build_platform != target_platform]
    - vtk =*=osmesa_*                     # [build_platform != target_platform and build_variant == "noqt"]
    - vtk =*=qt_*                         # [build_platform != target_platform and build_variant != "noqt"]
    - qt-main                             # [build_platform != target_platform and build_variant == "pyqt"]
    - qt6-main                            # [build_platform != target_platform and build_variant == "pyside6"]

or this

    - crossenv                            # [build_platform != target_platform]
    - maturin >=1.3.2,<2                  # [build_platform != target_platform]
    - {{ stdlib("c") }}
    - {{ compiler('c') }}
    - {{ compiler('cxx') }}               # [win]
    - {{ compiler('rust') }}
    - posix                               # [build_platform == "win-64"]
    - cmake
    - make                                # [unix]
    - cargo-feature                       # [build_platform != target_platform and target_platform == "win-64"]

etc.
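To make the "each line carries its own condition" point concrete, here is a minimal Python sketch that models the second example as (item, selector) pairs and keeps every item whose selector is true. It is not how conda-build or rattler-build actually evaluate selectors; the variant dict, the helper names (is_cross, is_win, is_unix, select) and the assumption that win/unix follow target_platform are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical variant model: only the keys the selectors above read.
Env = Dict[str, str]
Selector = Optional[Callable[[Env], bool]]  # None means "no selector: always included"


def is_cross(env: Env) -> bool:
    return env["build_platform"] != env["target_platform"]


def is_win(env: Env) -> bool:
    # Assumption: `win` / `unix` are derived from target_platform.
    return env["target_platform"].startswith("win-")


def is_unix(env: Env) -> bool:
    return not is_win(env)


# (item, selector) pairs transcribed from the second example above.
BUILD_DEPS: List[Tuple[str, Selector]] = [
    ("crossenv", is_cross),
    ("maturin >=1.3.2,<2", is_cross),
    ('{{ stdlib("c") }}', None),
    ("{{ compiler('c') }}", None),
    ("{{ compiler('cxx') }}", is_win),
    ("{{ compiler('rust') }}", None),
    ("posix", lambda env: env["build_platform"] == "win-64"),
    ("cmake", None),
    ("make", is_unix),
    ("cargo-feature", lambda env: is_cross(env) and env["target_platform"] == "win-64"),
]


def select(deps: List[Tuple[str, Selector]], env: Env) -> List[str]:
    """Keep every item whose own selector is true; each line is tested independently."""
    return [item for item, pred in deps if pred is None or pred(env)]
```

Because every line is tested on its own, lines can be reordered freely; that per-line independence is what the match: idea tries to keep while grouping lines under shared conditions.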

The more dimensions (unix vs win, native vs. cross, python version, CUDA or not, implementation variants for openmpi/BLAS/QT/etc.) we need to cover in a given recipe, the harder it gets to do that with simple if: else: branches. Consider something like

        build:
          - match: build_platform != target_platform
              # cross-compilation
            match: build_platform != target_platform and cuda_version
              # cross-compiled CUDA
            match: match(cuda, "<12")
              # CUDA 11.x
            match: match(cuda, ">=12.0")
              # CUDA 12.x
            match: linux
              # linux
            match: osx
              # osx
            match: unix
              # unix
            match: win
              # win
            match: blas_impl == "mkl"
              # mkl

It would be much lower mental overhead to be able to freely shuffle lines around based on which sections they apply to (e.g. not having to restructure the above into correctly (non-)overlapping if: statements), adding lines only to the selectors where they are supposed to apply.

@wolfv
Member

wolfv commented Nov 2, 2024

It's completely impossible because duplicate keys are forbidden :(
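Background on why duplicate keys are a dead end: the YAML spec requires the keys within a mapping to be unique, so a lenient parser typically keeps only the last match: and a strict one rejects the document. The sketch below uses Python's json module as a stand-in (a JSON object is also valid YAML flow-mapping syntax); reject_duplicate_keys is a hypothetical helper, not rattler-build code.

```python
import json

# Two `match` keys in one mapping, as in the first proposal (JSON stands in for YAML).
DOC = '{"match": "linux", "match": "osx"}'


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails loudly instead of keeping the last value."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


lenient = json.loads(DOC)  # the "linux" branch is silently dropped

try:
    json.loads(DOC, object_pairs_hook=reject_duplicate_keys)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

Either way the first branch cannot survive parsing, which is why the follow-up below moves each match: into its own list entry.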

@h-vetinari
Author

It's completely impossible because duplicate keys are forbidden :(

Well, how about making each - match: a separate list entry? That's also how multiple if:'s work IIUC.

In other words, the following (note the extra - everywhere)

        build:
          - match: build_platform != target_platform
              # cross-compilation
          - match: build_platform != target_platform and cuda_version
              # cross-compiled CUDA
          - match: match(cuda, "<12")
              # CUDA 11.x
          - match: match(cuda, ">=12.0")
              # CUDA 12.x
          - match: linux
              # linux
          - match: osx
              # osx
          - match: unix
              # unix
          - match: win
              # win
          - match: blas_impl == "mkl"
              # mkl

... should be desugared to the same list as the impossible variant with multiple match: keys. The only thing the desugaring would need to learn is to ignore empty lists (for a match: that doesn't apply), though I guess that's already covered by the existing infra (e.g. for an if: without else:).
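A minimal Python sketch of the desugaring described above, assuming each - match: entry is reduced to a (condition, items) pair: every entry whose condition holds is spliced into the result, and a non-matching entry contributes nothing. The conditions are hand-written predicates standing in for selector expressions (treating unix as "not win" and unix or win as always true is an assumption), and desugar_all is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, str]
# Each `- match:` entry becomes (condition, items).
Entry = Tuple[Callable[[Env], bool], List[str]]


def desugar_all(entries: List[Entry], env: Env) -> List[str]:
    """Splice in the items of EVERY entry whose condition holds (no early exit).

    A non-matching entry contributes an empty list, like an `if:` without `else:`.
    """
    out: List[str] = []
    for cond, items in entries:
        if cond(env):
            out.extend(items)
    return out


def target_starts_with(prefix: str) -> Callable[[Env], bool]:
    return lambda env: env["target_platform"].startswith(prefix)


# Mirrors the OP's first example; "unix" is assumed to mean "not win".
ENTRIES: List[Entry] = [
    (target_starts_with("linux-"), ["something_linux"]),
    (target_starts_with("osx-"), ["something_osx"]),
    (lambda env: not env["target_platform"].startswith("win-"), ["something_unix"]),
    (target_starts_with("win-"), ["something_win"]),
    (lambda env: True, ["something_that_applies_to_everything"]),  # `unix or win`
]
```

For linux-64 this yields the linux, unix and catch-all items together, i.e. the overlapping branches that the OP wants without restructuring into exclusive if:/else: chains.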

@baszalmstra
Contributor

Why not use a top level match with nested cases?

@h-vetinari
Author

Why not use a top level match with nested cases?

I'd be happy with that too!

@wolfv
Member

wolfv commented Nov 2, 2024

But how is it different from multiple if statements?

@baszalmstra
Contributor

You don't need to repeat the negation of the if clauses.

You can more easily write something specific for win, macos, unix without having to repeat not win, etc.

Sorry, I'm on a phone so it's a bit hard to type a proper example.

@wolfv
Member

wolfv commented Nov 2, 2024

For the record, it's actually possible to use arbitrary Jinja in multi-line YAML strings. I am not sure if we want to keep it, but the following works (after a small bug fix from #1156):

build:
  script: |
    echo "Is ${{ target_platform == "linux-64" }}"
    {% if target_platform == "linux-64" %}
      echo "I AM A LINUX"
    {% elif osx %}
      echo "I AM A MAC"
    {% endif %}

And of course, you could just as well use bash.

So I am not sure if it's worth it to make our syntax more complicated or not ...

Right now it's also possible to do arbitrary things with set and for loops btw. - it just has to be encapsulated in a multi-line string :)

@baszalmstra
Contributor

One of the design goals was to make the recipe easier to parse for machines. I think what you are suggesting is taking a step back. Having a match-kinda statement seems like a great (and logical) building block.

@wolfv
Member

wolfv commented Nov 2, 2024

Yeah but we've never semantically understood the contents of the scripts themselves so I'm not sure if we need that. This doesn't affect the "everything is pure YAML" property.

@baszalmstra
Contributor

Sure, but as you mentioned in another issue you just closed, we can validate the control flow statements using the schema. That would become harder and harder the more Jinja we start using.

@wolfv
Member

wolfv commented Nov 2, 2024

If we build our own LSP, though... :) I think I do get your point, but I don't see it as all that different from, e.g., people having issues with their bash scripts inside the script tag.

@MementoRC

MementoRC commented Nov 24, 2024

Right now it's also possible to do arbitrary things with set and for loops btw. - it just has to be encapsulated in a multi-line string :)

This does not seem to work:

          {% for bin in qemu_bins %}
            test -f ${QEMU_LD_PREFIX}/{{ bin }}
          {% endfor %}
 │ │ │ + test -f $PREFIX/aarch64-conda-linux-gnu/sysroot/lib64/ld-2.17.so
 │ │ │ + test -f '$PREFIX/aarch64-conda-linux-gnu/sysroot/lib64/{{' bin '}}'
 │ │ │ $PREFIX/etc/conda/test-files/qemu-user-execve-aarch64/0/conda_build.sh: line 14: test: too many arguments

The issue with passing it to bash -c ... is that ${{ xxx }} expands to ["...", "..."], which trips up bash: bash array literals are whitespace-separated, like (xxx xxx), without brackets or commas. So this also fails:

    bash -c 'l=${{ qemu_bins | split(" ") | list }}; for bin in "\${l[@]}"; do test -f ${QEMU_LD_PREFIX}/${bin}; done'
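For reference, a bash array literal separates elements with whitespace, e.g. l=(bios.bin qboot.rom), so a list rendered as ["bios.bin", "qboot.rom"] is not array syntax to bash. A minimal Python sketch of rendering a list into a form bash accepts; to_bash_array is hypothetical, not a rattler-build or Jinja filter.

```python
import shlex
from typing import Iterable


def to_bash_array(items: Iterable[str]) -> str:
    """Render items as a bash array literal: whitespace-separated, each word shell-quoted."""
    return "(" + " ".join(shlex.quote(item) for item in items) + ")"


qemu_bins = ["bios.bin", "bios-256k.bin", "qboot.rom"]
print(to_bash_array(qemu_bins))  # (bios.bin bios-256k.bin qboot.rom)
```

shlex.quote leaves plain file names untouched and wraps anything containing spaces or shell metacharacters in single quotes, so each name stays one array element.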

@wolfv
Member

wolfv commented Nov 24, 2024

Where did you set qemu_bins? The context casts everything to a string.

But if you do:

          {% set qemu_bins = ["foo", "bar"] %}
          {% for bin in qemu_bins %}
            test -f ${QEMU_LD_PREFIX}/{{ bin }}
          {% endfor %}

It has a chance of working :)
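Why the split matters, in Python terms: if the context entry has been cast to a string, looping over it walks the characters, not the names. This sketch assumes the list was joined with spaces and that the template engine iterates strings character by character, as Python and Jinja2 do; loop_items is a hypothetical helper mimicking {% for bin in value %} with and without a split.

```python
from typing import List


def loop_items(value: str, split: bool) -> List[str]:
    """What `{% for bin in value %}` would see, optionally after `value | split`."""
    # str.split() with no argument splits on runs of whitespace and drops empty strings.
    return value.split() if split else list(value)


context_value = " ".join(["foo", "bar"])  # a list that was cast to a string: "foo bar"

print(loop_items(context_value, split=False)[:3])  # ['f', 'o', 'o']
print(loop_items(context_value, split=True))       # ['foo', 'bar']
```

A list literal created with {% set %} inside the script never goes through the context, so it stays a real list and iterates by element.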

@h-vetinari
Author

Why not use a top level match with nested cases?

Actually, one reason I prefer a bare match: is that match: - case ... - case suggests disjoint conditions again (at least based on how pattern matching works in most languages). I don't want to match one specific thing against various disjoint cases, but rather to freely match conditions for different quantities that may overlap. Using a different syntax there would avoid triggering some unhelpful muscle memory.

@wolfv
Member

wolfv commented Nov 24, 2024

@h-vetinari sorry, I'm not following. What's the difference between match and a number of if clauses then, if the match clauses are disjoint? I think what we had in mind so far would be a match that breaks after it finds the first matching entry, e.g.

match:
  - case: unix
    then: 
      - bla
  - case: win and arm64
    then: 
      - foobar
  - case: win
    then:
      - foobar-win64 # NOT happening on arm64
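The first-match semantics described here can be sketched in Python as a loop that returns at the first true case. The platform flags and the match_first name are assumptions for illustration; this is not rattler-build's implementation.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, bool]
Case = Tuple[Callable[[Env], bool], List[str]]


def match_first(cases: List[Case], env: Env) -> List[str]:
    """Return the `then` items of the FIRST case whose condition holds, or [] if none does."""
    for cond, then in cases:
        if cond(env):
            return list(then)  # stop here, like `break` in a C switch
    return []


# The three cases from the YAML above.
CASES: List[Case] = [
    (lambda e: e["unix"], ["bla"]),
    (lambda e: e["win"] and e["arm64"], ["foobar"]),
    (lambda e: e["win"], ["foobar-win64"]),
]
```

With win and arm64 both true, only foobar is selected. Under the all-branches semantics from the OP, the same input would pick up both foobar and foobar-win64, which is exactly the difference being discussed.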

@wolfv
Member

wolfv commented Nov 24, 2024

Also your initial example is not at all valid YAML if I see that correctly.

@wolfv
Member

wolfv commented Nov 24, 2024

Attached two screenshots highlighting the issues (from https://yamlchecker.com/)


@h-vetinari
Author

I think what we had in mind so far would be a match that breaks after it finds the first matching entry, e.g.

I explained why that's not desirable in the OP (or at least that there are cases where branches shouldn't be exclusive).

Also your initial example is not at all valid YAML if I see that correctly.

Yes, I hate YAML. 🙃

@wolfv
Member

wolfv commented Nov 24, 2024

The way I understand it is that you want "if"-syntax without then, correct?

You want falsey branches to be skipped, but every truthy branch to be taken, right?

In e.g. Rust, when using match, only the first truthy branch is taken. In C/C++ you would have to add a break to get this behavior. In Python, too, only the first matching case is taken.

But if I understand your proposal correctly, you want every truthy branch to "match"?

Can you add corrected YAML to your proposal? Because I think if you write out proper YAML, it will look just like the current if statements.

@MementoRC

MementoRC commented Nov 24, 2024

It has a chance of working :)

Right! Silly me, I forgot I needed to split (I did know that, just got caught up in the Jinja mix-in). The variable is defined in the context section as ${{ [x, y, z] | join() }}

For this particular case, it seems that rattler's Jinja handling is more involved: the array needs to be joined (in the context), then split again in the script, where {% %} is accepted. Am I missing something?

Also, it seems that comments are not completely ignored: I had mistakenly put a "xx' in a comment and it seemed to generate a syntax error, which went away after I corrected it (not 100% sure, though).

Update: Not to be difficult, but the initial point was to define the list once for the several packages that use it. I don't think that will work:
{% for bin in {{ qemu_bins | split(' ') | list }} %} fails to parse, and a top-level {% ... %} fails with:

Error:   × Parsing: failed to parse YAML: unexpected character: `%'
   ╭─[1:2]
 1 │ {% qemu_bins_j = [
   ·  ────────┬────────
   ·          ╰── here
 2 │       "bios.bin", "bios-256k.bin", "bios-microvm.bin", "qboot.rom", "vgabios.bin", "vgabios-cirrus.bin"
   ╰────

So I will use python -c ... ${{ qemu_bins }}.split() ..., which is neat enough, I concede.
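A sketch of what the python -c approach could look like as a small script: the space-separated qemu_bins string is split in Python and each file is checked under the prefix. The function names and the idea of passing ${QEMU_LD_PREFIX} and the rendered ${{ qemu_bins }} as two arguments are hypothetical; the thread does not show how the recipe wires this in.

```python
import os
import sys
from typing import List


def missing_files(prefix: str, names: str) -> List[str]:
    """Names from a whitespace-separated string that are not regular files under `prefix`."""
    return [n for n in names.split() if not os.path.isfile(os.path.join(prefix, n))]


def main(argv: List[str]) -> int:
    """argv = [prefix, names]; report each missing file and return a shell-style exit code."""
    prefix, names = argv
    missing = missing_files(prefix, names)
    for name in missing:
        print(f"missing: {os.path.join(prefix, name)}", file=sys.stderr)
    return 1 if missing else 0
```

Saved as a (hypothetical) check_bins.py with a final sys.exit(main(sys.argv[1:])), it would be called as python check_bins.py "${QEMU_LD_PREFIX}" "${{ qemu_bins }}". Splitting happens in Python, so neither Jinja loops nor bash arrays are needed.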


4 participants