Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reopen the discussion on stdlib jinja function #5053

Closed
wolfv opened this issue Nov 1, 2023 · 32 comments
Closed

Reopen the discussion on stdlib jinja function #5053

wolfv opened this issue Nov 1, 2023 · 32 comments
Labels
locked [bot] locked due to inactivity source::contributor created by a frequent contributor type::feature request for a new feature or capability

Comments

@wolfv
Copy link
Contributor

wolfv commented Nov 1, 2023

What is the idea?

We had a discussion at todays conda-forge meeting about the recently merged stdlib jinja function.

#4981

The current state is that we would write a migrator that would add this function to all recipes that have a {{ compiler('c') }} in their text, as most (maybe not all) packages would want that.

However, this seems a little "noisy" from a user perspective as all of a sudden they need to add another odd jinja function, and they need to add it almost unconditionally to all recipes that use the compiler.

For that reason I would like to propose a modification that extends the functionality of the compiler jinja function to add more than one dependency (this might need some hackery in conda-build, admittedly).

The idea would be that the compiler function would also read the additional variant config keys and have a extra keyword argument to control the behavior.

For example

- {{ compiler('c') }}  # uses the default `stdlib('c')`
- {{ compiler('c', stdlib='musl') }}  # uses `musl_stdlib`
- {{ compiler('c', stdlib=False }}  # disable stdlib injection

We also make this backwards compatible by ignoring any non-existent stdlib. E.g. if the conda_build_config.yaml doesn't define a c_stdlib we do not add anything.

The main problem is that we need to figure out if we can add two packages from a single Jinja function.

@wolfv wolfv added the type::feature request for a new feature or capability label Nov 1, 2023
@github-project-automation github-project-automation bot moved this to 🆕 New in 🧭 Planning Nov 1, 2023
@beckermr
Copy link
Contributor

beckermr commented Nov 2, 2023

cc @isuruf

@h-vetinari
Copy link
Contributor

I think the idea definitely has merit (usability, migration, etc.).

Though I'm not a big fan of the second line here:

- {{ compiler('c') }}  # uses the default `stdlib('c')`
- {{ compiler('c', stdlib='musl') }}  # uses `musl_stdlib`

If we can pull it off, I'd like to specify the stdlib implementation and version only in the CBC (not in the meta.yaml), such that

c_compiler:
  - gcc                         # [linux]
  - gcc                         # [linux]
c_compiler_version:
  - 12                          # [linux]
  - 13                          # [linux]
c_stdlib:
  - sysroot                     # [linux]
  - sysroot_musl                # [linux]
c_stdlib_version:
  - 2.17                        # [linux]
  - 1.2                         # [linux]

would be zipped and generate two builds on linux, e.g.

- {{ compiler('c') }}   -->     - gcc_linux-64 12.*
                                - sysroot_linux-64 2.17.*

                        -->     - gcc_linux-64 13.*
                                - sysroot_musl_linux-64 1.2.*

@beckermr
Copy link
Contributor

beckermr commented Nov 2, 2023

I think we might be conflating stdlib('c') with the standard lib for the c compiler versus glibc. I think we meant this to be glibc so that we'd also have stdlib('musl').

@wolfv
Copy link
Contributor Author

wolfv commented Nov 2, 2023

So basically we can keep the compiler function as-is and just need to figure out how to expand it into two actual dependencies?

Theoretically this could also be solved with more compiler packages, such as c_compiler: gcc_musl or c_compiler: gcc_glibc which might end up looking a bit like "target-triples"?

@beckermr
Copy link
Contributor

beckermr commented Nov 2, 2023

The compilers are independent of the stdlib. Coupling them is not needed.

@beckermr
Copy link
Contributor

beckermr commented Nov 2, 2023

Expanding to two lines is going to be problematic. Jinja does string substitution. If there are selectors applied to the line, what happens to those? Also we'd need to ensure that jinja has given us enough information to get the right indent. Finally we'd need to ensure that we know what kind of yaml structure we're in. We cannot always assume a list as a set of dashes on a new line versus an actually list with brackets.

@h-vetinari
Copy link
Contributor

I think we meant this to be glibc so that we'd also have stdlib('musl').

That was not what was discussed in #4981 (or what was implemented). The argument to stdlib is - like for the compiler jinja - a programming language like "c" / "cxx" / "fortran" etc.

We do mean it to include more than just the c stdlib (i.e. all the other bits that make up the sysroot), but the c library is arguably the driver there.

It's true that we don't strictly need to zip the compiler into two different stdlib builds, but as Wolf points out, each combination will be a unique target, so we might as well make it easy to, say, use a newer GCC when you're targeting musl.

Expanding to two lines is going to be problematic.

Even though it's not pretty, this can be worked around by introducing new metapackages that do the job. We already have to introduce them anyway - like macosx_deployment_target was introduced for use in stdlib("c"), so it wouldn't surprise me that a different jinja set-up might lead to different metapackages.

It would be a mess naming wise, and we'd have a matrix of compiler_version X stdlib_version to build, so I'd hope we could could find a way to teach jinja to create two lines.

Speaking of which, a technically possible but thoroughly unappealing approach for that would be to have yet another idiosyncrasy in conda-build, of preprocessing the file to duplicate all lines that contain {{ compiler("c") }}, regardless of line context. E.g. turning

-  {{ compiler("c") }}  # [some_selector]

into

-  {{ compiler("c", internal_switch_compiler_xor_stdlib=True) }}  # [some_selector]
-  {{ compiler("c", internal_switch_compiler_xor_stdlib=False) }}  # [some_selector]

and then run Jinja.

I guess it's a bit late to claim that such horrors are just to scare people a bit for Halloween, but in any case while we're brainstorming... 🎃😜

@h-vetinari
Copy link
Contributor

This would probably get easier in the new recipe format, as there we don't have semantically relevant comments anymore (plus some merge logic if the indents don't align, IIUC), so we might get away with substituting in a \n there.

@beckermr
Copy link
Contributor

beckermr commented Nov 2, 2023

Ahhhh I misremembered how we did the stdlibs then @h-vetinari. Sorry!

Looking at the ideas above, all of them are worse IMHO than simply have people carry an extra line for the stdlib. Also, recipes are a mystery to the vast majority of conda users anyways IMHO.

I personally do not like having to maintain a manual set of pairs of compilers and stdlibs together. The whole point of the conda build config is to provide a structured way to mix and match things.

@h-vetinari
Copy link
Contributor

Yeah, I'm not a fan of the ideas I spelled out either 😅

Just thought I'd write down my thoughts about how one might shove this square peg into that round hole.

I mean, I see the issues with migration, and user confusion, but OTOH, the original design has some substantial benefits as well, especially through being more explicit & less magic:

  • It's a fact of life that compilers generate a dependence on a given stdlib, and for now this is all hidden away in our infra
  • making that dependence explicit in the recipe makes that part of the infrastructure more homogeneous, discoverable(!) and understandable IMO, as well as much easier to configure
  • we have one nicely declarative place (CBC) where the implementation and version of both compiler and its stdlib are set, without having to touch the recipe.

@h-vetinari
Copy link
Contributor

@wolfv can you comment on the points brought up here?

The currently available options all seem unpalatable:

  • expanding to two lines will be fragile with selectors
  • a separate pass to duplicate the respective lines before jinja substitution sounds like something no-one would want
  • metapackages that combine compiler and stdlib would create a lot of overhead

In general, I tend to agree with @beckermr's point that it's better to make the stdlib explicitly declared. So far the stdlib was a silent assumption in our infrastructure, and surfacing that is going to lead to some churn, but IMO I think it's the better choice than a silent default, especially as there doesn't seem to be a good way to implement it.

@wolfv
Copy link
Contributor Author

wolfv commented Nov 15, 2023

I made a little implementation of the idea of expanding a single jinja function into multiple values. This is done by flattening a list-of-lists for the requirements.

#5072

The most breaking change is that selectors without hash don't work anymore, e.g. this was actually legal:

- bla [linux]

I've tested this so far on a single-output recipe and seems to work (including with your configuration of stdlib's in the conda_build_config.yaml).

I agree that it's modestly hacky, but it doesn't really make things worse as they are already pretty dire.

@wolfv
Copy link
Contributor Author

wolfv commented Nov 15, 2023

Also there is a bug in your implementation – I don't think it's currently added to the "used_variables" and doesn't appear in the hash computation.

https://github.com/wolfv/conda-build/blob/8e8ff69b739cc53af1d7fa512f79c7c2a04e6e5a/conda_build/variants.py#L725

I need to reiterate how crazy the conda-build code base is ... :headexplode:

@wolfv
Copy link
Contributor Author

wolfv commented Nov 15, 2023

Let me reiterate my points:

  • It seems unclear how often people would diverge from stdlib('c') when they use the compiler. It sounds like almost never (they would rather change the stdlib in the conda_build_config.yaml instead, to read c_stdlib: musl or something like that).
  • We force people to spell out both {{ compiler('c') }} and {{ stdlib('c' }} on basically all recipes that depend on a compiler. So it is just introducing more noise for the end user because of a technical limitation on our end.
  • With my proposal the end user would continue to be able to control precisely the stdlib through the conda_build_config.yaml which seems to be the suggested way anyways. {{ stdlib('musl') }} is not a thing.

To me, this still seems like a fix that is mainly done because of some technical limitation that we can solve more elegantly. I don't think many users will understand the stdlib function. I really disagree with the statement that "recipes are a mystery to the vast majority of conda users anyways IMHO" - they should not be and we're failing our users here if we continue to make it harder.

With my proposal, selectors are absolutely not a problem at all, they work exactly the same way as they used to (remove the line with the {{ compiler('c') }}, the jinja won't be evaluated at all because the selectors run earlier.

The indentation stuff is also not a concern at all, because we use the alternate syntax for lists - [c_compiler, c_stdlib].

The stdlib remains "as configurable" as it is with your current implementation. The users who care about it will figure it out wether it's written in the recipe or not (because we would write some docs about it and it would surface in the conda-forge pinnings.

@beckermr
Copy link
Contributor

Nice work @wolfv!

This implementation will break any usage of the compiler jinja2 function that is not embedded in a list. IDK if that ever happens of course.

@wolfv
Copy link
Contributor Author

wolfv commented Nov 15, 2023

Everything will be better with rattler-build :) We have a special list of things that can only appear in the "dependency" lists (run, host, build, run_constrained, and the run_exports).

I don't think anybody uses the compiler jinja function outside of dependencies (or should do that).

@beckermr
Copy link
Contributor

Awesome! I am looking forward to the new format and tooling!

IDK how to add to the variables used list in conda build. @h-vetinari @isuruf do you happen to know how to do this or can provide me with some guidance?

BTW @wolfv, can you cut a release of boa (mamba-org/boa#381)?

@isuruf
Copy link
Contributor

isuruf commented Nov 15, 2023

Introducing new features in recipe yaml (lists of lists for requirements) like this for a simple function like this is the definition of technical debt and as you noticed in conda-build code base, it is full of technical debt. I'm not sure why you would want to do this given all your work to reduce technical debt.

@carterbox
Copy link
Contributor

It seems unclear how often people would diverge from stdlib('c') when they use the compiler. It sounds like almost never (they would rather change the stdlib in the conda_build_config.yaml instead, to read c_stdlib: musl or something like that).

While most C programs depend on the C standard library, I'm not sure it is also true that most programs using the C++ compilers use the C++ standard libraries. I often see warnings from conda-build from recipes which use the C++ compiler about overdepending on the C++ standard library. I think if we defocus from the C language and consider the other compilers, the benefit of requiring explicit dependence on standard libraries becomes clearer: less packages will be depending on unused standard libraries. The current method for avoiding unused dependencies requires declaring ignore_run_exports_from, but since most users don't read the build logs, they don't add ignore_run_exports_from: {{ compiler('cxx') }}. I think it's easier to explain to users "You are using the standard library, so list it in the requirements" than it is to explain "You are not using the standard library, so remove it from the requirements".

I would also like to echo Matt's comment about how compilers and standard libraries are separate entities, so it makes sense to list them as separate dependencies.

@beckermr
Copy link
Contributor

Also this

ignore_run_exports_from: {{ compiler('cxx') }}

may be a spot where the list expansion becomes an issue?

@baszalmstra
Copy link

My 2¢.

My main focus is user experience. I understand (and firmly believe in) the argument to always be explicit but I believe we can take it a step further by incorporating sensible defaults and providing clear error messages that guide users on when and how to be explicit. Even though recipes can be a bit mysterious, I think we should take the user by the hand rather than adding more magic to the process.

In that light, I'm wondering what happens if you do not specify {{ stdlib('c' }} whilst actually depending on the standard library. Will that result in an error, or will that "just work"? If the latter is the case I think it will be challenging to teach people to be explicit about adding {{ stdlib('c' }} to their recipe.

I'm in favor of keeping the recipe straightforward, using sensible defaults, and minimizing unnecessary details. Therefore, I suggest not requiring the specification of the c standard library unless someone explicitly wants to use a different one (or none at all). For other languages where specifying the standard library is essential for successful compilation, it makes sense to explicitly mention {{ stdlib(LANG) }} which I think you would have to already do in the first place.

@h-vetinari
Copy link
Contributor

h-vetinari commented Nov 16, 2023

I'm in favor of keeping the recipe straightforward, using sensible defaults, and minimizing unnecessary details.

If we assume this approach, how would this work (or how would the recipe need to be adapted) - as an eminently possible future use - to build linux against both glibc and musl?

In the {{ stdlib(...) }} case, there would be no changes in meta.yaml, and the cbc would have:

c_stdlib:
  - sysroot                     # [linux]
  - sysroot_musl                # [linux]
c_stdlib_version:
  - 2.17                        # [linux]
  - 1.2                         # [linux]

# this would be set in the global pinning
zip_keys:
  - - c_stdlib
    - c_stdlib_version
    - # rest of the zip

Not that this isn't the be-all use case, but my goal here is to figure out that going with the "sensible defaults" approach does not close the door to reasonable future use cases, which would be a sign that it's not the right long-term design (and worst case we'd have to change it again in a few years).

@baszalmstra
Copy link

In the {{ stdlib }} case, there would be no changes in meta.yaml, and the cbc would have:

I might have misunderstood, it was my understanding that we would have to add {{ stdlib }} to every (current and future) recipe that has {{ compiler(c) }} ?

I'm not entirely sure about the technical details but what I am advocating for is that ideally one would only have to add {{ stdlib }} and/or something to the conda_build_config.yml when you want to target something other than glibc (being the default).

@wolfv
Copy link
Contributor Author

wolfv commented Nov 16, 2023

Actually, in terms of technical details, my approach would also add {lang}_stdlib to the used variables and therefore expanding the matrix of builds. You could build against multiple different stdlibs based on the cbc.yaml and obtain different packages (with different hashes).

That actually currently does not work for your implementation if I read it right (becasue you missed that part with the jinja regexes).

@h-vetinari
Copy link
Contributor

h-vetinari commented Nov 16, 2023

That actually currently does not work for your implementation if I read it right (becasue you missed that part with the jinja regexes).

Sure, that's a bug in the implementation, but not a fundamental stumbling block for anything.

Actually, in terms of technical details, my approach would also add {lang}_stdlib to the used variables and therefore expanding the matrix of builds

What would happen in that setup if just the compiler without the stdlib is requested?

And user-friendliness aside1, would it work to use ignore_run_exports[_from]: for the whole compiler, resp. just the stdlib?

Footnotes

  1. having to ignore a default run-export, unless someone teaches the maintainers about adding stdlib=None to the compiler. That would also make a transition harder, as we wouldn't have a good tool like error_overlinking: true to pair it with.

@h-vetinari
Copy link
Contributor

Ping @wolfv

@h-vetinari
Copy link
Contributor

#4999 has been released in conda-build 3.28 ~two weeks ago - the window for large overhauls here is starting to close I think.

Aside from the open points about injecting two dependencies (and how to deal with ignore_run_exports), I think my apprehension boils down to how turning stdlib into a kwarg creates an ugly coupling between unrelated things and would be inconsistent in several ways.

For example, in

requirements:
  build:
    - {{ compiler("c") }}

there'd then be an implicit magic host dep under "build" (which then has to be corrected with a strong run-export). We don't really have an option to make it explicit under the kwarg-proposal, as then we'd lose the abstraction between platforms and end up with something like:

requirements:
  build:
    - {{ compiler("c", stdlib="sysroot") }}                     # [linux]
    - {{ compiler("c", stdlib="macosx_deployment_target") }}    # [osx]

This layering confusion between build and host is also problematic for cross-compilation, obviously. On top of that, the question arises how stdlib versions are specified in a case when non-standard library implementation are used. For example, how would I pass the correct version to {{ compiler("c", stdlib="musl") }}? (If the answer is CBC: how is that consistent with silent defaults for other implementations1?)

Aside from picking up unrelated (by name) keys from CBC, it would then also appear to be a consistent choice2 to use

c_stdlib:
  - False

for indicating {{ compiler("c", stdlib=False) }}, i.e. that a given recipe does not rely on the stdlib. This would IMO lead to yet more special-casing, because now we need to interpret a bunch of strings (what about false? None? True?) with special "I'm not an output" meaning.

Finally, if we do something like removing libcxx from being a default run-export of clangxx (which would be very nice because the coupling always creates problems around compiler version bumps, especially on osx-arm), we'd end up with a probably comparable amount of churn to do:

 requirements:
   build:
-    - {{ compiler("cxx") }}
+    - {{ compiler("cxx", stdlib=False) }}
   host:
     - libfoo

for all the feedstocks that don't actually need libcxx, instead of doing

 requirements:
   build:
     - {{ compiler("cxx") }}
   host:
+    - {{ stdlib("cxx") }}
     - libfoo

for those that do. We also cannot solve this by defaulting to stdlib=False for "cxx", because that would be inconsistent with stdlib=<something> for "c", aside from being back in selectors-required-land, further increasing the churn (and cognitive load in a recipe) compared to just adding {{ stdlib("cxx") }} to packages that need it.

In summary, I understand the desire to avoid such an amount of churn for this, but it seems to me that the alternative is more of a churn-avoidance bandaid rather than a coherent design, and long term, that seems like a trade-off we'll come to regret.

Footnotes

  1. I.e. I need to respecify all stdlib implementations and version to respect zip-coupling coming from the global pinning...

  2. ignoring the zip with c_stdlib_version for now

@carterbox
Copy link
Contributor

We can avoid package churn by not forcing a migration. We could add a lint that warns any recipe using the compiler template, and let the packages transition as they are rebuilt naturally.

@h-vetinari
Copy link
Contributor

We can avoid package churn

The way I understood @wolfv's concern was about avoiding "recipe churn" first and foremost. Package churn is kinda unavoidable, but I agree it can be stretched out over time a bit.

@h-vetinari
Copy link
Contributor

Hey @wolfv and happy new year!

Can we move this topic forward?

@wolfv
Copy link
Contributor Author

wolfv commented Jan 22, 2024

Sorry for being slow!

I don't want to block this any further. I do think that for the future (ie. rattler-build) we'll have an upcoming CEP to define this feature better. It won't be an issue to move the existing recipes to the new standard, though, so all is fine from my side.

@wolfv wolfv closed this as completed Jan 22, 2024
@github-project-automation github-project-automation bot moved this from 🆕 New to 🏁 Done in 🧭 Planning Jan 22, 2024
@h-vetinari
Copy link
Contributor

For those interested, some more thoughts on this are being shared in prefix-dev/rattler-build#239

@github-actions github-actions bot added the locked [bot] locked due to inactivity label Aug 28, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked [bot] locked due to inactivity source::contributor created by a frequent contributor type::feature request for a new feature or capability
Projects
Archived in project
Development

No branches or pull requests

7 participants