Write CEP about virtual packages #103

Open
wants to merge 21 commits into
base: main
171 changes: 171 additions & 0 deletions cep-????.md
@@ -0,0 +1,171 @@
# CEP ???? - Virtual packages

<table>
<tr><td> Title </td><td> Virtual packages </td></tr>
<tr><td> Status </td><td> Draft </td></tr>
<tr><td> Author(s) </td><td> Jaime Rodríguez-Guerra &lt;[email protected]&gt;</td></tr>
<tr><td> Created </td><td> Dec 17, 2024</td></tr>
<tr><td> Updated </td><td> Dec 17, 2024</td></tr>
<tr><td> Discussion </td><td> https://github.com/conda/ceps/pull/103 </td></tr>
<tr><td> Implementation </td><td> https://github.com/conda/conda/tree/24.11.1/conda/plugins/virtual_packages, https://github.com/mamba-org/mamba/blob/libmamba-2.0.5/libmamba/src/core/virtual_packages.cpp, https://github.com/conda/rattler/tree/rattler-v0.28.8/crates/rattler_virtual_packages/src </td></tr>
</table>

> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as
described in [RFC2119][RFC2119] when, and only when, they appear in all capitals, as shown here.

## Abstract

This CEP standardizes which virtual packages MUST be offered by conda install tools.

## Motivation

Virtual packages are used to expose details of the system configuration to a conda client. They are commonly used as dependencies in regular packages to constrain on which systems they can be installed. Some examples include:

* On Linux, the minimum GNU `libc` version that must be available in the system via the `__glibc` virtual package.
* The oldest macOS version compatible with the package via the `__osx` virtual package.
* Whether a `noarch` package should be constrained to a single operating system via the `__linux`, `__osx` or `__win` virtual packages (often with no version constraint).
* The minimum CPU microarchitecture level that the binaries require via the `__archspec` virtual package.
* The lowest CUDA version the GPU driver is compatible with via `__cuda`.

## Specification

A virtual package is defined as a package record with three fields: name, version and build string.
The name MUST start with a double underscore (`__`). The version and build string MUST follow the same semantics as in regular package records. More specifically, the version field MUST follow the version string specifications, regardless of its origin (computed from a system property, overridden by the user or configuration, or provided by default by the tool).

Some general considerations:

- The version or build string of a virtual package MAY be overridden by the value of the `CONDA_OVERRIDE_{NAME}` environment variable, with `{NAME}` being the uppercased name of the virtual package (excluding the leading underscores). Many exceptions apply; see the per-package details in the section below.
- The build string MAY be zero (`0`). Some exceptions apply. See below.
- When the tool uses a fallback default value instead of a computed one, it SHOULD also inform the user of that choice and its possible override options (e.g. `CONDA_OVERRIDE_{NAME}` variables, CLI flags, configuration file, etc).
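
The naming rule for override variables can be sketched as follows. This is only an illustration of the rule above, not code taken from any of the referenced implementations; the helper names `override_variable_name` and `read_override` are hypothetical, and treating an empty value as "not set" is borrowed from the per-package rules below.

```python
import os


def override_variable_name(package_name: str) -> str:
    """Build the CONDA_OVERRIDE_{NAME} variable name for a virtual package."""
    if not package_name.startswith("__"):
        raise ValueError("virtual package names must start with '__'")
    # {NAME} is the uppercased package name without the leading underscores.
    return "CONDA_OVERRIDE_" + package_name.lstrip("_").upper()


def read_override(package_name: str, environ=os.environ):
    """Return the override value, or None when it is unset or empty."""
    value = environ.get(override_variable_name(package_name), "")
    return value or None
```

For example, `override_variable_name("__glibc")` yields `CONDA_OVERRIDE_GLIBC`. Whether a given override is honored at all still depends on the package-specific rules.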

### List of virtual packages

In alphabetical order, every conda client MUST support the following virtual packages:

- `__archspec`
- `__cuda`
- `__glibc`
- `__linux`
- `__osx`
- `__unix`
- `__win`

#### `__archspec`
Contributor

The microarch-level packages in conda-forge depend on __archspec to provide microarchitecture-level (e.g. x86-64-v2) meta-packages.

Are we aware of any existing packages that actually depend on __archspec?

If not, I would rather we not make this virtual package mandatory at this time, mostly because "what microarchitecture is this CPU?" is a question that can get complicated quickly; see, e.g., ARM big.LITTLE, Intel P/E cores, Intel Xeon 6, etc. To me, this virtual package is still (pseudo-)experimental, in the sense that we still need to work out how package maintainers should/want to use this package. IMO, __archspec should be its own CEP or set of CEPs (cf. #59).

This virtual package MUST always be present, with the version set to `1`.

Contributor

I'm wondering why this must always be present. For some target platforms it doesn't seem to make sense (basically any platform where the archspec would be reported as 0). Shouldn't we instead just omit the archspec in that case?


The build string MUST reflect one of:

- If the target platform matches the native platform, the best fitting CPU microarchitecture in the [`archspec/archspec-json` database](https://github.com/archspec/archspec-json/blob/v0.2.5/cpu/microarchitectures.json). The reference CPU detection implementation is [`archspec.cpu.detect.host()`](https://github.com/archspec/archspec/blob/v0.2.5/archspec/cpu/detect.py#L338).

- The target platform architecture (second component of the platform string), mapped as:
Contributor

An interesting observation from this table is that there are a number of entries here that are not present in the archspec database (armv6l, armv7l, s390x, arm64) and are thus not actually existing microarchitectures. I guess we can make something up. But for arm64 I think it would be more appropriate to use aarch64 instead.

Contributor

For rattler we have the following behavior:

win-arm64 -> aarch64
osx-arm64 -> m1

I think this makes sense because these are actual existing microarchitectures and they are also both the lowest possible supported microarchitectures on these platforms.


| Target platform | Reported `archspec` build string |
| --------------- | -------------------------------- |
| `*-32` | `x86` |
| `*-64` | `x86_64` |
| `*-armv6l` | `armv6l` |
| `*-armv7l` | `armv7l` |
| `*-aarch64` | `aarch64` |
| `*-arm64` | `arm64` |
| `*-ppc64` | `ppc64` |
| `*-ppc64le` | `ppc64le` |
| `*-riscv64` | `riscv64` |
| `*-s390x` | `s390x` |
| `zos-z` | `0` |
| Any other value | `0` |
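
The fallback mapping in the table above can be expressed as a small lookup. This is a minimal sketch of the table as currently drafted; the names `ARCHSPEC_FALLBACKS` and `fallback_archspec_build` are hypothetical, and the sketch does not cover native CPU detection or the `CONDA_OVERRIDE_ARCHSPEC` override.

```python
# Architecture component of the target platform -> reported build string.
ARCHSPEC_FALLBACKS = {
    "32": "x86",
    "64": "x86_64",
    "armv6l": "armv6l",
    "armv7l": "armv7l",
    "aarch64": "aarch64",
    "arm64": "arm64",
    "ppc64": "ppc64",
    "ppc64le": "ppc64le",
    "riscv64": "riscv64",
    "s390x": "s390x",
}


def fallback_archspec_build(target_platform: str) -> str:
    """Map a platform string such as 'linux-64' to the fallback build string."""
    # The architecture is the second component of the platform string.
    _, _, arch = target_platform.partition("-")
    # 'zos-z' and any other unlisted value are reported as '0'.
    return ARCHSPEC_FALLBACKS.get(arch, "0")
```

Keeping the mapping in a plain dictionary means that entries under discussion (for example `arm64`) can be changed in one place.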
Contributor

Should riscv32 also be added?

Contributor

Should we also add wasm32?


The build string MUST be overridable with the `CONDA_OVERRIDE_ARCHSPEC` environment variable, if set to a non-empty value.

#### `__cuda`

This virtual package MUST be present when the system exhibits GPU drivers compatible with the CUDA runtimes. When available, the version value MUST be set to the oldest CUDA version supported by the detected drivers (i.e. the formatted value of `libcuda.cuDriverGetVersion()`), constrained to the first two components (major and minor) and formatted as `{major}.{minor}`. The build string MUST be `0`.
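
The `{major}.{minor}` formatting step can be sketched as below. It assumes the integer reported by `cuDriverGetVersion` uses CUDA's usual `1000 * major + 10 * minor` encoding; the helper name `format_cuda_version` is hypothetical, and loading the driver library to obtain the integer is left out.

```python
def format_cuda_version(raw: int) -> str:
    """Format a CUDA version integer (assumed 1000*major + 10*minor)."""
    major, remainder = divmod(raw, 1000)
    # Only the major and minor components are kept.
    minor = remainder // 10
    return f"{major}.{minor}"
```

Under that assumption, a reported value of `12060` would be exposed as version `12.6`.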

Comment on lines +83 to +84
Contributor

cuDriverGetVersion returns the version of the installed driver, not the oldest CUDA version supported by the detected drivers. Due to backwards compatibility support newer drivers also support older versions.

Contributor Author

I think this returns the CUDA version as seen by the driver. The docs are not super clear. There's also `cudaRuntimeGetVersion`. At least this is what conda/conda uses.

@jakirkham could you help clarify this? Thanks!

Contributor

But I think the driver CUDA version might not be the "oldest CUDA version supported by the detected driver". E.g. If my driver is 12.6 it might also support older CUDA versions.

Member

Asking offline to see if others have thoughts here

That said, here are some rough thoughts. Starting with some background

When the CUDA Toolkit is typically shipped (like in the installer), it ships with a full suite of things including the driver library. However in Conda, we ship pretty much everything else except the driver library. So with Conda, the user is on the hook for ensuring the driver library is installed, which would happen when they install the driver

So this raises a question. How does Conda decide what version of the driver is compatible with a particular CUDA Toolkit?

This is what __cuda is solving. It is giving us the driver library's associated CUDA Toolkit version. IOW if that driver library was shipped in an installer, this is that installer's CUDA Toolkit version

So now when we try to install the CUDA Toolkit libraries, we can answer the question, will the driver be able to handle these?

Originally the answer was the CUDA Toolkit libraries we install could be no newer than the CUDA Toolkit the driver came from. We had checked this down to the major minor version (patch version was allowed to float). So if this driver check reported CUDA 10.1, we could only use CUDA Toolkit libraries from 10.1 or older. Eventually we relaxed this to major only for 11.2+ and 12

So really __cuda's version is the upper bound on the libraries we can use

Sorry for the long winded answer. Though hopefully the additional details provide context. Perhaps this can be incorporated into the text above somehow (after some discussion and Q&A)

The version MUST be overridable with the `CONDA_OVERRIDE_CUDA` environment variable, if set to a non-empty value that can be parsed as a version string.

#### `__glibc`

This virtual package MUST NOT be present if the target platform is not `linux-*`.

"...if the target platform's C standard library is not the GNU C Library". Not sure whether we want to codify the assumption that all Linux systems inherently provide GNU libc, or that only Linux uses GNU libc.

Contributor Author

Current conda implementation assumes the latter "only Linux uses GNU libc". See https://github.com/conda/conda/blob/e74a2b9d8a74837afc3bcbef609fe4bd29572e16/conda/plugins/virtual_packages/linux.py#L14-L16.

If that's incorrect, (1) we need to fix that in conda/conda, and (2) I will update this paragraph.


This virtual package MUST be present when the native and target platforms are both the same type of `linux-*` and GNU `libc` is installed in the system. The version value MUST be set to the system GNU `libc` version, constrained to the first two components (major and minor) formatted as `{major}.{minor}`. If the version cannot be estimated, the tool MUST set the version to a default value (e.g. `2.17`).

If the native platform does not match the target platform, the tool MAY export `__glibc` with its `version` field set to a default value (e.g. `2.17`) of its choice.

Should we provide some guidance here on how the default value should be selected? If we don't, I could see situations arising where two different conda install tools or two different versions of the same install tool provide different default values on the same system.

Contributor

Rattler also skips the virtual package if a libc implementation cannot be found.


If the `CONDA_OVERRIDE_GLIBC` environment variable is set to a non-empty value that complies with the version string specification, the tool MUST export `__glibc` with its version value set to the value of the environment variable.

The build string MUST always be `0`.


> The GNU `libc` version can be computed via:
>
> - Python's `os.confstr("CS_GNU_LIBC_VERSION")`
> - `getconf GNU_LIBC_VERSION`
> - `ldd --version`. Please verify that the output references GNU `libc` or GLIBC. For non-standard installs that use a GLIBC compatibility layer, this may require locating the implementation and querying it directly.
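
A minimal sketch of the first option, reduced to `{major}.{minor}`, might look like this. It assumes that `os.confstr("CS_GNU_LIBC_VERSION")` reports a string of the form `glibc 2.35` on GNU systems; the helper names are hypothetical, `2.17` is just the example default used in this section, and checking that the platform is `linux-*` is left to the caller.

```python
import os
import re

# Only accept strings that explicitly mention glibc, e.g. "glibc 2.35".
_GLIBC_PATTERN = re.compile(r"glibc\s+(\d+)\.(\d+)")


def parse_glibc_version(text: str):
    """Return '{major}.{minor}' from a confstr-style string, else None."""
    match = _GLIBC_PATTERN.search(text)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def detect_glibc_version(default: str = "2.17") -> str:
    """Query the running system, using `default` if detection fails."""
    try:
        raw = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # os.confstr is missing on some platforms and rejects unknown names.
        raw = None
    return parse_glibc_version(raw or "") or default
```

Splitting parsing from detection keeps the string handling testable on systems without GNU `libc`.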

#### `__linux`

This virtual package MUST be present when the target platform is `linux-*`. Its version value MUST be set to the Linux kernel version, constrained to the first two to four numeric components formatted as `{major}.{minor}.{micro}.{patch}`. If the version cannot be estimated (e.g. because the native platform is not Linux), the tool MUST set `version` to a fallback value of its choice. The build string MUST be `0`.

The Linux kernel upstream only defines {major}.{minor}.{micro}; anything beyond that (including the {patch} component you wrote) is part of the distribution kernel's version string. I don't think conda tooling should expose those components, since their semantics will vary from distribution to distribution; e.g., patch 42 on Fedora may differ from patch 42 on Ubuntu.

Contributor Author

Hm, this contradicts this comment in conda/conda. I'm happy to trim to three fields, but we'll need a source.

Ah ha! Since Linux 3.0, "[mainline] kernels with 3-part versions are -stable kernels"; however, the mainline 2.6 series had 4-part version numbers. The existing "3- or 4-part" comment in conda/conda comes from the fact that 2.6 is the upstream for the CentOS/RHEL 6 kernel, which was actively supported by Anaconda and conda-forge at the time I wrote that patch.

In light of that, I suggest replacing:

Its version value MUST be set to the Linux kernel version, constrained to the first two to four numeric components formatted as {major}.{minor}.{micro}.{patch}.

with something like:

Its version value MUST be set to the upstream (AKA mainline) Linux kernel version, but it MUST exclude any and all distribution-specific components of the kernel version.

I think that captures the intent of my previous comment, which was to ensure:

  1. we don't allow conda packages to depend on [Linux] distribution-specific kernels via the __linux virtual package; and
  2. we don't have to deal with the various ways Linux distros version their kernels (which may or may not be compatible with how conda does version ordering)

(That said, my suggested language does allow for non-release/development/-rcN kernels, but I suspect that user base is relatively small and able to cope if something goes horribly wrong.)


The version MUST be overridable with the `CONDA_OVERRIDE_LINUX` environment variable, if set to a non-empty value that matches the regex `"\d+\.\d+(\.\d+)?(\.\d+)?"`. The environment variable MUST be ignored when the target platform is not `linux-*`.

> The Linux kernel version can be obtained via:
>
> - Python's `platform.release()`
> - `uname -r`
> - `cat /proc/version`
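
As an illustration of the rules as currently drafted, the sketch below trims a kernel release string to its leading two to four numeric components and reuses the same regex to validate an override. The helper names are hypothetical, and the `"0"` fallback is an arbitrary choice, since the draft leaves the fallback value to the tool.

```python
import platform
import re

# Same pattern as the one given for CONDA_OVERRIDE_LINUX.
KERNEL_VERSION = re.compile(r"\d+\.\d+(\.\d+)?(\.\d+)?")


def kernel_version(release: str):
    """Keep the leading numeric components of e.g. '5.15.0-91-generic'."""
    match = KERNEL_VERSION.match(release)
    return match.group(0) if match else None


def is_valid_override(value: str) -> bool:
    """An override is accepted only if the whole value matches the regex."""
    return KERNEL_VERSION.fullmatch(value) is not None


def detect_linux_version(fallback: str = "0") -> str:
    """Use the running kernel's release, or `fallback` if it cannot be parsed."""
    return kernel_version(platform.release()) or fallback
```

Here a release string of `5.15.0-91-generic` is reduced to `5.15.0`, dropping the distribution-specific suffix.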

#### `__osx`

This virtual package MUST be present when the target platform is `osx-*`. Its version value MUST be set to the first two numeric components of macOS version formatted as `{major}.{minor}`. If applicable, the `SYSTEM_VERSION_COMPAT` environment variable workaround MUST NOT be enabled; e.g. the version reported for Big Sur should be 11.x and not 10.16. If the version cannot be estimated (e.g. because the native platform is not macOS), the fallback value MUST be set to `0`. The build string MUST be `0`.

The version MUST be overridable with the `CONDA_OVERRIDE_OSX` environment variable if set to a non-empty value that can be parsed as a version string. The environment variable MUST be ignored when the target platform is not `osx-*`.

> The macOS version can be obtained via:
>
> - Python's `platform.mac_ver()[0]`
Contributor

This will return the SYSTEM_VERSION_COMPAT version if the Python interpreter running the command was built against the 10.15 SDK or earlier.

See https://eclecticlight.co/2020/08/13/macos-version-numbering-isnt-so-simple/ for details.

Contributor Author

AFAICR, if you start the Python interpreter with SYSTEM_VERSION_COMPAT=0 it returns the 11.x based version, right?

Contributor

Yes, setting the environment variable will return a >=11 version.

My main concern here is the "MUST NOT" language around the SYSTEM_VERSION_COMPAT workaround. I agree with the idea, but this is not the case for the current version of conda (see conda/conda#13832). The example in this issue still reports a 10.16 version for __osx. Changing this to "SHOULD" would be reasonable, especially given that many existing releases of tools/packages report the compatible version.

Contributor Author

I think we could consider that a bug in conda and submit a fix for 25.1 (as we are doing for __win). I can survey public repodata for __osx usage in the wild if that helps inform this decision.

> - `SYSTEM_VERSION_COMPAT=0 sw_vers -productVersion`
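
The `sw_vers` approach could be wrapped as in the sketch below, which disables the compatibility workaround through the environment and keeps the first two components. The helper names are hypothetical; the `"0"` fallback follows the rule above for cases where the version cannot be estimated.

```python
import os
import subprocess


def first_two_components(version: str) -> str:
    """Reduce e.g. '11.7.10' to '11.7'."""
    return ".".join(version.strip().split(".")[:2])


def detect_macos_version(fallback: str = "0") -> str:
    """Ask sw_vers for the product version with SYSTEM_VERSION_COMPAT=0."""
    env = dict(os.environ, SYSTEM_VERSION_COMPAT="0")
    try:
        result = subprocess.run(
            ["sw_vers", "-productVersion"],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # sw_vers is not available, e.g. when the native platform is not macOS.
        return fallback
    return first_two_components(result.stdout) or fallback
```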

#### `__unix`

This virtual package MUST be present when the target platform is `linux-*`, `osx-*` or `freebsd-*`. The version and build string fields MUST be set to `0`.

Rather than list the specific target platforms now and having to adjust this CEP if/when new platforms are adopted, I would suggest language like "when the target platform is sufficiently POSIX-y" and list the attributes necessary for that to be true (e.g., uses / for path delimiters, supports fork(3), etc.)

Contributor Author

I agree with the sentiment, but I'm going to need some help or references to compile that list 😬


The `CONDA_OVERRIDE_UNIX` environment variable MUST NOT have any effect.
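
The presence rule for `__unix`, as currently drafted, amounts to a prefix check on the target platform. A minimal sketch follows; the function name and the `(name, version, build)` tuple representation are hypothetical.

```python
UNIX_PLATFORM_PREFIXES = ("linux-", "osx-", "freebsd-")


def unix_virtual_package(target_platform: str):
    """Return the (name, version, build) record for __unix, or None."""
    if target_platform.startswith(UNIX_PLATFORM_PREFIXES):
        # Version and build string are fixed; no override is consulted.
        return ("__unix", "0", "0")
    return None
```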

#### `__win`

This virtual package MUST be present when the target platform is `win-*`. The version MUST be set to the first three numeric components of the Windows build version, formatted as `{major}.{minor}.{build}`. If the version cannot be estimated (e.g. because the target platform does not match the native platform), the tool MUST set the version to a default value of its choice.

The version MUST be overridable with the `CONDA_OVERRIDE_WIN` environment variable if set to a non-empty value that can be parsed as a version string. The environment variable MUST be ignored when the target platform is not `win-*`.

The build string MUST be `0`.

> The version string `{major}.{minor}.{build}` can be obtained from:
>
> - Python's `platform.win32_ver()`
> - CMD's `ver`
> - PowerShell's `[System.Environment]::OSVersion.Version`, `(Get-CimInstance Win32_OperatingSystem).version`
> - The command `wmic os get version`
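
Using the Python option, a sketch of the `{major}.{minor}.{build}` formatting could look like this. It assumes the second element returned by `platform.win32_ver()` holds the dotted version string; the helper names are hypothetical, and the `"0"` default is an arbitrary choice because the draft leaves the default value to the tool.

```python
import platform


def format_windows_version(raw: str):
    """Keep the first three numeric components, e.g. '10.0.19045'."""
    parts = raw.split(".")
    if len(parts) < 3 or not all(part.isdigit() for part in parts[:3]):
        return None
    return ".".join(parts[:3])


def detect_windows_version(default: str = "0") -> str:
    """Use the native version when available, otherwise `default`."""
    # On non-Windows systems win32_ver() returns empty strings.
    version = platform.win32_ver()[1]
    return format_windows_version(version) or default
```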

## Potential future work

This CEP focuses on the standardization of existing virtual package implementations.

The following items are not considered here, though they would be open for discussion in future CEP work:

- Additional OSes, like `__freebsd` or `__netbsd`.
- Coarse-grained architecture information, like `__x86_64` or `__arm64`, or, more generally, [`__arch`](https://github.com/conda/conda/issues/13420).
- More `libc` implementations, like `__musl`.

## References

* [Virtual packages implementation in `conda/conda` 24.11.1](https://github.com/conda/conda/tree/24.11.1/conda/plugins/virtual_packages)
* [Virtual packages implementation in `libmamba` 2.0.5](https://github.com/mamba-org/mamba/blob/libmamba-2.0.5/libmamba/src/core/virtual_packages.cpp)
* [Virtual packages implementation in `rattler` 0.28.8](https://github.com/conda/rattler/tree/rattler-v0.28.8/crates/rattler_virtual_packages/src)
* [ENH: make `__win` version usable for package metadata (conda/conda#14443)](https://github.com/conda/conda/issues/14443)
* [Drop `CONDA_OVERRIDE_WIN` environment variable (mamba-org/mamba#2815)](https://github.com/mamba-org/mamba/pull/2815)
* [`__arch` feature request](https://github.com/conda/conda/issues/13420)

## Copyright

All CEPs are explicitly [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/).