
Supporting microarchitecture-specific builds #59

Open
chenghlee opened this issue Jul 15, 2023 · 9 comments

chenghlee commented Jul 15, 2023

In various cases, users would like to take advantage of CPU-specific features or instructions for improved performance, e.g., AVX* or AES/cryptography.

xref issues:

cc: @ltalirz


ltalirz commented Jul 15, 2023

Other relevant links:


isuruf commented Jul 15, 2023

With __archspec you can create a meta-package like x86_64_feature_level that depends on specific __archspec values, and downstreams will depend on x86_64_feature_level.
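To make the idea concrete, here is a minimal sketch from the consumer side; x86_64_feature_level is the hypothetical meta-package name proposed above (no such package exists today), while the __archspec virtual package can already be inspected on any machine:

```sh
# Show the __archspec virtual package conda detects on this machine
conda info | grep -i archspec

# Hypothetical: if builds of x86_64_feature_level were each constrained to the
# matching __archspec values, a downstream package would just declare a run
# dependency on it, and the solve would fail on machines below the required level.
conda create -n demo "x86_64_feature_level>=3"
```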

I think a CEP is needed for how to handle virtual packages in a conda-build sense. Currently, virtual packages are one of the main things from a build machine that affect compilation.

For example, say A was built with AVX512f and B needs A built with AVX512f. Then B can be compiled only on a build machine with AVX512f. We can work around this by setting CONDA_OVERRIDE_ARCHSPEC, but then BUILD_PREFIX also gets that architecture, and if there is a build tool relying on AVX512f, that build tool cannot be run on a machine without AVX512f.
Therefore CONDA_OVERRIDE_ARCHSPEC should affect only host and not build. (Even host is problematic when considering python etc.) This is also an issue with cross-compilation.
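A rough sketch of that workaround and its caveat; the override value below is only an illustrative microarchitecture name:

```sh
# Pretend the machine supports AVX-512 so that host dependencies constrained via
# __archspec can still be solved on this builder (value is illustrative).
export CONDA_OVERRIDE_ARCHSPEC=skylake_avx512
conda build recipe/

# Caveat described above: the override applies to the build environment too, so a
# build tool installed into BUILD_PREFIX that was itself compiled with AVX512f
# still cannot actually run on a machine without AVX512f.
```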


ltalirz commented Jul 15, 2023

Two questions/thoughts:

  1. What is the minimum microarchitecture of your current x86_64 CI build hardware (not the one you choose to build for)?

  2. Would it be acceptable to do cross-microarchitecture compiles (e.g. compile x86_64_v4 on a runner that is just x86_64_v3), and let the feedstock maintainers who want this feature worry about making sure that they continue to compile any build tools with the lowest-common-denominator microarchitecture? It would unfortunately mean that one cannot run the tests for these builds, but it may still be better to standardize in some way rather than have people who want this feature run off in different directions.


isuruf commented Jul 16, 2023

  1. Ivybridge

  2. Python blurs the line with build tools. For example, numpy needs to be importable to get numpy.get_include(), which means numpy needs to be built for the lowest-common-denominator architecture (see the sketch below).
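For example, this is the kind of build-time import that forces the numpy in the build environment to be runnable on the build machine itself:

```sh
# numpy acting as a build tool: it is imported on the build machine only to locate
# its headers, so this copy of numpy must run on the build machine's own CPU.
python -c "import numpy; print(numpy.get_include())"
```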

@h-vetinari

Very interested in this for arrow, faiss, pillow, onednn, etc.

Numpy actually has a very elaborate dispatch mechanism (for most relevant functions) based on run-time detection of the available CPU features, so we likely wouldn't have to worry about numpy specifically - i.e. can keep building it for the lowest CPU feature level without losing much.
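For reference, recent NumPy releases (1.24 and later) can report what that runtime detection found on the current machine:

```sh
# Prints runtime information, including which SIMD extensions were detected as
# found / not found on this CPU (requires NumPy >= 1.24).
python -c "import numpy; numpy.show_runtime()"
```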

I guess for other feedstocks it should be based on an analysis of whether the impact is worth the blow-up in the build matrix, but it would be an amazing tool to have.

The one thing that this makes worse is overloading build string semantics (e.g. cpu vs. cuda, license family, openmp flavour, now arch spec, see conda/conda#11053), which is even more relevant due to conda/conda#11612 still being open.


ltalirz commented Jul 16, 2023

> Numpy actually has a very elaborate dispatch mechanism (for most relevant functions) based on run-time detection of the available CPU features

Ah, thanks for pointing this out, I was not aware - so this is even before you hit backends like the MKL (which also do this), correct?

While adding this dispatch capability may not be achievable for some packages (due to the amount of work/knowledge required), technologically it would be the best solution in most cases, and I suspect that maintainers of smaller packages may often not be aware of the possibilities here (I certainly am/was not). When documenting the feature we propose in this thread, it would probably be wise to also point maintainers to resources on methods for selecting the best code for the microarchitecture at runtime, so they can avoid having to create separate builds for each microarchitecture altogether (e.g. here is an article from the Guix blog on function multi-versioning that I found interesting).
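As a taste of what function multi-versioning looks like in practice, here is a minimal sketch assuming GCC and glibc; the file and function names are made up for illustration:

```sh
# GCC emits one clone of the function per listed target plus a resolver that picks
# the best clone at load time, so a single baseline-x86_64 binary still uses
# AVX2/AVX-512 code paths where the CPU supports them.
cat > dot.c <<'EOF'
__attribute__((target_clones("avx512f", "avx2", "default")))
double dot(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
EOF
gcc -O3 -fPIC -shared dot.c -o libdot.so
```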

@h-vetinari

I don't think the numpy dispatch mechanism is feasible for others to reimplement; it's a very elaborate piece of engineering by some domain experts (not sure how realistic it would be to library-ize it for wider use... 🤔).

> so this is even before you hit backends like the MKL (which also do this), correct?

It cannot affect external calls like those to LAPACK; it just works for numpy-internal functions that get precompiled for various CPU features and then selected at runtime based on availability.

So this won't help for optimising BLAS/LAPACK etc. I think that for such compute-heavy libraries, we should just build v2/v3/v4 by default (once the infrastructure is in place, of course).

My point about doing this on an as-needed (resp. as-justified) basis is that not every package will have double-digit performance gains for higher CPU feature levels, and so we (in conda-forge) should not blindly multiply our build matrix by a factor of 3 on every feedstock.


isuruf commented Jul 17, 2023

If Linux and glibc are all you are targeting, rpath token expansion is the easiest way to get x86_64-feature-level-specific binaries. For example, https://github.com/conda-forge/gmp-feedstock/blob/main/recipe/build.sh#L28-L45 produces power8 and power9 binaries. It can be extended to x86_64 feature levels as well.
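A rough illustration of the approach, not the actual gmp-feedstock script (configure flags and directory names are illustrative): glibc expands the dynamic string token $PLATFORM inside RPATH/RUNPATH entries at load time, so a per-microarchitecture copy of the library can be picked up automatically.

```sh
# 1. Baseline build, installed as usual:
./configure --prefix=$PREFIX CFLAGS="-O2"
make -j${CPU_COUNT} install

# 2. Optimized build, installed into a subdirectory named after the value the
#    loader reports for $PLATFORM on such machines (e.g. "power9" on POWER9):
make distclean
./configure --prefix=$PREFIX --libdir=$PREFIX/lib/power9 CFLAGS="-O2 -mcpu=power9"
make -j${CPU_COUNT} install

# 3. Binaries linking against the library add an extra rpath entry such as
#    '$ORIGIN/../lib/$PLATFORM' ahead of '$ORIGIN/../lib', so the optimized copy
#    wins when the running machine matches and the baseline is used otherwise.
#    On x86_64, newer glibc (2.33+) additionally searches glibc-hwcaps
#    subdirectories like lib/glibc-hwcaps/x86-64-v3, which maps directly onto
#    the x86_64 feature levels discussed above.
```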


tnabtaf commented Jul 17, 2023

Longer term, if we want to support RISC-V then we will probably want to support extensions as well.
