Checklist

I searched open requests and couldn't find a duplicate.
What is the idea?
Conda could build packages in parallel. After analyzing the DAG of package dependencies, the leaf nodes, and then each level of the hierarchy above them, could be built concurrently. Most of my system sits idle during installation of conda packages.
Why is this needed?
Tests for RAPIDS [1], which include installation of cudatools, dask, pandas, and other ML tools, take a very long time and spend a good portion of the workflow blocked on a single-threaded application.

[1] GoogleCloudDataproc/initialization-actions#1219
What should happen?

The work should be broken down into a DAG and delegated to worker threads, à la make -j$(nproc).
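Something along these lines could drive the scheduling. This is a minimal sketch of the idea, not conda's internal API: install_package() is a hypothetical stand-in for the per-package install step, and the DEPENDENCIES graph is illustrative.

```python
# Sketch: walk the dependency DAG with a thread pool, installing leaf
# nodes first and unblocking dependents as installs finish. Not conda
# internals; install_package() and DEPENDENCIES are stand-ins.
import concurrent.futures
import os

# Hypothetical graph: package -> packages it depends on.
DEPENDENCIES = {
    "numpy": set(),
    "cudatools": set(),
    "pandas": {"numpy"},
    "dask": {"numpy", "pandas"},
    "rapids": {"cudatools", "dask"},
}

def install_package(name):
    print(f"installing {name}")  # stand-in for the real install step

def parallel_install(deps, workers):
    # Track, per package, the dependencies that are not yet installed.
    remaining = {pkg: set(d) for pkg, d in deps.items()}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        in_flight = {}  # future -> package name

        def submit_ready():
            # Schedule every package with no unsatisfied dependencies.
            ready = [p for p, d in remaining.items() if not d]
            for p in ready:
                del remaining[p]
                in_flight[pool.submit(install_package, p)] = p

        submit_ready()
        while in_flight:
            done, _ = concurrent.futures.wait(
                in_flight, return_when=concurrent.futures.FIRST_COMPLETED)
            for fut in done:
                finished = in_flight.pop(fut)
                fut.result()  # surface install failures
                for d in remaining.values():
                    d.discard(finished)  # unblock dependents
            submit_ready()
        if remaining:
            raise RuntimeError(f"cycle or missing packages: {sorted(remaining)}")

if __name__ == "__main__":
    # Size the pool like make -j$(nproc).
    parallel_install(DEPENDENCIES, workers=os.cpu_count() or 4)
```

With this shape, the degree of parallelism is bounded only by the width of the DAG and the worker count, which is exactly what make -j achieves for compilation.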
Additional Context
I appreciate the work done on parallelizing the package downloads. I've included export CONDA_FETCH_THREADS="$(nproc)" to accelerate that portion of the workflow.
For the record, here is the command that's taking a while to run. I am running this on a rocky8 base image; I can gather metrics for the Debian and Ubuntu variants as well if that would help.
It was using more than the 15 GB of memory available to the n1-standard-4 machine type, and during some portions of the installation CPU load was near 100% on all 4 vCPUs, so I've increased the machine type to n1-standard-16.
This improves the performance of the GPU driver build script, which uses make -j$(nproc) to parallelize the nvidia kernel driver compilation; with -j1, the build takes much longer than with -j16. I would hope the same would be true of the conda build process, but it appears to be single-threaded.