Releases · nv-legate/cupynumeric

07 Dec 06:44

marcinz

v24.11.02

5371ab3

This is a patch release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

Packaging Changes

Update for Legate v24.11.01

07 Dec 06:42

marcinz

v24.11.01

9627cb8

This is a patch release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

Bug Fixes

Explicit fallback to __array__() on __buffer__

17 Nov 00:51

manopapad

v24.11.00

b198f33

This is a beta release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

New features

Improved API coverage

Implement np.unravel_index
Implement np.angle
Implement np.median
Implement np.ix_
Implement np.meshgrid
Implement np.expand_dims
Implement np.rot90
Implement np.round
Implement np.fft.fftshift and np.fft.ifftshift
Implement np.roll
Support full_matrices parameter of np.linalg.svd
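
A few of the newly covered functions can be exercised as in the sketch below. It assumes cuPyNumeric is used as a drop-in replacement for NumPy under the alias np, and falls back to NumPy itself when the cupynumeric package is not installed; the array values are made up for illustration.

```python
try:
    import cupynumeric as np  # assumption: cuPyNumeric 24.11 or later is installed
except ImportError:
    import numpy as np  # fallback so the sketch still runs without cuPyNumeric

a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

flat_pos = np.unravel_index(4, a.shape)  # flat index 4 -> row 1, column 1
rolled = np.roll(a, 1, axis=1)           # shift every row right by one position
rotated = np.rot90(a)                    # rotate 90 degrees counter-clockwise
expanded = np.expand_dims(a, axis=0)     # add a leading axis: shape (1, 2, 3)
xx, yy = np.meshgrid(np.arange(3), np.arange(2))  # coordinate grids, shape (2, 3)
med = np.median(a)                       # median of 0..5

print(rolled)
print(rotated)
print(expanded.shape, xx.shape, med)
```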

Memory management enhancements

Memory efficient implementation of matrix multiplication - this implementation batches over the reduction dimension, achieving constant memory overhead regardless of array sizes.
Memory efficiency for stencil computation - add the np.ndarray.stencil_hint method, which instructs cuPyNumeric to pre-allocate the necessary space for ghost elements when an array is to be used in a stencil computation, reducing intermediate memory use.
Memory allocation report - report the object-memory mapping when a computation runs out of memory, to help users debug and optimize memory usage.
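
The idea of batching a matrix multiplication over the reduction dimension can be illustrated in plain NumPy. This is only a sketch of the general technique, not cuPyNumeric's actual implementation; the function name matmul_k_batched and the k_chunk parameter are hypothetical. Each step multiplies a narrow slice of the operands and accumulates into the output, so the per-step working set does not grow with the length of the shared dimension.

```python
import numpy as np


def matmul_k_batched(a, b, k_chunk=2):
    """Compute a @ b by accumulating partial products over chunks of the
    shared (reduction) dimension. Illustrative sketch only."""
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for start in range(0, k, k_chunk):
        stop = min(start + k_chunk, k)
        # Partial product over columns [start, stop) of a and the
        # matching rows of b, accumulated into the result.
        out += a[:, start:stop] @ b[start:stop, :]
    return out


a = np.arange(12.0).reshape(3, 4)
b = np.arange(20.0).reshape(4, 5)
print(np.allclose(matmul_k_batched(a, b), a @ b))
```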

Enhanced infrastructure support

GH200 Grace Hopper Superchip support - allows users to leverage GH200-based cloud instances and supercomputers.
GASNet support - support GASNet as an alternative networking backend to UCX, using a GASNet wrapper, MPI wrapper, and custom build utilities.
Initial HDF5 support - distributed read/write of HDF5 files using a POSIX backend.
Automatic resource configuration at run time - automatically discover and use all the available compute resources including CPU, GPU, system memory, and framebuffer memory.
More enhancements from Legate 24.11

Other

Re-implement the RNG module on top of the C++ STL random library, removing the need to have cuRand in CPU-only installations.

Known Issues

cuPyNumeric will emit a false-positive warning like the following:

RuntimeWarning: cuPyNumeric has not implemented numpy.ndarray.__buffer__ and is falling back to canonical NumPy. You may notice significantly decreased performance for this function call.

in cases such as when an arithmetic operation is performed on a scalar array, e.g. cupynumeric.array(42) * 2. There is no actual performance degradation occurring in this case. We are working on a patch that will suppress this warning.
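
Until a patch is available, one possible workaround is to filter this specific message with Python's standard warnings module. The sketch below is an assumption about how a user might do that, not an officially recommended fix; the helper name silence_buffer_fallback_warning is hypothetical, and the message pattern is copied from the warning text above.

```python
import warnings


def silence_buffer_fallback_warning():
    """Ignore only the false-positive __buffer__ fallback warning.

    The pattern is a regular expression matched against the start of
    the warning message; other RuntimeWarnings are left untouched.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"cuPyNumeric has not implemented numpy\.ndarray\.__buffer__",
        category=RuntimeWarning,
    )


silence_buffer_fallback_warning()
```

Calling the helper once, before the affected computation, is enough for the rest of the process.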

11 Sep 20:36

manopapad

v24.06.01

427da00

This is a patch release, and includes the following fixes:

Fix for nv-legate/legate#947
Fix package dependencies (cuda and openblas)

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

03 Jul 22:35

manopapad

v24.06.00

510e24a

This release ports cuNumeric to the C++-based Legate-Core. Additionally, it includes the following new features:

np.linalg.qr, np.linalg.svd (single-GPU support only)
"where" argument for unary operations
np.select
np.flipud, np.fliplr
np.cov
np.load (initial, unoptimized implementation)
np.average
np.logical_and.reduce, np.logical_or.reduce
np.digitize
np.diff
np.linalg.cholesky, np.linalg.solve (multi-GPU support, based on cuSolverMp -- not included in conda packages, requires a manual build)
C++-based ndarray class (experimental support)
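
Several of these additions are shown in the short sketch below. It assumes the package is imported under its name at the time of this release (cunumeric) and falls back to NumPy when that package is not installed; the input values are made up for illustration.

```python
try:
    import cunumeric as np  # assumption: cuNumeric 24.06 is installed
except ImportError:
    import numpy as np  # fallback so the sketch still runs without cuNumeric

x = np.array([1, 4, 7, 10])

steps = np.diff(x)                                   # consecutive differences
bin_ids = np.digitize(x, np.array([0, 5, 10]))       # bin index of each element
chosen = np.select([x < 5, x >= 5], [x, x * 10])     # pick per-element by condition
weighted = np.average(x, weights=np.array([4, 3, 2, 1]))
flipped = np.flipud(x.reshape(2, 2))                 # reverse the row order

print(steps, bin_ids, chosen, weighted)
print(flipped)
```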

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

Known issues

Including the nvidia conda channel in an environment with cunumeric may end up pulling cutensor 2.0, even though the cunumeric packages explicitly request cutensor 1.7. This can cause error messages like this:

OSError: libcutensor.so.1: cannot open shared object file: No such file or directory

This is not an issue with cuNumeric, but with incorrect constraints on the cutensor packages on the nvidia channel. Please avoid including the nvidia conda channel in any conda environment including cunumeric.

21 Nov 01:47

marcinz

v23.11.00

d91f17c

This release contains performance improvements to the variance operation, and a multi-dimensional Cholesky implementation.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Added variance as a unary reduction by @jjwilke in #593
Add batched cholesky implementation and tests by @jjwilke in #1029
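
As a rough illustration of the two features listed above, the sketch below computes a variance and factors a small stack of symmetric positive-definite matrices in one call. It assumes the cunumeric package name used at the time of this release and falls back to NumPy when it is not installed; the matrices are made-up examples.

```python
try:
    import cunumeric as np  # assumption: cuNumeric 23.11 or later is installed
except ImportError:
    import numpy as np  # fallback so the sketch still runs without cuNumeric

# Variance as a reduction over a 1-D array.
v = np.var(np.array([1.0, 2.0, 3.0, 4.0]))

# A batch of two 2x2 symmetric positive-definite matrices, shape (2, 2, 2).
spd = np.array(
    [
        [[4.0, 2.0], [2.0, 3.0]],
        [[9.0, 3.0], [3.0, 5.0]],
    ]
)

# Batched Cholesky: one lower-triangular factor per matrix in the stack.
lower = np.linalg.cholesky(spd)

# Each factor should reproduce its input: L @ L^T == A.
reconstructed = lower @ np.swapaxes(lower, -1, -2)
print(v, np.allclose(reconstructed, spd))
```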

🐛 Bug Fixes

Replacing set with OrderedSet to avoid control-replication violations by @ipdemes in #1054
Inline boolean operators in NumPy are bitwise, not logical by @manopapad in #1057
Fix #1065 ("where" fails with IndexError) by @manopapad in #1067
Fixes #1069, #1070 (minor einsum bugs) by @manopapad in #1072

📖 Documentation

Suggest using mamba over conda by @manopapad in #1068

Full Changelog: v23.09.00...v23.11.00

Contributors

jjwilke, manopapad, and ipdemes

03 Oct 15:23

marcinz

v23.09.00

e66a063

This release adds support for the quantile API, and includes some performance and documentation improvements (notably a "Best Practices" guide).

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Quantile Implementation by @aschaffer in #664
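
A minimal usage sketch of the quantile API follows. It assumes the cunumeric package name used at the time of this release and falls back to NumPy when it is not installed; the data is made up for illustration.

```python
try:
    import cunumeric as np  # assumption: cuNumeric 23.09 or later is installed
except ImportError:
    import numpy as np  # fallback so the sketch still runs without cuNumeric

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Quartiles of the data, using the default (linear) interpolation.
quartiles = np.quantile(data, np.array([0.25, 0.5, 0.75]))
print(quartiles)
```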

🛠️ Improvements

Add missing openmp variants to BitGenerator and UniqueReduce by @rohany in #1010
Histogram refactor by @aschaffer in #1003

📖 Documentation

Add best practices info to sphinx docs by @bryevdv in #1048

🐛 Bug Fixes

Missing alignment on histogram call by @manopapad in #999
Fix for control replication violation in test by @ipdemes in #1005
Fix build instructions link by @bryevdv in #1014
Add back None as an accepted value for axis on some type sigs by @manopapad in #1017
If a scalar ufunc arg is cn.ndarray use its type directly by @manopapad in #1011
Skip the docstrings for functions pulled from cloned modules by @manopapad in #1024
Fix random test failures in CPU-only runs by @manopapad in #1025
Don't cast histogram to int64 when density=True by @manopapad in #1042
Explicitly cast result of shift binary operators by @manopapad in #1046
Remove use of deprecated np.find_common_type by @manopapad in #1045

New Contributors

@ajschmidt8 made their first contribution in #1035

Full Changelog: v23.07.00...v23.09.00

Contributors

manopapad, bryevdv, and 4 other contributors

25 Jul 04:51

marcinz

v23.07.00

d413db2

This release adds support for histogram, broadcast* and various nan* APIs. It also includes performance improvements to the FFT functions and cleanups in ufunc support.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

Implement broadcast routines by @bryevdv in #759
Sanitize unary reductions that have NaNs by @shriram-jagan in #925
Histogram Functionality by @aschaffer in #983
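
The sketch below touches each of the three feature groups listed above. It assumes the cunumeric package name used at the time of this release and falls back to NumPy when it is not installed. The release note does not enumerate which nan* functions were added, so nansum and nanmin are used here only as plausible representatives.

```python
try:
    import cunumeric as np  # assumption: cuNumeric 23.07 or later is installed
except ImportError:
    import numpy as np  # fallback so the sketch still runs without cuNumeric

# Broadcast a 1-D array to a 2-D shape without changing its values.
tiled = np.broadcast_to(np.arange(3), (2, 3))

# NaN-aware reductions skip missing values instead of propagating them.
# Assumption: nansum and nanmin are among the nan* functions covered.
samples = np.array([1.0, 2.0, np.nan, 4.0])
total = np.nansum(samples)
smallest = np.nanmin(samples)

# Histogram with explicit bin edges [0, 1), [1, 2), [2, 3].
counts, edges = np.histogram(
    np.array([0.5, 1.5, 1.5, 2.5]), bins=np.array([0.0, 1.0, 2.0, 3.0])
)

print(tiled)
print(total, smallest, counts)
```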

🛠️ Improvements

Add ufunc methods by @bryevdv in #834
Support of the shape argument in empty_like() & Co. by @madsbk in #845
Add support for Python 3.11 (#830) by @marcinz in #837
Ensure ufunc/function dispatching is narrow by @seberg in #977
Fft improvements by @mfoerste4 in #732

📖 Documentation

Note new minimum CUDA requirements for conda packages by @manopapad in #875

🐛 Bug Fixes

Fix bugs in concatenate and stack APIs. by @robinwnv in #844
Fixes #858 by @manopapad in #859
Fix concatenate and *stack APIs to support scalars(#818, #839) by @robinwnv in #866
Avoid following compiler symlinks by @manopapad in #880
Fix for some binary operators on float16 by @magnatelee in #889
WAR for TBLIS compiler detection while upstream PR is pending by @manopapad in #890
Also build CPU-only packages for haswell (#869) by @marcinz in #882
Fix array API(#885). by @robinwnv in #910
Fix unit tests by @magnatelee in #920
Fix an incorrect type by @marcinz in #931
Use correct type, to avoid int narrowing by @manopapad in #941
Fix cunumeric.arange issues by @yimoj in #940
Use the right type for scalar arguments by @magnatelee in #942
Fall back to NumPy eagerly on RandomState methods by @manopapad in #959
Fix bugs in random integer functions by @manopapad in #966
Resolve numpy 1.25 issues by @bryevdv in #973
Set lib_dir explicitly to lib/, even on RHEL by @manopapad in #971
fixing putmask logic for scalar inputs by @ipdemes in #980
fixing cuda error by @ipdemes in #978
Change arg to LLONG_MIN to make it consistent with python. by @shriram-jagan in #986
Missing alignment on histogram call by @manopapad in #1000

New Contributors

@madsbk made their first contribution in #845
@sandeepd-nv made their first contribution in #899
@seberg made their first contribution in #977
@shriram-jagan made their first contribution in #988
@aschaffer made their first contribution in #983

Full Changelog: v23.03.00...v23.07.00

Contributors

seberg, manopapad, and 11 other contributors

15 Mar 20:02

marcinz

v23.03.00

9ac887b

This is the beta release of cuNumeric.

This release is focused on bug fixes, code clean-up and documentation updates, in preparation for entering beta status.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🐛 Bug Fixes

Do reductions properly in tensor contraction tasks by @magnatelee in #803
Seed the NumPy RNG at the start of every test by @manopapad in #792
Fix handling of negative axis in np.repeat by @manopapad in #821
Fix for #720 (by @lightsighter) by @manopapad in #721
Ensure unary_func seeding is deterministic across processes by @manopapad in #825

🛠️ Improvements

Update the architectures built in conda package by @marcinz in #770
Use thrust::cuda::par_nosync if available by @magnatelee in #780
Preemptively convert to np.ndarray on NumPy fallback by @manopapad in #802
Removing all Legion references from the code by @magnatelee in #811
Remove exception throwing from RNG code by @manopapad in #815
Pin legate to a specific commit by @trxcllnt in #824
Add support for Python 3.11 by @m3vaz in #830

📖 Documentation

[WIP] Docs refresh by @bryevdv in #805

Full Changelog: v23.01.00...v23.03.00

Contributors

trxcllnt, manopapad, and 5 other contributors

31 Jan 03:38

marcinz

v23.01.00

2455b55

This release introduces support for the put and putmask operations, adds an optimized implementation for the common case of advanced indexing using a single (possibly broadcasted) boolean array, includes more information in the tags of unary/binary operations on profiles (for easier cross-referencing with the source script), and adds some small improvements to OpenMP execution.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🐛 Bug Fixes

Make the code compile with bounds checks by @magnatelee in #648
MatVec & MatVecMul use reduction stores, not outputs by @manopapad in #646
Set default generator based on whether ninja is available by @jjwilke in #602
Allow args to be passed by position and name in auto_convert by @manopapad in #640
Force positive values for log and sqrt tests by @jjwilke in #580
Eliminate empty kernel launch in cunumeric.unique by @magnatelee in #675
Make install.py reconfigure editable installs when build type changes by @trxcllnt in #670
Fix for #684 by @magnatelee in #686
Follow up on PR #671 by @ipdemes in #677
More argument checks for bincount by @magnatelee in #711
Fix a typo in unique.cu indexing by @manopapad in #713
guard all2all from empty transfer by @mfoerste4 in #727
src/cunumeric/item: add openmp variants for write/read tasks by @rohany in #740
Fix CI failures due to numpy 1.24 upgrade by @manopapad in #745
Fix timing for CuPy tests by @manopapad in #747
Don't turn on cuNumeric debug checks on debug-rel builds by @manopapad in #753
Move pip uninstall step before CMake is run instead of after. by @trxcllnt in #760
Force conda version of cutensor by @marcinz in #765
handle numpy 'builtins' properly for coverage by @bryevdv in #766

🚀 New Features

Implementing PUT routine by @ipdemes in #582
Implementing Putmask by @ipdemes in #667
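
The put and putmask operations, along with indexing by a single boolean array, can be sketched as follows. The example assumes the cunumeric package name used at the time of this release and falls back to NumPy when it is not installed; the values are made up for illustration.

```python
try:
    import cunumeric as np  # assumption: cuNumeric 23.01 or later is installed
except ImportError:
    import numpy as np  # fallback so the sketch still runs without cuNumeric

a = np.zeros(5)

# put: write values at the given flat indices, in place.
np.put(a, np.array([0, 2]), np.array([10.0, 20.0]))
after_put = a.copy()

# putmask: overwrite every element where the mask is True, in place.
np.putmask(a, a > 5, -1.0)

# Advanced indexing with a single boolean array (the case this release optimizes).
b = np.arange(6)
b[b % 2 == 0] = 0

print(after_put, a, b)
```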

🛠️ Improvements

Move test driver code to legate.core by @bryevdv in #627
Remove --install-dir option by @bryevdv in #656
Updates for new script-based conda env generation by @manopapad in #651
Log operator names of unary and binary operations using annotations by @magnatelee in #679
Regenerate install_info.py on every build by @trxcllnt in #705
Fixes for buffer allocations by @magnatelee in #706
Clean up the basic build instructions by @manopapad in #741
Refactor benchmarks by @manopapad in #567
Improving performance for some special cases of advanced indexing by @ipdemes in #731
Pass CMAKE_GENERATOR to scikit-build by @trxcllnt in #750
Change the default CPU architecture to haswell by @marcinz in #762

Full Changelog: v22.10.00...v23.01.00

Contributors

jjwilke, trxcllnt, and 7 other contributors
