Releases: nv-legate/cupynumeric

v24.11.02

07 Dec 06:44
5371ab3

This is a patch release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

Packaging Changes

  • Update for Legate v24.11.01

v24.11.01

07 Dec 06:42
9627cb8

This is a patch release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

Bug Fixes

  • Explicit fallback to __array__() on __buffer__

v24.11.00

17 Nov 00:51
b198f33

This is a beta release of cuPyNumeric.

Linux x86 and ARM conda packages are available at https://anaconda.org/legate/cupynumeric.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/24.11/.

New features

Improved API coverage

  • Implement np.unravel_index
  • Implement np.angle
  • Implement np.median
  • Implement np.ix_
  • Implement np.meshgrid
  • Implement np.expand_dims
  • Implement np.rot90
  • Implement np.round
  • Implement np.fft.fftshift and np.fft.ifftshift
  • Implement np.roll
  • Support full_matrices parameter of np.linalg.svd
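
As a rough illustration of a few of the functions listed above, the sketch below runs them against stock NumPy. It assumes that, with cuPyNumeric installed, the import line can be swapped for `import cupynumeric as np`; the arrays and variable names are invented for the example.

```python
# Assumption: with cuPyNumeric installed, `import cupynumeric as np`
# is the intended drop-in replacement for this import.
import numpy as np

a = np.arange(12).reshape(3, 4)

# unravel_index: flat index 7 in a 3x4 array is row 1, column 3
row, col = np.unravel_index(7, a.shape)

# ix_: build an open mesh to pick rows {0, 2} and columns {1, 3}
picked = a[np.ix_([0, 2], [1, 3])]            # [[1, 3], [9, 11]]

# roll: cyclic shift by two positions
rolled = np.roll(np.arange(5), 2)             # [3, 4, 0, 1, 2]

# rot90: one counter-clockwise quarter turn
rotated = np.rot90(np.array([[1, 2], [3, 4]]))  # [[2, 4], [1, 3]]

# fftshift moves the zero-frequency bin to the center; ifftshift undoes it
shifted = np.fft.fftshift(np.arange(6))       # [3, 4, 5, 0, 1, 2]
restored = np.fft.ifftshift(shifted)          # [0, 1, 2, 3, 4, 5]

# median of an even-length array averages the two middle values
med = np.median(np.array([1.0, 3.0, 2.0, 4.0]))  # 2.5

# svd: full_matrices controls whether U is square or "thin"
m = np.ones((4, 2))
u_full, _, _ = np.linalg.svd(m, full_matrices=True)    # U is 4x4
u_thin, _, _ = np.linalg.svd(m, full_matrices=False)   # U is 4x2
```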

Memory management enhancements

  • Memory-efficient implementation of matrix multiplication - this implementation batches over the reduction dimension, achieving constant memory overhead regardless of array sizes.
  • Memory efficiency for stencil computation - adds the np.ndarray.stencil_hint method, which instructs cuPyNumeric to pre-allocate the necessary space for ghost elements when an array is to be used in a stencil computation, reducing intermediate memory use.
  • Memory allocation report - reports the object-to-memory mapping when a computation runs out of memory, to help users debug and optimize memory usage.
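
The idea of batching a matrix multiplication over the reduction dimension can be sketched in a few lines of plain NumPy. This is only an illustration of the decomposition, not cuPyNumeric's actual implementation: the function name `matmul_batched_k` and the `k_chunk` parameter are hypothetical, and the sketch does not model the distributed memory behaviour described above.

```python
import numpy as np


def matmul_batched_k(a, b, k_chunk=256):
    """Compute a @ b by accumulating partial products over slices of
    the shared (reduction) dimension. Hypothetical helper for illustration."""
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions do not match")

    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for start in range(0, k, k_chunk):
        stop = min(start + k_chunk, k)
        # Each step reads only a k_chunk-wide slice of each operand
        # and adds its contribution to the running result.
        out += a[:, start:stop] @ b[start:stop, :]
    return out


rng = np.random.default_rng(0)
lhs = rng.random((8, 1000))
rhs = rng.random((1000, 4))
result = matmul_batched_k(lhs, rhs, k_chunk=128)
matches_reference = np.allclose(result, lhs @ rhs)
```

The sum of the per-slice products equals the full product because matrix multiplication is a sum over the reduction index, so splitting that index into ranges only changes the order of accumulation.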

Enhanced infrastructure support

  • GH200 Grace Hopper Superchip support - allows users to leverage GH200-based cloud instances and supercomputers.
  • GASNet support - support GASNet as an alternative networking backend to UCX, using a GASNet wrapper, MPI wrapper, and custom build utilities.
  • Initial HDF5 support - distributed read/write of HDF5 files using a POSIX backend.
  • Automatic resource configuration at run time - automatically discover and use all the available compute resources including CPU, GPU, system memory, and framebuffer memory.
  • More enhancements from Legate 24.11

Other

  • Re-implement the RNG module on top of the C++ STL random library, removing the need to have cuRAND in CPU-only installations.

Known Issues

cuPyNumeric will emit a false-positive warning like the following:

RuntimeWarning: cuPyNumeric has not implemented numpy.ndarray.__buffer__ and is falling back to canonical NumPy. You may notice significantly decreased performance for this function call.

in cases such as when an arithmetic operation is performed on a scalar array, e.g. cupynumeric.array(42) * 2. There is no actual performance degradation occurring in this case. We are working on a patch that will suppress this warning.
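
Until a patch is available, one possible workaround is to filter this specific message with Python's standard `warnings` module. The sketch below is an unofficial suggestion, not something the release notes prescribe; the helper name `silence_buffer_fallback_warning` is hypothetical, and the filter matches only the beginning of the message quoted above.

```python
import warnings


def silence_buffer_fallback_warning():
    """Ignore the false-positive __buffer__ fallback warning.

    Hypothetical helper: the `message` argument is a regular expression
    matched against the start of the warning text.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"cuPyNumeric has not implemented numpy\.ndarray\.__buffer__",
        category=RuntimeWarning,
    )


# Install the filter before running code that triggers the warning,
# for example (requires cuPyNumeric):
#   import cupynumeric
#   cupynumeric.array(42) * 2
silence_buffer_fallback_warning()
```

Other fallback warnings, which may point at real performance problems, are left untouched because the pattern names `numpy.ndarray.__buffer__` explicitly.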

v24.06.01

11 Sep 20:36
427da00

This is a patch release of cuNumeric that includes bug fixes.

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

v24.06.00

03 Jul 22:35
510e24a

This release ports cuNumeric to the C++-based Legate-Core. Additionally, it includes the following new features:

  • np.linalg.qr, np.linalg.svd (single-GPU support only)
  • "where" argument for unary operations
  • np.select
  • np.flipud, np.fliplr
  • np.cov
  • np.load (initial, unoptimized implementation)
  • np.average
  • np.logical_and.reduce, np.logical_or.reduce
  • np.digitize
  • np.diff
  • np.linalg.cholesky, np.linalg.solve (multi-GPU support, based on cuSolverMp -- not included in conda packages, requires a manual build)
  • C++-based ndarray class (experimental support)
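
A short sketch of several of the functions above, written against stock NumPy on the assumption that `import cupynumeric as np` (or `import cunumeric as np` for this release line) behaves the same way. The input values are arbitrary.

```python
# Assumption: the cuNumeric module can be substituted for NumPy here.
import numpy as np

x = np.array([1, 4, 9, 16])

# diff: differences between consecutive elements
steps = np.diff(x)                                   # [3, 5, 7]

# digitize: index of the bin each value falls into
bins = np.digitize(np.array([0.5, 2.5, 7.0]),
                   np.array([1.0, 5.0]))             # [0, 1, 2]

# select: pick from choices according to the first matching condition
chosen = np.select([x < 5, x >= 9], [x, -x], default=0)  # [1, 4, -9, -16]

# average: weighted mean, (1*3 + 2*1 + 3*0) / 4
weighted = np.average(np.array([1.0, 2.0, 3.0]),
                      weights=np.array([3.0, 1.0, 0.0]))  # 1.25

# "where" on a unary operation: only masked positions are computed,
# the rest keep whatever `out` already held
roots = np.zeros(4)
np.sqrt(x, out=roots, where=x > 4)                   # [0, 0, 3, 4]

# logical reductions via ufunc.reduce
all_true = np.logical_and.reduce(np.array([True, True, False]))   # False
any_true = np.logical_or.reduce(np.array([True, True, False]))    # True

# flipud / fliplr: reverse rows / reverse columns
grid = np.array([[1, 2], [3, 4]])
upside_down = np.flipud(grid)                        # [[3, 4], [1, 2]]
mirrored = np.fliplr(grid)                           # [[2, 1], [4, 3]]
```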

x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.

Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.

Known issues

Including the nvidia conda channel in an environment with cunumeric may end up pulling cutensor 2.0, even though the cunumeric packages explicitly request cutensor 1.7. This can cause error messages like this:

OSError: libcutensor.so.1: cannot open shared object file: No such file or directory

This is not an issue with cuNumeric, but with incorrect constraints on the cutensor packages on the nvidia channel. Please avoid including the nvidia conda channel in any conda environment including cunumeric.

v23.11.00

21 Nov 01:47
d91f17c

This release contains performance improvements to the variance operation, and a multi-dimensional Cholesky implementation.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

🐛 Bug Fixes

📖 Documentation

Full Changelog: v23.09.00...v23.11.00

v23.09.00

03 Oct 15:23
e66a063

This release adds support for the quantile API, and includes some performance and documentation improvements (notably a "Best Practices" guide).
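
For reference, a minimal quantile call looks like the following. It uses stock NumPy and assumes the cuNumeric version accepts the same arguments; which optional parameters this release supports is not stated here.

```python
# Assumption: `import cunumeric as np` can replace this import.
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Quartiles with NumPy's default linear interpolation
quartiles = np.quantile(data, [0.25, 0.5, 0.75])   # [2.0, 3.0, 4.0]
```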

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

🛠️ Improvements

📖 Documentation

🐛 Bug Fixes

New Contributors

Full Changelog: v23.07.00...v23.09.00

v23.07.00

25 Jul 04:51
d413db2

This release adds support for histogram, broadcast* and various nan* APIs. It also includes performance improvements to the FFT functions and cleanups in ufunc support.
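
The sketch below shows what these API families look like in stock NumPy. The notes do not list which broadcast* and nan* functions were added, so `broadcast_to`, `nansum`, and `nanmax` are chosen here only as representative examples (an assumption), as is the idea that the cuNumeric module can be imported in NumPy's place.

```python
# Assumption: `import cunumeric as np` can replace this import.
import numpy as np

samples = np.array([0.5, 1.5, 1.7, 2.9])

# histogram: three equal-width bins over [0, 3]
counts, edges = np.histogram(samples, bins=3, range=(0.0, 3.0))
# counts -> [1, 2, 1], edges -> [0.0, 1.0, 2.0, 3.0]

# broadcast_to: view a length-3 vector as a 2x3 array without copying
tiled = np.broadcast_to(np.arange(3), (2, 3))

# nan-aware reductions skip NaN entries instead of propagating them
noisy = np.array([1.0, np.nan, 2.0])
total = np.nansum(noisy)      # 3.0
largest = np.nanmax(noisy)    # 2.0
```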

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🚀 New Features

🛠️ Improvements

📖 Documentation

  • Note new minimum CUDA requirements for conda packages by @manopapad in #875

🐛 Bug Fixes

New Contributors

Full Changelog: v23.03.00...v23.07.00

v23.03.00

15 Mar 20:02
9ac887b

This is the beta release of cuNumeric.

This release is focused on bug fixes, code clean-up and documentation updates, in preparation for entering beta status.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🐛 Bug Fixes

🛠️ Improvements

📖 Documentation

Full Changelog: v23.01.00...v23.03.00

v23.01.00

31 Jan 03:38
2455b55

This release introduces support for the put and putmask operations and adds an optimized implementation for the common case of advanced indexing using a single (possibly broadcast) boolean array. It also includes more information in the tags of unary/binary operations on profiles (for easier cross-referencing with the source script) and adds some small improvements to OpenMP execution.
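
The operations mentioned here follow the NumPy semantics shown below. The sketch uses stock NumPy with made-up arrays and assumes the cuNumeric module can be imported in its place.

```python
# Assumption: `import cunumeric as np` can replace this import.
import numpy as np

# put: write values at the given flat indices
a = np.zeros(5, dtype=int)
np.put(a, [0, 2], [10, 20])          # a -> [10, 0, 20, 0, 0]

# putmask: overwrite the elements where the mask is True
b = np.arange(6)
np.putmask(b, b > 3, -1)             # b -> [0, 1, 2, 3, -1, -1]

# Advanced indexing with a single boolean array: the case this
# release optimizes
grid = np.arange(6).reshape(2, 3)
even = grid % 2 == 0
selected = grid[even]                # [0, 2, 4]
grid[even] = -1                      # grid -> [[-1, 1, -1], [3, -1, 5]]
```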

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

🐛 Bug Fixes

🚀 New Features

🛠️ Improvements

Full Changelog: v22.10.00...v23.01.00