Releases · nv-legate/cupynumeric

13 Oct 23:53

marcinz

v22.10.00

81ad156

The biggest change in Release 22.10 is a new build infrastructure using CMake and scikit-build. The new build system brings several benefits including robust build dependency tracking and compliance with Python site-packages. This release includes several new search and indexing operators, fixes for several performance and correctness bugs, and provenance tracking for top-level and ndarray routines in execution profiles.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
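The search, indexing, and solver routines listed under New Features for this release follow NumPy's semantics. The sketch below uses plain NumPy so it runs anywhere; replacing the import with `import cunumeric as np` is an assumption about your environment, and coverage of each call in a given cuNumeric version is not guaranteed by this example.

```python
import numpy as np  # assumption: swap for `import cunumeric as np` where cuNumeric is installed

a = np.array([[0, 3, 0], [4, 0, 5]])

# argwhere: one row of coordinates per nonzero element
coords = np.argwhere(a)
# flatnonzero: positions of nonzero elements in the flattened array
flat_idx = np.flatnonzero(a)

# extract / place: read and overwrite the elements selected by a boolean mask
mask = a > 3
picked = np.extract(mask, a)
b = a.copy()
np.place(b, mask, [-1])

# fill_diagonal: overwrite the main diagonal in place
m = np.zeros((3, 3), dtype=int)
np.fill_diagonal(m, 7)

# linalg.solve: solve A x = rhs for x
A = np.array([[2.0, 0.0], [0.0, 4.0]])
rhs = np.array([2.0, 8.0])
x = np.linalg.solve(A, rhs)

print(coords.tolist(), flat_idx.tolist(), picked.tolist(), x.tolist())
```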

What's Changed

🚀 New Features

Argwhere and flatnonzero by @mfoerste4 in #525
added extract and place via advanced indexing by @mfoerste4 in #536
Fill diagonal by @ipdemes in #473
Single processor implementation for linalg.solve by @magnatelee in #568

🛠️ Improvements

adding support for array shape () passed as an index argument in advanced indexing by @ipdemes in #486
Refactor test driver for cpu/gpu sharding by @bryevdv in #451
Collate test output to allow workers > 1 with verbose output by @bryevdv in #507
Ensure test.py --use flag fully overrides USE_* envvars by @manopapad in #524
Enhance two integration tests by @robinw0928 in #511
Add typing to array.py by @bryevdv in #478
Update test runner for osx by @bryevdv in #529
Don't blindly trust user-supplied bincount.minlength by @manopapad in #523
Make reduced-precision cuBLAS mode opt-in by @manopapad in #519
Fix reciprocal tests for zero values and improve test value customization (#467) by @marcinz in #537
Refactor test runner to support more pinning options by @bryevdv in #535
Remove dead code in bincount by @magnatelee in #546
Make the validation condition for random distributions lenient by @magnatelee in #550
src/cunumeric: handle high number of bins in GPU bincount by @rohany in #526
Construct NumPy arrays correctly from 0D deferred arrays backed by region fields by @magnatelee in #551
Collect test failure details at the end by @bryevdv in #556
Simplify some thunk conversion helpers by @manopapad in #553
Fix a compiler warning by @magnatelee in #555
Add option to disable CPU pinning in tests by @bryevdv in #558
Use the new mapper registration to enable detailed mapper logging by @magnatelee in #570
src/cunumeric/search: make nonzero not always allocate SYS_MEM buffers by @rohany in #572
add negative test case in test_array_split.py by @xialu00 in #545
add some test cases for test_arg_reduce.py by @xialu00 in #575
Testcase-add test cases for test_flip and test_indices by @xialu00 in #579
Refactor scalar reductions to use common execution policy by @jjwilke in #573
Sanitize k for the eye operator by @magnatelee in #586
Add CMake build for C++ and scikit-build infrastructure for Python package installation by @jjwilke in #514
Enhance test_block.py and test_eye.py by @robinw0928 in #578
Testcase add test cases for test_fill.py and test_ndim.py by @xialu00 in #588
Remove run dependency on curand by @marcinz in #520
Use Legion Fills when possible by @manopapad in #604
Support building with GASNet-Ex and MPI backends by @manopapad in #610
Provenance tracking for cuNumeric operators by @magnatelee in #596
Fix tests utils to make --directory work correctly. by @robinw0928 in #592
Fix a compiler warning by @magnatelee in #594
Enhance test_diag_indices.py and test_flatten.py. by @robinw0928 in #609
cuNumeric doesn't need nested provenance tracking by @magnatelee in #617
Add RuntimeError exception to legate.time by @robinw0928 in #618
Stop instantiating min and max reduction ops for complex types by @magnatelee in #621
Mark temporary conversion outputs as linear for eager storage recycling by @magnatelee in #608
Make the negative test on fill robust across Python versions by @magnatelee in #619
Enhance mask_indices and move_axis by @robinw0928 in #622
src/cunumeric/matrix: stop including coll.h in solve_template.inl by @rohany in #620

🐛 Bug Fixes

Fix performance bugs in scalar reductions by @magnatelee in #509
Don't use internal LAPACK function names by @manopapad in #522
Bug fixes for advanced indexing by @magnatelee in #532
Handle the case where LAPACK_*potrf is a macro, not a function by @manopapad in #527
fix mypy issue w/ np methods by @bryevdv in #542
Fix buggy complex-to-bool conversions and add correctness tests for astype by @magnatelee in #549
fixing advanced indexing operation for empty arrays by @ipdemes in #504
Do not link curand by @marcinz in #541
Fixing issues with advanced_indexing_kernel by @ipdemes in #557
fixing another corner case for advanced indexing by @ipdemes in #554
Fix OSX test shard generation by @bryevdv in #563
fix error print in test_unary_ufunc by @jjwilke in #566
Add NAN handling to convert() needed for some prefix routines with integer outputs. by @rkarim2 in #502
Fixing logic for slicing by @ipdemes in #574
Fix linalg.solve when inputs are scalars by @magnatelee in #585
Allow casting in cn.dot, to match numpy's behavior by @manopapad in #598
Add linalg.solve to the cmake build by @magnatelee in #603
Invoke eye with read-write privilege, not write-discard by @manopapad in #616
Fix a bug in scalar reduction launching kernels with empty domains by @magnatelee in #606

📖 Documentation

Added note to prefix documentation for corner cases where cunumeric results can diverge from numpy by @rkarim2 in #528
updating documentation by @ipdemes in #614
Add missing docs symlink by @bryevdv in #635

Contributors

jjwilke, manopapad, and 9 other contributors

09 Aug 03:38

marcinz

v22.08.00

ece6585

Release 22.08.00 features a variety of random distribution implementations (backed by cuRAND), distributed prefix scan operators, and a complete implementation of sorting for multi-node multi-CPU execution. This release also includes several quality-of-life changes and bug fixes, including type annotations for all but one Python module, improvements to the parallel test driver, fixes for several operators when inputs are empty, and proper handling of ndarrays passed as array sizes or indices.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
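As a rough illustration of the prefix scan, searchsorted, bit-packing, and take_along_axis routines this release mentions, here is a small sketch written against NumPy. It assumes the cuNumeric versions mirror NumPy's behavior for these inputs; the documentation note from #528 says prefix results can diverge from NumPy in corner cases.

```python
import numpy as np  # assumption: swap for `import cunumeric as np` where cuNumeric is installed

# Prefix scan: running sum over a 1D array
v = np.array([1, 2, 3, 4])
running = np.cumsum(v)

# searchsorted: leftmost insertion point that keeps the array sorted
pos = np.searchsorted(running, 6)

# packbits / unpackbits: round-trip eight bits through one uint8
bits = np.array([1, 0, 0, 0, 0, 0, 0, 1], dtype=np.uint8)
packed = np.packbits(bits)
restored = np.unpackbits(packed)

# take_along_axis: apply a per-row ordering produced by argsort
data = np.array([[3, 1, 2], [9, 7, 8]])
order = np.argsort(data, axis=1)
sorted_rows = np.take_along_axis(data, order, axis=1)

print(running.tolist(), int(pos), packed.tolist(), sorted_rows.tolist())
```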

New Features

Adding support for ND output regions in Advanced Indexing task by @ipdemes in #370
added support for 'searchsorted' by @mfoerste4 in #414
np.packbits and np.unpackbits by @magnatelee in #427
Implementation of atleast_{1,2,3}d by @sbak5 in #404
Implementing cunumeric.random.BitGenerator by @fduguet-nv in #254
Adding support for some simple _indices routines by @ipdemes in #417
adding mask_indices routine by @ipdemes in #426
Random advanced distributions by @fduguet-nv in #470
Distributed nd sort for cpu/omp by @mfoerste4 in #437
Initial implementation of scan routines. by @rkarim2 in #425
Adding support for take_along_axis and put_along_axis by @ipdemes in #436
cunumeric.ndim by @magnatelee in #495
Add support for curand conda package build (cherry pick #510) by @marcinz in #512

Improvements

Don't run the resolution logic if the arrays have the same dtype by @magnatelee in #389
Set cuda virtual package as hard run requirement for gpu conda package by @m3vaz in #398
First pass mypy typing by @bryevdv in #387
Generalize Dict to Mapping for newer versions of mypy by @jjwilke in #405
Add support for using cupy in sort.py by @robinw0928 in #395
Refactor test.py by @bryevdv in #378
Use Numpy axis normalizations where possible by @bryevdv in #419
More mypy by @bryevdv in #413
adding bounds check for advanced indexing by @ipdemes in #397
Report Elapsed Time in cholesky's output by @SeyedMir in #423
Support -vv for more verbose test output by @bryevdv in #432
Add typing to runtime.py by @bryevdv in #428
Update compress/take tests for pytest by @bryevdv in #435
Project down to a 1D store for the scalar reduction output by @magnatelee in #455
Fallback to self = np.ndarray when necessary by @bryevdv in #431
Add types to thunk modules by @bryevdv in #438
allclose detail + misc tests improvements by @bryevdv in #457
cunumeric.random - Adding Module-scoped functions by @fduguet-nv in #481
Activate the NumPy fallback for cunumeric.random in CPU build by @magnatelee in #485
Legacy generators for cpu build by @magnatelee in #487
Allow CPU build to optionally use cuRAND by @magnatelee in #498
Sanitize shapes in ndarray's constructor by @magnatelee in #496
src/cunumeric/sort: stop using std::{inclusive, exclusive}_scan by @rohany in #499
Update conda requirements by @manopapad in #383
Handle dtype/casting/out properly in contractions by @manopapad in #402
Missing / overzealous check_eager_args calls by @manopapad in #465
Strengthen some types by @manopapad in #468

Bug Fixes

Add missing includes to aid intellisense providers by @trxcllnt in #382
Proper exception handling for cholesky by @magnatelee in #391
Fixes for building with setup.py outside conda, primarily Mac by @jjwilke in #394
Use the right API to check if the store is unbound by @magnatelee in #399
Fix nargs for report:dump-csv by @bryevdv in #400
Handle empty outputs correctly in advanced indexing task by @magnatelee in #396
Fall back to NumPy in __array_function__ and __array_ufunc__ by @magnatelee in #424
Fix for legate data interface by @magnatelee in #429
Fix test_floating.py test to call sys.exit by @marcinz in #433
Make missing pynvml an error for GPU tests by @bryevdv in #441
Make the NumPy fallback work correctly in randint by @magnatelee in #450
Squeeze fix by @magnatelee in #448
Correctly prune out empty tasks in binary reduction by @magnatelee in #453
Minor fix for indexing routines by @magnatelee in #452
Make DeferredArray.reshape always return a deferred array by @magnatelee in #454
Re-freezing conda compiler versions (#415) by @m3vaz in #449
Fix for floating point predicates by @magnatelee in #466
markdown version fix by @ipdemes in #459
Fixup typing regressions by @bryevdv in #471
Remove ill-defined advanced indexing test case by @magnatelee in #484
Handle empty inputs correctly in local scan tasks by @magnatelee in #491
Handle an unknown in a tuple correctly in reshape by @magnatelee in #490
fix mismatched size_t/uint64_t types by @jjwilke in #475
Allow scalar cunumeric ndarrays as array indices by @manopapad in #479

Documentation

adding new version for documentations by @ipdemes in #447
Updates to api_compare.py by @bryevdv in #456
Be stricter applying CuWrapperMetadata by @bryevdv in #463
Add custom nitpicky ref checks for cunumeric APIs by @bryevdv in #462
Docs coverage check by @bryevdv in #469
Fix the API reference for random functions and scan operators by @magnatelee in #497

New Contributors

@jjwilke made their first contribution in #394
@SeyedMir made their first contribution in #423
@fduguet-nv made their first contribution in #254
@rkarim2 made their first contribution in #425
@rohany made their first contribution in #499

Full Changelog: v22.05.02...v22.08.00

Contributors

jjwilke, trxcllnt, and 13 other contributors

21 Jun 10:52

marcinz

v22.05.02

8b163e6

This hotfix release fixes issues in conda recipes.

What's Changed

Cherry pick: Update conda requirements (#383) by @marcinz in #406
Cherry pick: Set cuda virtual package as hard run requirement for conda gpu package (#398) by @marcinz in #407
Cherry pick: Fix nargs for report:dump-csv (#400) by @marcinz in #408
Re-freezing conda compiler versions by @m3vaz in #415

Full Changelog: v22.05.01...v22.05.02

Contributors

marcinz and m3vaz

16 Jun 20:44

marcinz

v22.05.01

7fcbf60

This hotfix release updates the conda build recipe to make the cuNumeric package depend on the right version of NumPy and also fixes a bug in the command-line argument parser.

Full Changelog: v22.05.00...v22.05.01

07 Jun 03:36

marcinz

v22.05.00

0a642e8

Release 22.05 features complete support for advanced indexing and related indexing routines (compress and take), a multi-node multi-GPU sorting implementation for multi-dimensional ndarrays, window functions, several matrix/tensor operations (trace, matrix_power, multi_dot, and einsum_path) and primitive support for FFT on a single GPU using cuFFT.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
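The indexing and matrix routines named in the summary can be exercised as follows. This is a sketch in plain NumPy under the assumption that cuNumeric matches NumPy for these small inputs; the arrays are illustrative only.

```python
import numpy as np  # assumption: swap for `import cunumeric as np` where cuNumeric is installed

a = np.arange(12).reshape(3, 4)

# Advanced indexing with integer arrays; negative indices count from the end
picked = a[[0, -1], [1, 2]]
# Advanced indexing with a boolean mask
evens = a[a % 2 == 0]

# take / compress: select along an axis by index list or boolean condition
cols = np.take(a, [0, 3], axis=1)
rows = np.compress([True, False, True], a, axis=0)

# trace, matrix_power, and multi_dot on a small matrix
m = np.array([[1, 1], [0, 1]])
tr = np.trace(m)
m3 = np.linalg.matrix_power(m, 3)
chain = np.linalg.multi_dot([m, m, m])

print(picked.tolist(), evens.tolist(), int(tr), m3.tolist())
```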

New Features

thrust allocator for sort by @mfoerste4 in #228
implementation of np.block w/ a test by @sbak5 in #213
Window functions by @magnatelee in #283
Advanced indexing by @ipdemes in #235
First implementation of single-GPU FFT using cuFFT by @mferreravila in #238
Use the stream pool in Legate core by @magnatelee in #295
Add partition api and utilize sort backend by @mfoerste4 in #287
implementing TRACE operation by @ipdemes in #263
adding support for negative indices in advanced indexing by @ipdemes in #322
Add cpu-only packages to the conda variants by @m3vaz in #330
Bump minpy to 3.8 (conda env and recipe) by @bryevdv in #332
Remaining ufuncs by @magnatelee in #315
Logic functions by @magnatelee in #347
Slicing-based np.block implementation by @sbak5 in #306
Implement matrix_power by @manopapad in #360
Distributed N-dimensional sort by @mfoerste4 in #316
Implement einsum_path by @manopapad in #361
adding diag_indices and diag_indices_from routines by @ipdemes in #367
Implement moveaxis by @manopapad in #364
Implement __array_function__ and __array_ufunc__ by @manopapad in #353
Implement more norm cases by @manopapad in #366
Implement multi_dot by @manopapad in #358
Adding support for "indices" routine by @ipdemes in #368
Support axis=None and keepdims=True/False in argmin and argmax by @trxcllnt in #346

Improvements

Move the ufunc module (ported to branch-22.05) by @magnatelee in #242
Use ufuncs in special methods by @magnatelee in #247
Initial unit tests by @bryevdv in #229
Revise type coercion by @magnatelee in #264
adding 'only' option to the tests.py by @ipdemes in #248
Updates for using the new unbound store API by @magnatelee in #265
Don't run the resolution logic if the arrays have the same dtype (ported to 22.05) by @magnatelee in #390
Use find_packages for installation by @magnatelee in #269
Some misc tests and types by @bryevdv in #268
Forward-port #257 by @manopapad in #273
Split up sort.cu for parallel compilation by @magnatelee in #277
Debugging checks by @magnatelee in #281
Update example programs by @magnatelee in #289
Bump up NumPy version by @magnatelee in #291
Don't use constexpr for window functions by @magnatelee in #294
Better error message on unsupported complex reductions by @manopapad in #300
handle coverage wrapping uniformly including ufuncs by @bryevdv in #272
Architecture-agnostic check for int128 by @manopapad in #293
Unit test fixups by @bryevdv in #303
reduce testcases for partition test by @mfoerste4 in #304
Adding conda build recipe files by @marcinz in #274
Use pytest for test running by @bryevdv in #297
Add unit tests to test.py by @marcinz in #305
Change _cunumeric_implemented into a dataclass by @manopapad in #318
Pass reporting explicitly to coverage decorators by @bryevdv in #333
FFT refactoring by @magnatelee in #310
Declare ufunc formatter to be safe for parallel read by @magnatelee in #335
Force installation of Lapack in OpenBLAS build by @marcinz in #266
Mark no out-of-range indices for copies by @magnatelee in #336
Discussion PR for conda envs split by @bryevdv in #326
Use 64-bit integers for global thread ids by @magnatelee in #349
Use legate.core arg parsing by @bryevdv in #343
adding compress and take operations by @ipdemes in #296
Conda recipes improvements by @marcinz in #345
Misc small updates by @bryevdv in #352
adding performance tests for indexing routines by @ipdemes in #337
Add support for using cupy by @robinw0928 in #373

Bug Fixes

Forward port late commits from 22.03 by @bryevdv in #241
Catch up the ufunc renaming (ported to 22.05) by @magnatelee in #244
Activate the cuBLAS workaround by checking the cuBLAS version at runtime (ported to 22.05) by @magnatelee in #246
fix large shape >int32 by @mfoerste4 in #236
Fix a compile error by @magnatelee in #251
Fix the out-of-bounds bug in reshape by @magnatelee in #267
add missing comparison functions by @bryevdv in #278
Fix nonzero by @magnatelee in #285
fix return value of ndarray.argsort by @mfoerste4 in #286
Fix typos in tests after pytest transition by @manopapad in #309
Update trace.py tests / fix some warnings by @bryevdv in #307
Don't dump test stdout unconditionally by @bryevdv in #314
Add typing_extensions requirement to conda recipe by @marcinz in #325
Fix pytest exit to fail on errors by @marcinz in #334
Fixing #321 issue by @ipdemes in #341
Missing arguments in cases of eager-to-deferred fallback by @manopapad in #348
Add a missing instance of share=True by @manopapad in #350
Fix return types for some of the unary ops by @magnatelee in #354
fixing compile-time warnings by @ipdemes in #351
Remove special case handling for scalar arrays by @manopapad in #357
Fix the bug in np.append test on empty input array and non-empty scalars by @sbak5 in #365
Match NumPy's behavior for isclose(inf,inf) by @manopapad in #372
Fix unary reductions by @magnatelee in #369
Allow DeferredThunks to be created for empty arrays by @manopapad in #371
Fix documentation building by @manopapad in #377
Make the example programs pass the CI by @magnatelee in #380

Documentation

Comparison table update by @ipdemes in #252
Add user-facing docs for coverage reporting by @bryevdv in #261
creating script for calculating API coverage by categories by @ipdemes in #271
Doc update by @magnatelee in #275
Fix docs builds for trace by @manopapad in #308
fixing documentation for fft by @ipdemes in #302
Add a custom autodoc class for ufuncs by @bryevdv in #317
Refactor comparison table as Sphinx extension by @bryevdv in #323
lgpatch docs + doc fixups by @bryevdv in #356

New Contributors

@mferreravila made their first contribution in #238
@m3vaz made their first contribution in #330
@robinw0928 made their first contribution in #373

Full Changelog: v22.03.00...v22.05.00

Contributors

trxcllnt, manopapad, and 9 other contributors

05 Apr 00:35

marcinz

v22.03.00

5e0e6b3

Release 22.03 adds several new features, including np.repeat, np.unique, np.inner, np.outer, and 35 new universal functions (ufuncs). In this release, we have also significantly revised and refactored tensor operations to make them comprehensive. Preliminary support for 1D array sorting for multi-GPU execution is available. (CPU and OpenMP paths are still single-processor only.) We have also made performance improvements for np.convolve and np.tril/triu for GPU execution. Finally, we have added a tool that reports cuNumeric’s API coverage for a given NumPy program execution. (For usage, please refer to “Measuring API coverage” in the cuNumeric documentation.)

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
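For readers unfamiliar with the four routines highlighted in the summary, a brief sketch of their NumPy semantics follows. It runs on plain NumPy; using it with cuNumeric by changing the import is an assumption, not something this example verifies.

```python
import numpy as np  # assumption: swap for `import cunumeric as np` where cuNumeric is installed

x = np.array([3, 1, 3, 2])

# repeat: duplicate every element a fixed number of times
rep = np.repeat(x, 2)
# unique: sorted distinct values
uniq = np.unique(x)

u = np.array([1, 2, 3])
w = np.array([4, 5, 6])
# inner: sum of elementwise products for 1D inputs
inner = np.inner(u, w)
# outer: matrix whose (i, j) entry is u[i] * w[j]
outer = np.outer(u, w)

print(rep.tolist(), uniq.tolist(), int(inner), outer.tolist())
```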

New Features

Sort pr by @mfoerste4 in #199
Add basic cunumeric.patch module by @bryevdv in #225
adding support for REPEAT operation by @ipdemes in #190
np.unique implementation by @magnatelee in #192
np.append & ndarray.flatten by @sbak5 in #196
General cuFFT plan cache by @magnatelee in #195
Tools for checking API coverage by @magnatelee in #191
Overhaul linear algebra operations by @manopapad in #217

Improvements

Move the ufunc module by @magnatelee in #234
ufunc refactoring + a bunch of missing ufuncs by @magnatelee in #223
Expand coverage reporting to ndarray methods by @bryevdv in #219
Einsum benchmark improvements by @manopapad in #222
Remove old-style casts by @manopapad in #218
Optimize np.tril used in Cholesky by @magnatelee in #214
Add a convergence threshold argument to the cg example by @marcinz in #221
Make sure nonzero produces outputs in C order by @magnatelee in #216
API cleanup for ndarray by @bryevdv in #209
Minor improvement for diag by @magnatelee in #211
Stop using alloca by @magnatelee in #212
Port and refactor GH #140 "Use cufft callbacks for better performance on fft-based convolutions" by @magnatelee in #204

Bug Fixes

Activate the cuBLAS workaround by checking the cuBLAS version at runtime by @magnatelee in #245
Catch up the ufunc renaming by @magnatelee in #243
Fix coverage for ufuncs by @bryevdv in #240
Fix docs breakage by @bryevdv in #239
Fix compilation errors on clang by @manopapad in #233
Add cunumeric.ufunc to packages by @bryevdv in #231
Fix trailing comma tuple bug by @bryevdv in #230
Fix the build issue with Thrust by @magnatelee in #227
Fix some docs breakage by @bryevdv in #224
Fix for #208 by @magnatelee in #210
Fix for #206 by @magnatelee in #207
Fixed bugs for 1D array inputs on vstack, dstack and column_stack by @sbak5 in #182

Documentation

Add docstrings to ndarray methods by @bryevdv in #205
Clean up Sphinx warnings by @bryevdv in #202
adding versions to the documentation by @ipdemes in #198
adding script for comparing API coverage + table at the documentation page by @ipdemes in #193
User facing documentation for API usage tool by @bryevdv in #262

Full Changelog: v22.01.00...v22.03.00

Contributors

manopapad, magnatelee, and 5 other contributors

10 Feb 02:26

marcinz

v22.01.00

27a3248

Release 22.01 adds support for einsum expressions, logic functions and a subset of indexing and array manipulation routines.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
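A short sketch of the einsum, triangular, Cholesky, and choose operations this release introduces, written against NumPy. The inputs are made up for illustration, and running the same code under cuNumeric by swapping the import is an assumption.

```python
import numpy as np  # assumption: swap for `import cunumeric as np` where cuNumeric is installed

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

# einsum: matrix product and trace expressed as index contractions
matmul = np.einsum("ij,jk->ik", a, b)
diag_sum = np.einsum("ii->", a)

# tril: keep the lower triangle, zero the rest
lower = np.tril(a)

# cholesky: factor a symmetric positive-definite matrix as L @ L.T
spd = np.array([[4.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(spd)

# choose: pick element i from the choice array named by selector[i]
selector = np.array([0, 1, 0])
choices = np.array([[10, 20, 30], [40, 50, 60]])
chosen = np.choose(selector, choices)

print(matmul.tolist(), float(diag_sum), chosen.tolist())
```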

New Features

Convolution by @magnatelee and @lightsighter in #103
Added few universal functions and logical operations by @ipdemes in #134
numpy.tril and numpy.triu by @magnatelee in #144
Einsum operation by @manopapad in #142
Cholesky factorization by @magnatelee in #160
Implemented split routines and a test by @sbak5 in #152
Choose operation by @ipdemes in #146

Improvements

Convolve Cache for cuFFT by @lightsighter in #109
Warmup iterations for Richardson-Lucy by @magnatelee in #113
Remove NumPyAllocation by @magnatelee in #118
Update for new data ingest interface by @manopapad in #105
Enable some temporarily commented-out tests by @manopapad in #119
Testcase for legate.core!94 by @manopapad in #120
Use built-in reduction op by @magnatelee in #136
Managing CUDA library contexts directly in cuNumeric by @magnatelee in #138
Support for cuSOLVER by @magnatelee in #139
Make CUDA library context cache thread safe by @magnatelee in #141
Use .cu for CUDA library management by @magnatelee in #145
Some reusable test input generators by @manopapad in #153
Fix Wundefined-var-template clang warning by @manopapad in #154
Add eager fallback mode to testing script by @manopapad in #156
Add eager tests by @marcinz in #157
Small additions to test input generators by @manopapad in #159
No longer need to reserve one dim for reductions by @manopapad in #161
Use a per-device stream cache for CUDA library calls by @magnatelee in #165
Simple tiling heuristic for Cholesky factorization by @magnatelee in #167
Fix clang-format config to include cu,cuh,inl files by @manopapad in #168
LEGATE_ABORT is now a statement by @magnatelee in #169
Preloading CUDA libraries by @magnatelee in #171
Use CHECK_* macros in a couple more places by @manopapad in #172
Fix some invocations of complex constructors by @manopapad in #173
Add a switch to not call tril on Cholesky outputs by @magnatelee in #174
Do python install on custom dir w/o eggs by @manopapad in #177
Refined 'tests/array_split.py' w/ more essential input shapes by @sbak5 in #178
WIP: adding logic for DIAGONAL by @ipdemes in #170
Stack and concatenate routines including subroutines by @sbak5 in #175
Refactoring by @magnatelee in #181

Bug Fixes

Fix #111 by @magnatelee in #115
math.prod not available in python 3.7 by @manopapad in #129
Fix some compiler warnings by @magnatelee in #130
dot: fix error message on unsupported array dimensions by @manopapad in #133
Fix slot calculation in reduction kernel by @manopapad in #148
Port fix for #79 by @manopapad in #155
Build OpenBLAS with CROSS option to prevent tests at compile time by @marcinz in #158
Pin setuptools version, to work around breaking change by @manopapad in #164
Workaround for a bug in cuBLAS < 11.4 by @magnatelee in #185
Cannot install cuNumeric to different dir than Legate Core by @manopapad in #186
Adjust error tolerance for float16, to avoid spurious test failure by @manopapad in #166

Documentation

Adding contributions file by @marcinz in #147
Update docstrings by @magnatelee in #188

New Contributors

@lightsighter made their first contribution in #109
@ipdemes made their first contribution in #134
@pre-commit-ci made their first contribution in #151
@sbak5 made their first contribution in #152

Full Changelog: v21.11.00...v22.01.00

Contributors

manopapad, magnatelee, and 5 other contributors

09 Nov 02:33

marcinz

v21.11.00

1270b3c

This is the initial public alpha release of cuNumeric, an aspiring drop-in replacement for NumPy at scale.

Conda packages for this release are available at https://anaconda.org/legate/cunumeric.

What's Changed

Refactoring for the broadcasting logic by @magnatelee in #18
Improved partitioning and sharding for GEMV by @manopapad in #37
Fix #16 by @manopapad in #38
Add CI by @marcinz in #43
Use a script on the runner to checkout CI repository by @marcinz in #44
Fix CI by @marcinz in #45
Extend tests with CPU/GPU/OMP testing by @marcinz in #48
Remove accidental part of the job matrix from CI by @marcinz in #49
Add missing alignment constraints for matrix-vector multiplication by @magnatelee in #58
Force left alignment for pointers and references by @magnatelee in #59
Don't alter the GC priority for external instances by @magnatelee in #60
Be strict when importing legate.numpy in examples by @manopapad in #61
Fix for reinterpret casts that are actually unsafe in the modern c++ by @magnatelee in #62
Remove the return type of the void-returning function in the mapper by @magnatelee in #63
Remove dependency on numpy>=1.20 by @manopapad in #64
Stop using looping templates by @magnatelee in #65
Bug fix for release mode by @magnatelee in #66
Port nonzero to the new buffer API by @magnatelee in #68
Missing constraint for bincount by @magnatelee in #69
Clean up install script by @manopapad in #70
Fixes to compile on MacOS by @manopapad in #71
Disable absolute and allclose for complex types only with Clang by @magnatelee in #72
Generalize the reshape operator by @magnatelee in #73
Improve dot product for half precision floats by @magnatelee in #74
Support for tensordot by @magnatelee in #75
Bugfixes on operations by @manopapad in #76
Add missing type casts for __half by @magnatelee in #77
Pull the correct Core image by @marcinz in #78
Port remaining fixes from old branch by @manopapad in #80
Remove remaining conditional legate.numpy imports from examples by @manopapad in #81
Always dump test output by @marcinz in #83
Minor code cleanups by @manopapad in #85
Attempt to address #84 by @manopapad in #86
Always follow the core's choice regarding CUDA/OpenMP support by @manopapad in #88
Fix legate data interface by @magnatelee in #92
Handle overlapping stores correctly in dot by @magnatelee in #93
Improvements to handling of scalar arrays by @manopapad in #90
Port to the new calling convention by @magnatelee in #89
Prevent CI on forks by @marcinz in #94
Emptiness checks for matrix ops by @magnatelee in #95
Mapper update by @magnatelee in #82
Port to the new reduction op interface by @magnatelee in #96
Stop using delinearization by @magnatelee in #97
Dead code elimination by @magnatelee in #98
Reorganizing source files by @magnatelee in #99
Remove leftover requirements.txt by @manopapad in #100
Update for build system changes by @manopapad in #101
Updates for new attachment interface by @manopapad in #102
Fix for matrix-vector multiplication by @magnatelee in #104
Another attempt to fix degenerate cases by @magnatelee in #107
Fix #111 by @magnatelee in #116
Release 21.11.00 by @marcinz in #121

New Contributors

@marcinz made their first contribution in #43

Full Changelog: https://github.com/nv-legate/cunumeric/commits/v21.11.00

Contributors

manopapad, magnatelee, and marcinz
