Releases: nv-legate/cupynumeric
v22.10.00
The biggest change in Release 22.10 is a new build infrastructure using CMake and scikit-build. The new build system brings several benefits including robust build dependency tracking and compliance with Python site-packages. This release includes several new search and indexing operators, fixes for several performance and correctness bugs, and provenance tracking for top-level and ndarray routines in execution profiles.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
What's Changed
🚀 New Features
- Argwhere and flatnonzero by @mfoerste4 in #525
- added extract and place via advanced indexing by @mfoerste4 in #536
- Fill diagonal by @ipdemes in #473
- Single processor implementation for linalg.solve by @magnatelee in #568
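As a rough illustration of the operators listed above, the sketch below exercises them through plain NumPy so it runs anywhere. It assumes that swapping the import for `import cunumeric as np` gives the same results for these calls, which is the stated goal of the project but is not verified here. All array values are made up for the example.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same for these calls

a = np.array([[0, 3, 0], [4, 0, 5]])

# argwhere: one row of coordinates per nonzero element
nz_coords = np.argwhere(a)
# flatnonzero: positions of nonzero elements in the flattened array
nz_flat = np.flatnonzero(a)

# extract / place: read and write the elements selected by a boolean mask
mask = a > 3
picked = np.extract(mask, a)
b = a.copy()
np.place(b, mask, [-1])  # the value list is cycled over the masked slots

# fill_diagonal modifies its argument in place
m = np.zeros((3, 3))
np.fill_diagonal(m, 7.0)

# linalg.solve: solve A @ x == rhs for x
A = np.array([[2.0, 0.0], [0.0, 4.0]])
rhs = np.array([2.0, 8.0])
x = np.linalg.solve(A, rhs)
```

Note that `place` and `fill_diagonal` mutate their first argument rather than returning a new array, which is why the sketch copies `a` before calling `place`.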
🛠️ Improvements
- adding support for array shape () passed as an index argument in advanced indexing by @ipdemes in #486
- Refactor test driver for cpu/gpu sharding by @bryevdv in #451
- Collate test output to allow workers > 1 with verbose output by @bryevdv in #507
- Ensure test.py --use flag fully overrides USE_* envvars by @manopapad in #524
- Enhance two integration tests by @robinw0928 in #511
- Add typing to array.py by @bryevdv in #478
- Update test runner for osx by @bryevdv in #529
- Don't blindly trust user-supplied bincount.minlength by @manopapad in #523
- Make reduced-precision cuBLAS mode opt-in by @manopapad in #519
- Fix reciprocal tests for zero values and improve test value customization (#467) by @marcinz in #537
- Refactor test runner to support more pinning options by @bryevdv in #535
- Remove dead code in bincount by @magnatelee in #546
- Make the validation condition for random distributions lenient by @magnatelee in #550
- src/cunumeric: handle high number of bins in GPU bincount by @rohany in #526
- Construct NumPy arrays correctly from 0D deferred arrays backed by region fields by @magnatelee in #551
- Collect test failure details at the end by @bryevdv in #556
- Simplify some thunk conversion helpers by @manopapad in #553
- Fix a compiler warning by @magnatelee in #555
- Add option to disable CPU pinning in tests by @bryevdv in #558
- Use the new mapper registration to enable detailed mapper logging by @magnatelee in #570
- src/cunumeric/search: make nonzero not always allocate SYS_MEM buffers by @rohany in #572
- add negative test case in test_array_split.py by @xialu00 in #545
- add some test cases for test_arg_reduce.py by @xialu00 in #575
- Testcase-add test cases for test_flip and test_indices by @xialu00 in #579
- Refactor scalar reductions to use common execution policy by @jjwilke in #573
- Sanitize k for the eye operator by @magnatelee in #586
- Add CMake build for C++ and scikit-build infrastructure for Python package installation by @jjwilke in #514
- Enhance test_block.py and test_eye.py by @robinw0928 in #578
- Testcase add test cases for test_fill.py and test_ndim.py by @xialu00 in #588
- Remove run dependency on curand by @marcinz in #520
- Use Legion Fills when possible by @manopapad in #604
- Support building with GASNet-Ex and MPI backends by @manopapad in #610
- Provenance tracking for cuNumeric operators by @magnatelee in #596
- Fix tests utils to make --directory work correctly. by @robinw0928 in #592
- Fix a compiler warning by @magnatelee in #594
- Enhance test_diag_indices.py and test_flatten.py. by @robinw0928 in #609
- cuNumeric doesn't need nested provenance tracking by @magnatelee in #617
- Add RuntimeError exception to legate.time by @robinw0928 in #618
- Stop instantiating min and max reduction ops for complex types by @magnatelee in #621
- Mark temporary conversion outputs as linear for eager storage recycling by @magnatelee in #608
- Make the negative test on fill robust across Python versions by @magnatelee in #619
- Enhance mask_indices and move_axis by @robinw0928 in #622
- src/cunumeric/matrix: stop including coll.h in solve_template.inl by @rohany in #620
🐛 Bug Fixes
- Fix performance bugs in scalar reductions by @magnatelee in #509
- Don't use internal LAPACK function names by @manopapad in #522
- Bug fixes for advanced indexing by @magnatelee in #532
- Handle the case where LAPACK_*potrf is a macro, not a function by @manopapad in #527
- fix mypy issue w/ np methods by @bryevdv in #542
- Fix buggy complex-to-bool conversions and add correctness tests for astype by @magnatelee in #549
- fixing advanced indexing operation for empty arrays by @ipdemes in #504
- Do not link curand by @marcinz in #541
- Fixing issues with advanced_indexing_kernel by @ipdemes in #557
- fixing another corner case for advanced indexing by @ipdemes in #554
- Fix OSX test shard generation by @bryevdv in #563
- fix error print in test_unary_ufunc by @jjwilke in #566
- Add NAN handling to convert() needed for some prefix routines with integer outputs. by @rkarim2 in #502
- Fixing logic for slicing by @ipdemes in #574
- Fix linalg.solve when inputs are scalars by @magnatelee in #585
- Allow casting in cn.dot, to match numpy's behavior by @manopapad in #598
- Add linalg.solve to the cmake build by @magnatelee in #603
- Invoke eye with read-write privilege, not write-discard by @manopapad in #616
- Fix a bug in scalar reduction launching kernels with empty domains by @magnatelee in #606
📖 Documentation
v22.08.00
Release 22.08.00 features a variety of random distribution implementations (backed by cuRAND), distributed prefix scan operators, and a complete implementation of sorting for multi-node multi-CPU execution. This release also includes several quality-of-life changes and bug fixes, including type annotations for all but one Python module, improvements to the parallel test driver, fixes for several operators when inputs are empty, and proper handling of ndarrays passed as array sizes or indices.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
New Features
- Adding support for ND output regions in Advanced Indexing task by @ipdemes in #370
- added support for 'searchsorted' by @mfoerste4 in #414
- np.packbits and np.unpackbits by @magnatelee in #427
- Implementation of atleast_{1,2,3}d by @sbak5 in #404
- Implementing cunumeric.random.BitGenerator by @fduguet-nv in #254
- Adding support for some simple _indices routines by @ipdemes in #417
- adding mask_indices routine by @ipdemes in #426
- Random advanced distributions by @fduguet-nv in #470
- Distributed nd sort for cpu/omp by @mfoerste4 in #437
- Initial implementation of scan routines. by @rkarim2 in #425
- Adding support for take_along_axis and put_along_axis by @ipdemes in #436
- cunumeric.ndim by @magnatelee in #495
- Add support for curand conda package build (cherry pick #510) by @marcinz in #512
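A few of the routines named in this list can be sketched with plain NumPy calls. The example assumes the cuNumeric versions accept the same arguments; `cumsum` stands in as a representative prefix scan, since the notes do not say which scan routines are covered. Input values are illustrative only.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same for these calls

# searchsorted: insertion points that keep a sorted array sorted
sorted_vals = np.array([1, 3, 5, 7])
pos = np.searchsorted(sorted_vals, [4, 7])

# packbits / unpackbits: pack eight 0/1 values into one uint8 and back
bits = np.array([1, 0, 1, 0, 0, 0, 0, 1], dtype=np.uint8)
packed = np.packbits(bits)
unpacked = np.unpackbits(packed)

# atleast_2d: promote a 1-D array to shape (1, n)
row = np.atleast_2d(np.array([1, 2, 3]))

# take_along_axis: gather with a per-row index array, here from argsort
data = np.array([[30, 10, 20], [5, 15, 0]])
order = np.argsort(data, axis=1)
sorted_rows = np.take_along_axis(data, order, axis=1)

# cumsum: inclusive prefix sum
running = np.cumsum(np.array([1, 2, 3, 4]))
```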
Improvements
- Don't run the resolution logic if the arrays have the same dtype by @magnatelee in #389
- Set cuda virtual package as hard run requirement for gpu conda package by @m3vaz in #398
- First pass mypy typing by @bryevdv in #387
- Generalize Dict to Mapping for newer versions of mypy by @jjwilke in #405
- Add support for using cupy in sort.py by @robinw0928 in #395
- Refactor test.py by @bryevdv in #378
- Use Numpy axis normalizations where possible by @bryevdv in #419
- More mypy by @bryevdv in #413
- adding bounds check for advanced indexing by @ipdemes in #397
- Report Elapsed Time in cholesky's output by @SeyedMir in #423
- Support -vv for more verbose test output by @bryevdv in #432
- Add typing to runtime.py by @bryevdv in #428
- Update compress/take tests for pytest by @bryevdv in #435
- Project down to a 1D store for the scalar reduction output by @magnatelee in #455
- Fallback to self = np.ndarray when necessary by @bryevdv in #431
- Add types to thunk modules by @bryevdv in #438
- allclose detail + misc tests improvements by @bryevdv in #457
- cunumeric.random - Adding Module-scoped functions by @fduguet-nv in #481
- Activate the NumPy fallback for cunumeric.random in CPU build by @magnatelee in #485
- Legacy generators for cpu build by @magnatelee in #487
- Allow CPU build to optionally use cuRAND by @magnatelee in #498
- Sanitize shapes in ndarray's constructor by @magnatelee in #496
- src/cunumeric/sort: stop using std::{inclusive, exclusive}_scan by @rohany in #499
- Update conda requirements by @manopapad in #383
- Handle dtype/casting/out properly in contractions by @manopapad in #402
- Missing / overzealous check_eager_args calls by @manopapad in #465
- Strengthen some types by @manopapad in #468
Bug Fixes
- Add missing includes to aid intellisense providers by @trxcllnt in #382
- Proper exception handling for cholesky by @magnatelee in #391
- Fixes for building with setup.py outside conda, primarily Mac by @jjwilke in #394
- Use the right API to check if the store is unbound by @magnatelee in #399
- Fix nargs for report:dump-csv by @bryevdv in #400
- Handle empty outputs correctly in advanced indexing task by @magnatelee in #396
- Fall back to NumPy in __array_function__ and __array_ufunc__ by @magnatelee in #424
- Fix for legate data interface by @magnatelee in #429
- Fix test_floating.py test to call sys.exit by @marcinz in #433
- Make missing pynvml an error for GPU tests by @bryevdv in #441
- Make the NumPy fallback work correctly in randint by @magnatelee in #450
- Squeeze fix by @magnatelee in #448
- Correctly prune out empty tasks in binary reduction by @magnatelee in #453
- Minor fix for indexing routines by @magnatelee in #452
- Make DeferredArray.reshape always return a deferred array by @magnatelee in #454
- Re-freezing conda compiler versions (#415) by @m3vaz in #449
- Fix for floating point predicates by @magnatelee in #466
- markdown version fix by @ipdemes in #459
- Fixup typing regressions by @bryevdv in #471
- Remove ill-defined advanced indexing test case by @magnatelee in #484
- Handle empty inputs correctly in local scan tasks by @magnatelee in #491
- Handle an unknown in a tuple correctly in reshape by @magnatelee in #490
- fix mismatched size_t/uint64_t types by @jjwilke in #475
- Allow scalar cunumeric ndarrays as array indices by @manopapad in #479
Documentation
- adding new version for documentations by @ipdemes in #447
- Updates to api_compare.py by @bryevdv in #456
- Be stricter applying CuWrapperMetadata by @bryevdv in #463
- Add custom nitpicky ref checks for cunumeric APIs by @bryevdv in #462
- Docs coverage check by @bryevdv in #469
- Fix the API reference for random functions and scan operators by @magnatelee in #497
New Contributors
- @jjwilke made their first contribution in #394
- @SeyedMir made their first contribution in #423
- @fduguet-nv made their first contribution in #254
- @rkarim2 made their first contribution in #425
- @rohany made their first contribution in #499
Full Changelog: v22.05.02...v22.08.00
v22.05.02
This hotfix release fixes issues in conda recipes.
What's Changed
- Cherry pick: Update conda requirements (#383) by @marcinz in #406
- Cherry pick: Set cuda virtual package as hard run requirement for conda gpu package (#398) by @marcinz in #407
- Cherry pick: Fix nargs for report:dump-csv (#400) by @marcinz in #408
- Re-freezing conda compiler versions by @m3vaz in #415
Full Changelog: v22.05.01...v22.05.02
v22.05.01
This hotfix release updates the conda build recipe to make the cuNumeric package depend on the right version of NumPy and also fixes a bug in the command-line argument parser.
Full Changelog: v22.05.00...v22.05.01
v22.05.00
Release 22.05 features complete support for advanced indexing and related indexing routines (compress and take), a multi-node multi-GPU sorting implementation for multi-dimensional ndarrays, window functions, several matrix/tensor operations (trace, matrix_power, multi_dot, and einsum_path) and primitive support for FFT on a single GPU using cuFFT.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
New Features
- thrust allocator for sort by @mfoerste4 in #228
- implementation of np.block w/ a test by @sbak5 in #213
- Window functions by @magnatelee in #283
- Advanced indexing by @ipdemes in #235
- First implementation of single-GPU FFT using cuFFT by @mferreravila in #238
- Use the stream pool in Legate core by @magnatelee in #295
- Add partition api and utilize sort backend by @mfoerste4 in #287
- implementing TRACE operation by @ipdemes in #263
- adding support for negative indices in advanced indexing by @ipdemes in #322
- Add cpu-only packages to the conda variants by @m3vaz in #330
- Bump minpy to 3.8 (conda env and recipe) by @bryevdv in #332
- Remaining ufuncs by @magnatelee in #315
- Logic functions by @magnatelee in #347
- Slicing-based np.block implementation by @sbak5 in #306
- Implement matrix_power by @manopapad in #360
- Distributed N-dimensional sort by @mfoerste4 in #316
- Implement einsum_path by @manopapad in #361
- adding diag_indices and diag_indices_from routines by @ipdemes in #367
- Implement moveaxis by @manopapad in #364
- Implement __array_function__ and __array_ufunc__ by @manopapad in #353
- Implement more norm cases by @manopapad in #366
- Implement multi_dot by @manopapad in #358
- Adding support for "indices" routine by @ipdemes in #368
- Support axis=None and keepdims=True/False in argmin and argmax by @trxcllnt in #346
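The indexing and matrix routines highlighted for this release might be used as in the sketch below. It is written against plain NumPy and assumes the cuNumeric equivalents take the same arguments; the small arrays are invented for the example.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same for these calls

a = np.arange(12).reshape(3, 4)

# Advanced indexing with integer arrays, including a negative index:
# selects a[0, 1] and a[-1, 2]
picked = a[[0, -1], [1, 2]]

# compress keeps the slices where the condition is True; take gathers by index
rows = np.compress([True, False, True], a, axis=0)
cols = np.take(a, [3, 0], axis=1)

sq = np.array([[1, 1], [0, 1]])
tr = np.trace(sq)                          # sum of the diagonal
p3 = np.linalg.matrix_power(sq, 3)         # sq @ sq @ sq
chain = np.linalg.multi_dot([sq, sq, sq])  # same product, with chosen evaluation order

# einsum_path returns a contraction order plus a human-readable report
path, report = np.einsum_path("ij,jk->ik", sq, sq)

# One of the window functions
w = np.hanning(5)
```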
Improvements
- Move the ufunc module (ported to branch-22.05) by @magnatelee in #242
- Use ufuncs in special methods by @magnatelee in #247
- Initial unit tests by @bryevdv in #229
- Revise type coercion by @magnatelee in #264
- adding 'only' option to the tests.py by @ipdemes in #248
- Updates for using the new unbound store API by @magnatelee in #265
- Don't run the resolution logic if the arrays have the same dtype (ported to 22.05) by @magnatelee in #390
- Use find_packages for installation by @magnatelee in #269
- Some misc tests and types by @bryevdv in #268
- Forward-port #257 by @manopapad in #273
- Split up sort.cu for parallel compilation by @magnatelee in #277
- Debugging checks by @magnatelee in #281
- Update example programs by @magnatelee in #289
- Bump up NumPy version by @magnatelee in #291
- Don't use constexpr for window functions by @magnatelee in #294
- Better error message on unsupported complex reductions by @manopapad in #300
- handle coverage wrapping uniformly including ufuncs by @bryevdv in #272
- Architecture-agnostic check for int128 by @manopapad in #293
- Unit test fixups by @bryevdv in #303
- reduce testcases for partition test by @mfoerste4 in #304
- Adding conda build recipe files by @marcinz in #274
- Use pytest for test running by @bryevdv in #297
- Add unit tests to test.py by @marcinz in #305
- Change _cunumeric_implemented into a dataclass by @manopapad in #318
- Pass reporting explicitly to coverage decorators by @bryevdv in #333
- FFT refactoring by @magnatelee in #310
- Declare ufunc formatter to be safe for parallel read by @magnatelee in #335
- Force installation of Lapack in OpenBLAS build by @marcinz in #266
- Mark no out-of-range indices for copies by @magnatelee in #336
- Discussion PR for conda envs split by @bryevdv in #326
- Use 64-bit integers for global thread ids by @magnatelee in #349
- Use legate.core arg parsing by @bryevdv in #343
- adding compress and take operations by @ipdemes in #296
- Conda recipes improvements by @marcinz in #345
- Misc small updates by @bryevdv in #352
- adding performance tests for indexing routines by @ipdemes in #337
- Add support for using cupy by @robinw0928 in #373
Bug Fixes
- Forward port late commits from 22.03 by @bryevdv in #241
- Catch up the ufunc renaming (ported to 22.05) by @magnatelee in #244
- Activate the cuBLAS workaround by checking the cuBLAS version at runtime (ported to 22.05) by @magnatelee in #246
- fix large shape >int32 by @mfoerste4 in #236
- Fix a compile error by @magnatelee in #251
- Fix the out-of-bounds bug in reshape by @magnatelee in #267
- add missing comparison functions by @bryevdv in #278
- Fix nonzero by @magnatelee in #285
- fix return value of ndarray.argsort by @mfoerste4 in #286
- Fix typos in tests after pytest transition by @manopapad in #309
- Update trace.py tests / fix some warnings by @bryevdv in #307
- Don't dump test stdout unconditionally by @bryevdv in #314
- Add typing_extensions requirement to conda recipe by @marcinz in #325
- Fix pytest exit to fail on errors by @marcinz in #334
- Fixing #321 issue by @ipdemes in #341
- Missing arguments in cases of eager-to-deferred fallback by @manopapad in #348
- Add a missing instance of share=True by @manopapad in #350
- Fix return types for some of the unary ops by @magnatelee in #354
- fixing compile-time warnings by @ipdemes in #351
- Remove special case handling for scalar arrays by @manopapad in #357
- Fix the bug in np.append test on empty input array and non-empty scalars by @sbak5 in #365
- Match NumPy's behavior for isclose(inf,inf) by @manopapad in #372
- Fix unary reductions by @magnatelee in #369
- Allow DeferredThunks to be created for empty arrays by @manopapad in #371
- Fix documentation building by @manopapad in #377
- Make the example programs pass the CI by @magnatelee in #380
Documentation
- Comparison table update by @ipdemes in #252
- Add user-facing docs for coverage reporting by @bryevdv in #261
- creating script for calculating API coverage by categories by @ipdemes in #271
- Doc update by @magnatelee in #275
- Fix docs builds for trace by @manopapad in #308
- fixing documentation for fft by @ipdemes in #302
- Add a custom autodoc class for ufuncs by @bryevdv in #317
- Refactor comparison table as Sphinx extension by @bryevdv in #323
- lgpatch docs + doc fixups by @bryevdv in #356
New Contributors
- @mferreravila made their first contribution in #238
- @m3vaz made their first contribution in #330
- @robinw0928 made their first contribution in #373
Full Changelog: v22.03.00...v22.05.00
v22.03.00
Release 22.03 adds several new features, including np.repeat, np.unique, np.inner, np.outer, and 35 new universal functions (ufuncs). In this release, we also have significantly revised and refactored tensor operations to make them comprehensive. Preliminary support for 1D array sorting for multi-GPU execution is available. (CPU and OpenMP paths are still single processor only.) We have also made performance improvements for np.convolve and np.tril/triu for GPU execution. Finally, we have added a tool that reports cuNumeric’s API coverage for a given NumPy program execution. (For the usage, please refer to “Measuring API coverage” in the cuNumeric documentation.)
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
New Features
- Sort pr by @mfoerste4 in #199
- Add basic cunumeric.patch module by @bryevdv in #225
- adding support for REPEAT operation by @ipdemes in #190
- np.unique implementation by @magnatelee in #192
- np.append & ndarray.flatten by @sbak5 in #196
- General cuFFT plan cache by @magnatelee in #195
- Tools for checking API coverage by @magnatelee in #191
- Overhaul linear algebra operations by @manopapad in #217
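For readers unfamiliar with the functions named in this release, here is a minimal sketch using plain NumPy. It assumes cuNumeric mirrors these signatures; the inputs are arbitrary.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same for these calls

a = np.array([3, 1, 3, 2])

rep = np.repeat(a, 2)   # each element duplicated in place
uniq = np.unique(a)     # sorted distinct values

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
inner = np.inner(u, v)  # 1*4 + 2*5 + 3*6
outer = np.outer(u, v)  # outer[i, j] == u[i] * v[j]

appended = np.append(u, v)  # returns a new 1-D array
flat = outer.flatten()      # row-major copy of the 3x3 result
```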
Improvements
- Move the ufunc module by @magnatelee in #234
- ufunc refactoring + a bunch of missing ufuncs by @magnatelee in #223
- Expand coverage reporting to ndarray methods by @bryevdv in #219
- Einsum benchmark improvements by @manopapad in #222
- Remove old-style casts by @manopapad in #218
- Optimize np.tril used in Cholesky by @magnatelee in #214
- Add a convergence threshold argument to the cg example by @marcinz in #221
- Make sure nonzero produces outputs in C order by @magnatelee in #216
- API cleanup for ndarray by @bryevdv in #209
- Minor improvement for diag by @magnatelee in #211
- Stop using alloca by @magnatelee in #212
- Port and refactor GH #140 "Use cufft callbacks for better performance on fft-based convolutions" by @magnatelee in #204
Bug Fixes
- Activate the cuBLAS workaround by checking the cuBLAS version at runtime by @magnatelee in #245
- Catch up the ufunc renaming by @magnatelee in #243
- Fix coverage for ufuncs by @bryevdv in #240
- Fix docs breakage by @bryevdv in #239
- Fix compilation errors on clang by @manopapad in #233
- Add cunumeric.ufunc to packages by @bryevdv in #231
- Fix trailing comma tuple bug by @bryevdv in #230
- Fix the build issue with Thrust by @magnatelee in #227
- Fix some docs breakage by @bryevdv in #224
- Fix for #208 by @magnatelee in #210
- Fix for #206 by @magnatelee in #207
- Fixed bugs for 1D array inputs on vstack, dstack and column_stack by @sbak5 in #182
Documentation
- Add docstrings to ndarray methods by @bryevdv in #205
- Clean up Sphinx warnings by @bryevdv in #202
- adding versions to the documentation by @ipdemes in #198
- adding script for comparing API coverage + table at the documentation page by @ipdemes in #193
- User facing documentation for API usage tool by @bryevdv in #262
Full Changelog: v22.01.00...v22.03.00
v22.01.00
Release 22.01 adds support for einsum expressions, logic functions and a subset of indexing and array manipulation routines.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
New Features
- Convolution by @magnatelee and @lightsighter in #103
- Added few universal functions and logical operations by @ipdemes in #134
- numpy.tril and numpy.triu by @magnatelee in #144
- Einsum operation by @manopapad in #142
- Cholesky factorization by @magnatelee in #160
- Implemented split routines and a test by @sbak5 in #152
- Choose operation by @ipdemes in #146
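The features above correspond to familiar NumPy calls. The following sketch shows each one on small made-up inputs, using plain NumPy on the assumption that cuNumeric accepts the same arguments.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same for these calls

m = np.arange(1, 10).reshape(3, 3)

lower = np.tril(m)                 # zero out entries above the diagonal
upper = np.triu(m)                 # zero out entries below the diagonal
mm = np.einsum("ij,jk->ik", m, m)  # matrix product written as an einsum

# Cholesky factor of a symmetric positive-definite matrix: L @ L.T == spd
spd = np.array([[4.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(spd)

parts = np.split(np.arange(6), 3)  # three equal-sized pieces

# choose: element i comes from choices[index[i]][i]
chosen = np.choose([0, 1, 0], [[10, 20, 30], [40, 50, 60]])

conv = np.convolve([1, 2, 3], [1, 1], mode="full")
```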
Improvements
- Convolve Cache for cuFFT by @lightsighter in #109
- Warmup iterations for Richardson-Lucy by @magnatelee in #113
- Remove NumPyAllocation by @magnatelee in #118
- Update for new data ingest interface by @manopapad in #105
- Enable some temporarily commented-out tests by @manopapad in #119
- Testcase for legate.core!94 by @manopapad in #120
- Use built-in reduction op by @magnatelee in #136
- Managing CUDA library contexts directly in cuNumeric by @magnatelee in #138
- Support for cuSOLVER by @magnatelee in #139
- Make CUDA library context cache thread safe by @magnatelee in #141
- Use .cu for CUDA library management by @magnatelee in #145
- Some reusable test input generators by @manopapad in #153
- Fix Wundefined-var-template clang warning by @manopapad in #154
- Add eager fallback mode to testing script by @manopapad in #156
- Add eager tests by @marcinz in #157
- Small additions to test input generators by @manopapad in #159
- No longer need to reserve one dim for reductions by @manopapad in #161
- Use a per-device stream cache for CUDA library calls by @magnatelee in #165
- Simple tiling heuristic for Cholesky factorization by @magnatelee in #167
- Fix clang-format config to include cu,cuh,inl files by @manopapad in #168
- LEGATE_ABORT is now a statement by @magnatelee in #169
- Preloading CUDA libraries by @magnatelee in #171
- Use CHECK_* macros in a couple more places by @manopapad in #172
- Fix some invocations of complex constructors by @manopapad in #173
- Add a switch to not call tril on Cholesky outputs by @magnatelee in #174
- Do python install on custom dir w/o eggs by @manopapad in #177
- Refined 'tests/array_split.py' w/ more essential input shapes by @sbak5 in #178
- WIP: adding logic for DIAGONAL by @ipdemes in #170
- Stack and concatenate routines including subroutines by @sbak5 in #175
- Refactoring by @magnatelee in #181
Bug Fixes
- Fix #111 by @magnatelee in #115
- math.prod not available in python 3.7 by @manopapad in #129
- Fix some compiler warnings by @magnatelee in #130
- dot: fix error message on unsupported array dimensions by @manopapad in #133
- Fix slot calculation in reduction kernel by @manopapad in #148
- Port fix for #79 by @manopapad in #155
- Build OpenBLAS with CROSS option to prevent tests at compile time by @marcinz in #158
- Pin setuptools version, to work around breaking change by @manopapad in #164
- Workaround for a bug in cuBLAS < 11.4 by @magnatelee in #185
- Cannot install cuNumeric to different dir than Legate Core by @manopapad in #186
- Adjust error tolerance for float16, to avoid spurious test failure by @manopapad in #166
Documentation
- Adding contributions file by @marcinz in #147
- Update docstrings by @magnatelee in #188
New Contributors
- @lightsighter made their first contribution in #109
- @ipdemes made their first contribution in #134
- @pre-commit-ci made their first contribution in #151
- @sbak5 made their first contribution in #152
Full Changelog: v21.11.00...v22.01.00
v21.11.00
This is the initial public alpha release of cuNumeric, an aspiring drop-in replacement for NumPy at scale.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
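In practice, "drop-in replacement" usually means changing only the import line. The sketch below assumes the package is importable as `cunumeric` and falls back to NumPy when it is not installed; the fallback is an illustration device for this example, not something the release notes prescribe.

```python
try:
    import cunumeric as np  # assumption: cuNumeric is installed under this module name
except ImportError:
    import numpy as np      # fallback so the sketch still runs without cuNumeric

# The rest of the program is ordinary NumPy-style code
x = np.arange(10, dtype=np.float64)
total = float((x * x).sum())  # sum of squares of 0..9
```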
What's Changed
- Refactoring for the broadcasting logic by @magnatelee in #18
- Improved partitioning and sharding for GEMV by @manopapad in #37
- Fix #16 by @manopapad in #38
- Add CI by @marcinz in #43
- Use a script on the runner to checkout CI repository by @marcinz in #44
- Fix CI by @marcinz in #45
- Extend tests with CPU/GPU/OMP testing by @marcinz in #48
- Remove accidental part of the job matrix from CI by @marcinz in #49
- Add missing alignment constraints for matrix-vector multiplication by @magnatelee in #58
- Force left alignment for pointers and references by @magnatelee in #59
- Don't alter the GC priority for external instances by @magnatelee in #60
- Be strict when importing legate.numpy in examples by @manopapad in #61
- Fix for reinterpret casts that are actually unsafe in the modern c++ by @magnatelee in #62
- Remove the return type of the void-returning function in the mapper by @magnatelee in #63
- Remove dependency on numpy>=1.20 by @manopapad in #64
- Stop using looping templates by @magnatelee in #65
- Bug fix for release mode by @magnatelee in #66
- Port nonzero to the new buffer API by @magnatelee in #68
- Missing constraint for bincount by @magnatelee in #69
- Clean up install script by @manopapad in #70
- Fixes to compile on MacOS by @manopapad in #71
- Disable absolute and allclose for complex types only with Clang by @magnatelee in #72
- Generalize the reshape operator by @magnatelee in #73
- Improve dot product for half precision floats by @magnatelee in #74
- Support for tensordot by @magnatelee in #75
- Bugfixes on operations by @manopapad in #76
- Add missing type casts for __half by @magnatelee in #77
- Pull the correct Core image by @marcinz in #78
- Port remaining fixes from old branch by @manopapad in #80
- Remove remaining conditional legate.numpy imports from examples by @manopapad in #81
- Always dump test output by @marcinz in #83
- Minor code cleanups by @manopapad in #85
- Attempt to address #84 by @manopapad in #86
- Always follow the core's choice regarding CUDA/OpenMP support by @manopapad in #88
- Fix legate data interface by @magnatelee in #92
- Handle overlapping stores correctly in dot by @magnatelee in #93
- Improvements to handling of scalar arrays by @manopapad in #90
- Port to the new calling convention by @magnatelee in #89
- Prevent CI on forks by @marcinz in #94
- Emptiness checks for matrix ops by @magnatelee in #95
- Mapper update by @magnatelee in #82
- Port to the new reduction op interface by @magnatelee in #96
- Stop using delinearization by @magnatelee in #97
- Dead code elimination by @magnatelee in #98
- Reorganizing source files by @magnatelee in #99
- Remove leftover requirements.txt by @manopapad in #100
- Update for build system changes by @manopapad in #101
- Updates for new attachment interface by @manopapad in #102
- Fix for matrix-vector multiplication by @magnatelee in #104
- Another attempt to fix degenerate cases by @magnatelee in #107
- Fix #111 by @magnatelee in #116
- Release 21.11.00 by @marcinz in #121
New Contributors
Full Changelog: https://github.com/nv-legate/cunumeric/commits/v21.11.00