Release v1.0.0-rc3
Pre-release
github-actions released this 23 Nov 00:25
This final release candidate previews the upcoming stable ILGPU release and has a frozen API surface and feature level. It includes performance improvements and several bug fixes, including critical patches for the internal loop optimization phases and cross-device peer access. The release is available as the ILGPU NuGet package and the ILGPU Algorithms NuGet package.
Breaking Changes
- Refined the API for building custom Atomic implementations to overcome performance limitations (#667).
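For readers unfamiliar with what a custom atomic implementation involves, the usual building block is a compare-exchange retry loop. The sketch below shows that general pattern in standard C++ and is not ILGPU's API: the refined ILGPU interfaces from #667 are not described in these notes, and the names `atomic_apply` and `atomic_max` are hypothetical helpers introduced only for illustration.

```cpp
#include <atomic>

// Hypothetical helper (not an ILGPU API): applies `op` to the stored value
// atomically by retrying a compare-exchange until no other thread has
// modified the target in between. Returns the value seen before the update.
template <typename T, typename Op>
T atomic_apply(std::atomic<T>& target, T operand, Op op) {
    T observed = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the current contents into
    // `observed`, so the desired value is recomputed on every retry.
    while (!target.compare_exchange_weak(observed, op(observed, operand))) {
    }
    return observed;
}

// Example of a custom operation built on the helper: an atomic maximum
// for floats, which std::atomic<float> does not offer directly.
inline float atomic_max(std::atomic<float>& target, float value) {
    return atomic_apply(target, value,
                        [](float a, float b) { return a > b ? a : b; });
}
```

Calling `atomic_max(cell, 4.0f)` on a `std::atomic<float>` holding `1.5f` returns `1.5f` and leaves `4.0f` stored. Every retry repeats both the load and the user-supplied operation, so how this loop is structured can affect the throughput of custom atomics under contention.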
Changes
- Added explicit conversion methods for ArrayView and ArrayView1D (#666).
- Improved Atomics performance (#667).
- Fixed issue with enabling IO operations (#694).
- Fixed invalid peer-access functionality (#675).
- Fixed invalid address-space inference in the presence of generic view-based casts (#670).
- Fixed critical issues in LoopUnrolling phases (#653, #657, #661).
- Fixed invalid thread configuration in CPUDevice and CPUMultiprocessor classes (#665).
- Fixed missing NotInsideKernel attributes on MemSet functions (#651).
- Fixed missing binding of the current accelerator in the scope of profiling markers (#644).
- Fixed radix sort on floating point data types (#643).
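As background for the radix sort entry: a radix sort orders keys by their raw bits, which does not match numeric order for IEEE 754 floats unless the bits are remapped first. These notes do not say what the defect in #643 was or how ILGPU handles it, so the following is only a generic sketch of the common key transform, written in standard C++ with hypothetical function names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Maps a float to a 32-bit key whose unsigned order matches numeric order.
// Negative values have all bits flipped; non-negative values have only the
// sign bit flipped. NaN placement is not handled specially here.
inline std::uint32_t float_to_key(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    std::uint32_t mask = (bits & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return bits ^ mask;
}

// Inverse of float_to_key.
inline float key_to_float(std::uint32_t key) {
    std::uint32_t mask = (key & 0x80000000u) ? 0x80000000u : 0xFFFFFFFFu;
    std::uint32_t bits = key ^ mask;
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Sequential least-significant-digit radix sort: four stable counting
// passes of 8 bits each over the transformed keys.
inline void radix_sort_floats(std::vector<float>& data) {
    std::vector<std::uint32_t> keys(data.size()), scratch(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) keys[i] = float_to_key(data[i]);
    for (int shift = 0; shift < 32; shift += 8) {
        std::size_t counts[257] = {0};
        for (std::uint32_t k : keys) ++counts[((k >> shift) & 0xFFu) + 1];
        // Prefix sum: counts[d] becomes the first output slot for digit d.
        for (int d = 0; d < 256; ++d) counts[d + 1] += counts[d];
        for (std::uint32_t k : keys) scratch[counts[(k >> shift) & 0xFFu]++] = k;
        keys.swap(scratch);
    }
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = key_to_float(keys[i]);
}
```

With this mapping, `-0.0f` sorts before `0.0f`. A GPU implementation would parallelize the histogram and scatter steps, but the key transform itself stays the same.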
Repository Changes
- Polished readme, build and license information (#650, #655).
- Updated samples to new Atomic function API (#667).
Major internal changes
- Bumped several test dependency packages (#659, #662).
- Bumped SourceLink dependencies to v1.1.1 (#689, #690).
- Bumped T4.Build version to v0.2.3 (#685).
- Added automatic skipping of specific CPU tests on macOS runners (#669).
Special thanks
Special thanks to @MoFtZ, @jgiannuzzi, @deng0 and @conghuiw for their contributions to this release in the form of code, feedback, ideas and proposals. Furthermore, we would like to thank the entire ILGPU community for providing feedback and submitting issues and feature requests.
Full Changelog: v1.0.0-rc2...v1.0.0-rc3