Release v1.2.0-beta1
Pre-release
github-actions released this 07 Apr 15:07
This new beta release includes bug fixes and a significantly improved O2
optimization pipeline. It is available as the ILGPU NuGet package and the ILGPU Algorithms NuGet package.
Changes
- Reviewed ILGPU documentation (#750, #776).
- Added Cuda ISA 7.5, ISA 7.6 and SM 8.7 (#778).
- Added support to fold Shuffle and Broadcast operations (#764).
- Improved performance by using uniform branches for NVIDIA GPUs (#765).
- Improved `LoopUnrolling` to cover more cases (#766).
- Improved inline PTX to support multiple output and by-ref parameters (#760).
- Fixed issues with `LibDevice` integration (#784).
- Fixed issue with unsigned nested conversions (#772, #774).
- Fixed sample project target frameworks (#771).
Special thanks
Special thanks to @hokb, @jgiannuzzi, @MoFtZ and @Ruberik for their contributions to this release in the form of code, feedback, ideas and proposals. We would also like to thank the entire ILGPU community (especially @Joey9801, @kilngod, @MPSQUARK, @NullandKale and @Yey007) for providing feedback and submitting issues and feature requests.