Release v0.8.0-beta2
Pre-release
- Significantly improved performance of emitted `PTX` and `OpenCL` code by enabling more aggressive optimizations and clever code generation (#70).
- Improved performance of kernel launchers.
- Added support for linear arrays in local memory.
- Added support for `enum`-value interop (#66).
- Reworked `PTXBackend` to support all API changes and to fix several critical code-generation issues. This also includes emission of PTX instructions that mimic the `Cuda` compiler.
- Reworked `OpenCL` backend to support all API changes and to fix several critical code-generation issues (#72, #73, #74, #78).
- Updated the whole compilation pipeline to enable more aggressive optimizations.
- Added new `IR-rewriter` API to perform more advanced IR transformations.
- Adapted all existing transformations to use the new `rewriter API`.
- Reduced memory consumption of all nodes by compressing information.
- Redesigned several IR nodes to support global program transformations.
Special thanks to @MoFtZ for contributing to this release.