Release v0.8.0-beta2 · m4rs-mt/ILGPU

Significantly improved performance of emitted PTX and OpenCL code by enabling more aggressive optimizations and clever code generation (#70).
Improved performance of kernel launchers.
Added support for linear arrays in local memory.
Added support for enum-value interop (#66).
Reworked PTXBackend to support all API changes and to fix several critical code-generation issues. This also includes emitting PTX instructions that mimic those produced by the CUDA compiler.
Reworked OpenCL backend to support all API changes and to fix several critical code-generation issues (#72, #73, #74, #78).
Updated the whole compilation pipeline to enable more aggressive optimizations.
Added new IR-rewriter API to perform more advanced IR transformations.
Adapted all existing transformations to use the new rewriter API.
Reduced memory consumption of all IR nodes by compressing information.
Redesigned several IR nodes to support global program transformations.

Special thanks to @MoFtZ for contributing to this release.
