
Release v0.8.0-beta2

Pre-release
@m4rs-mt released this on 03 Jan 02:54
  • Significantly improved the performance of emitted PTX and OpenCL code through more aggressive optimizations and improved code generation (#70).
  • Improved the performance of kernel launchers.
  • Added support for linear arrays in local memory.
  • Added support for enum-value interop (#66).
  • Reworked the PTXBackend to support all API changes and to fix several critical code-generation issues. The backend now also emits PTX instruction sequences that mimic those produced by the CUDA compiler.
  • Reworked the OpenCL backend to support all API changes and to fix several critical code-generation issues (#72, #73, #74, #78).
  • Updated the whole compilation pipeline to enable more aggressive optimizations.
  • Added a new IR rewriter API for performing more advanced IR transformations.
  • Adapted all existing transformations to use the new rewriter API.
  • Reduced the memory consumption of all IR nodes by storing their information in a more compact form.
  • Redesigned several IR nodes to support global program transformations.

Special thanks to @MoFtZ for contributing to this release.