Release v0.8.0-beta1
Pre-release
- Added support for on-the-fly specialization of kernels using dynamic partial evaluation.
- Added support for dynamic shared memory (`CPU` & `Cuda` backends).
- Added new `KernelConfig` structure to specify launch dimensions for explicitly grouped kernels.
- Reworked explicitly grouped kernel launchers to use the new `KernelConfig` structure instead of `GroupedIndex` types.
- Simplified static `Grid` and `Group` properties.
- Added new `Index1` structure to avoid name clashes with the new `System.Index` structure.
- Added additional tuple conversion methods to `Index2` and `Index3` types.
- Added new `EntryPointDescription` structure to specify an entry point and its index type.
- Added `RuntimeKernelConfig` structure to combine static and dynamic information about a particular kernel launch.
- Removed all `GroupedIndex` types.
- Extended `PTXInstructions` to support bool-based IOs in `PTXBackend` (#68).
- Extended `ExchangeBuffer` to use new page-locked memory allocation (if available).
- Extended `CudaAPI` to support page-locked host-memory allocation functions.
- Reworked implementation of `GetSubView` in the context of generic and multidimensional array views (#19).
- Fixed several issues in the scope of address-space inference.
- Fixed critical code generation issues that could occur when replacing values.
- Fixed invalid pointer types in the scope of `AtomicCAS` operations on AMD hardware (#67).