v0.2.0
Major release with a range of changes to the Python interface, search implementation, transpiler, and documentation.
What's Changed
- [Triton CodeGen] Fix an issue when generating Triton programs from mugraphs
- [LoRA demo] Add the checkpoint file for the LoRA demo
- [DeviceMemoryManager] Use offsets instead of pointers to locate tensors and fingerprints in device memory
- [Graph Generator] Parallelize the generation algorithm
- Improve parallel search performance
- [Accumulator] Decouple accumulator from output saver in threadblock graphs
- Update the setup workflow for packaging
- Add more element_unary & element_binary operators at the kernel and threadblock levels
- [CUDA Transpiler] Support JIT transpilation and compilation
- [Search] Range-based pruning
- Fix some existing issues by @xinhaoc in #63
- [Transpiler] Support threadblock matmul using CuTe when the input/output stensors have more than 2 dimensions
- Include header files for JIT compilation. MIRAGE_ROOT is no longer required.
- [Python] Update the Python interface to support search
- [Search] Adjust the expansion phase of search
- [Search] Improve the display of search statistics
- Set default max_num_threadblock_graphs to 1
New Contributors
- @wmdi made their first contribution in #3
- @geohotstan made their first contribution in #14
- @jiakunw made their first contribution in #20
- @interestingLSY made their first contribution in #36
Full Changelog: https://github.com/mirage-project/mirage/commits/v0.2.0