v0.2.0

@jiazhihao released this 01 Oct 18:38
· 26 commits to main since this release
8edf81c

Major release with changes to the Python interface, the search implementation, the transpiler, and the documentation.

What's Changed

  • [Triton CodeGen] Fix an issue when generating Triton programs from mugraphs
  • [LoRA demo] Add the checkpoint file for the lora demo
  • [DeviceMemoryManager] Use offsets instead of pointers to locate tensors and fingerprints in device memory
  • [Graph Generator] Parallelize the generation algorithm
  • Improve parallel search performance
  • [Accumulator] Decouple the accumulator from the output saver in threadblock graphs
  • Update the setup workflow for packaging
  • Add more element_unary & element_binary operators at the kernel and threadblock levels
  • [CUDA Transpiler] Support JIT transpilation and compilation
  • [Search] Range-based pruning
  • Fix some existing issues by @xinhaoc in #63
  • [Transpiler] Support threadblock matmul using CuTe when the input/output stensors have more than 2 dimensions
  • Include header files for JIT compilation; MIRAGE_ROOT is no longer required
  • [Python] Update the Python interface to support search
  • [Search] Adjust the expansion phase of search
  • [Search] Improve the display of search statistics
  • Set default max_num_threadblock_graphs to 1
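
The [DeviceMemoryManager] item above replaces raw pointers with offsets. The following is a minimal Python sketch of that general idea, not Mirage's actual implementation: the names `Arena`, `TensorRef`, `allocate`, `view`, and `relocate` are hypothetical, and a host-side `bytearray` stands in for device memory. It illustrates why an offset-based reference stays valid when the underlying buffer moves, whereas a raw pointer would not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorRef:
    """Hypothetical handle: locates a tensor by offset, not by address."""
    offset: int  # byte offset from the start of the arena
    nbytes: int  # size of the tensor in bytes


class Arena:
    """Hypothetical bump allocator over one contiguous buffer."""

    def __init__(self, capacity, alignment=16):
        self.buffer = bytearray(capacity)  # stand-in for device memory
        self.alignment = alignment
        self.next_free = 0

    def allocate(self, nbytes):
        # Round the next free position up to the alignment boundary.
        start = -(-self.next_free // self.alignment) * self.alignment
        if start + nbytes > len(self.buffer):
            raise MemoryError("arena exhausted")
        self.next_free = start + nbytes
        return TensorRef(start, nbytes)

    def view(self, ref):
        # Resolve the offset against the current base buffer at access time.
        return memoryview(self.buffer)[ref.offset:ref.offset + ref.nbytes]

    def relocate(self, new_capacity):
        # Move the contents to a new buffer; existing TensorRefs stay valid
        # because they never stored the old base address.
        if new_capacity < self.next_free:
            raise ValueError("new capacity is smaller than the used region")
        new_buffer = bytearray(new_capacity)
        new_buffer[:self.next_free] = self.buffer[:self.next_free]
        self.buffer = new_buffer


arena = Arena(64)
a = arena.allocate(10)
b = arena.allocate(4)
arena.view(b)[:] = b"\x01\x02\x03\x04"
arena.relocate(128)
data_after_move = bytes(arena.view(b))
```

In this sketch the same `TensorRef` resolves correctly before and after `relocate`, which is the property that makes offsets convenient when the same layout is reused across different base allocations.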

Full Changelog: https://github.com/mirage-project/mirage/commits/v0.2.0