[Retraining] Use Liger Kernel to avoid multi-head logits materialization and scale the context length by N times #119

ByronHsu · 2024-08-26T05:22:07Z

https://github.com/linkedin/Liger-Kernel/tree/main/examples/medusa

With FusedLinearCrossEntropy and the other kernels in Liger-Kernel, we are able to effectively reduce memory usage while increasing throughput. We would be happy to collaborate on integrating our kernels!
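For readers unfamiliar with the idea, here is a rough, forward-only sketch of what "avoiding logits materialization" means. It is plain Python for illustration, not the Liger-Kernel implementation (which is a fused GPU kernel that also handles gradients). The function names `chunked_linear_cross_entropy` and `multi_head_loss`, the `chunk_size` parameter, and the assumption that each head has its own hidden states and projection weights are all hypothetical choices made for this example.

```python
import math


def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=2):
    """Mean cross-entropy of (hidden @ weight^T) vs. targets, one chunk at a time.

    hidden:  list of T vectors of length H (one per token)
    weight:  list of V vectors of length H (one row per vocabulary entry)
    targets: list of T integer class ids
    """
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        chunk = hidden[start:start + chunk_size]
        chunk_targets = targets[start:start + chunk_size]
        # Logits exist only for this chunk: chunk_size x V instead of T x V.
        logits = [
            [sum(h * w for h, w in zip(row, w_row)) for w_row in weight]
            for row in chunk
        ]
        for row_logits, target in zip(logits, chunk_targets):
            # Numerically stable log-sum-exp.
            m = max(row_logits)
            lse = m + math.log(sum(math.exp(x - m) for x in row_logits))
            total += lse - row_logits[target]
        # The chunk's logits are discarded here, before the next chunk is built.
    return total / len(hidden)


def multi_head_loss(hidden_per_head, weight_per_head, targets_per_head, chunk_size=2):
    """Sum of per-head losses, computed one head (and one chunk) at a time."""
    return sum(
        chunked_linear_cross_entropy(h, w, t, chunk_size)
        for h, w, t in zip(hidden_per_head, weight_per_head, targets_per_head)
    )
```

With N heads, a naive implementation holds N full `T x V` logit tensors at once; processing one chunk of one head at a time keeps the peak at `chunk_size x V`, which is where the extra room for longer contexts comes from.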

ByronHsu · 2024-08-26T05:22:38Z

cc @ctlllll @leeyeehoo @zhyncs
