Make Flash attention configurable #44

theissenhelen · 2024-09-16T11:24:58Z

Is your feature request related to a problem? Please describe.

The current implementation causes problems when loading old model checkpoints for inference, because it is unclear whether flash attention was used.

Describe the solution you'd like

Two classes: MultiHeadSelfAttention, and FlashMultiHeadSelfAttention, which inherits from the former but makes use of flash attention. Which of the two is used should be set by the user in the config.
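
A minimal sketch of how this could look, to make the proposal concrete. The config field `use_flash_attention`, the `AttentionConfig` dataclass, the `build_attention` factory and the `backend` attribute are hypothetical names chosen for illustration, and the attention computation itself is left out so the example runs without any deep learning framework.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    # Hypothetical config fields; the real config keys are not specified in this issue.
    use_flash_attention: bool = False
    num_heads: int = 8
    embed_dim: int = 64


class MultiHeadSelfAttention:
    """Baseline attention layer (the actual computation is omitted in this sketch)."""

    # Hypothetical marker so a saved model can report which variant it used.
    backend = "standard"

    def __init__(self, num_heads: int, embed_dim: int) -> None:
        if embed_dim % num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")
        self.num_heads = num_heads
        self.embed_dim = embed_dim
        self.head_dim = embed_dim // num_heads


class FlashMultiHeadSelfAttention(MultiHeadSelfAttention):
    """Same interface as the parent; a real version would call a flash attention kernel."""

    backend = "flash"


def build_attention(cfg: AttentionConfig) -> MultiHeadSelfAttention:
    """Pick the attention class from the user's config."""
    cls = FlashMultiHeadSelfAttention if cfg.use_flash_attention else MultiHeadSelfAttention
    return cls(cfg.num_heads, cfg.embed_dim)


# Usage: the choice is explicit in the config, so it can be stored with a checkpoint.
layer = build_attention(AttentionConfig(use_flash_attention=True))
print(type(layer).__name__, layer.backend)
```

Because the subclass keeps the parent's constructor signature, code that only needs "an attention layer" can accept either class, and the flag in the config records which one was used.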

Describe alternatives you've considered

No response

Additional context

No response

Organisation

ECMWF

clessig · 2024-09-16T17:32:55Z

Flash attention has useful options such as softcapping. It might be worth exposing them.
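
For readers unfamiliar with the term, here is a small sketch of what softcapping does to attention scores, assuming the usual definition `cap * tanh(score / cap)`. The function name `softcap_scores` and the convention that a non-positive cap disables the feature are assumptions made for this illustration, not the API of any flash attention library.

```python
import math


def softcap_scores(scores, cap):
    """Squash each attention logit into (-cap, cap) using cap * tanh(score / cap)."""
    if cap <= 0:
        # Assumption for this sketch: a non-positive cap means "softcapping disabled".
        return list(scores)
    return [cap * math.tanh(s / cap) for s in scores]


# Small scores pass through almost unchanged; large ones saturate near +/- cap.
print(softcap_scores([0.5, 10.0, 100.0, -100.0], cap=30.0))
```

If such an option were exposed, it would presumably be another field in the attention config, passed through only by the flash attention variant.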

theissenhelen added the enhancement label (New feature or request) Sep 16, 2024

theissenhelen self-assigned this Sep 16, 2024

theissenhelen linked a pull request Sep 20, 2024 that will close this issue

Feature/44 make flash attention configurable #47

Open
