attention mechanism toggle added #2384

Open

Aaryanverma wants to merge 1 commit into main
Conversation

Aaryanverma

Added an option to toggle the attention mechanism while loading the model, for cases where the user does not want to use flash attention (or a similar implementation), e.g. on older GPUs.
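A minimal sketch of the kind of load-time toggle being proposed. The PR's actual argument name is not visible in this thread; as an illustration, Hugging Face transformers exposes a comparable switch through the `attn_implementation` argument:

```python
# Illustrative only: the PR's real parameter name may differ.
from transformers import AutoModelForCausalLM

# "eager" selects the plain PyTorch attention path, which is useful on older
# GPUs without FlashAttention support; "sdpa" / "flash_attention_2" pick the
# fused kernels instead.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",                        # example checkpoint
    attn_implementation="eager",   # avoid flash-style attention kernels
)
```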
@hello-11 added the triaged (Issue has been triaged by maintainers) and functionality issue labels on Oct 30, 2024
@nv-guomingz (Collaborator)

Hi @Aaryanverma, thanks for your contribution to the TRT-LLM project.

If you want to disable flash attention, another workaround is to set context_fmha to false when building the engine.
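For reference, the build-time workaround above amounts to disabling the fused context-phase attention kernel when compiling the engine. A hedged example using the trtllm-build CLI; the exact flag spelling may vary between TensorRT-LLM releases, so verify with trtllm-build --help:

```bash
# Build the engine without the fused context-phase attention (FMHA) kernel.
# Flag names may differ across TensorRT-LLM releases; check --help first.
trtllm-build --checkpoint_dir ./ckpt \
             --output_dir ./engine \
             --context_fmha disable
```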

Labels: functionality issue, triaged, waiting for feedback