attention mechanism toggle added #2384

Open

Aaryanverma wants to merge 1 commit into main
Conversation

Aaryanverma

Added an option to toggle the attention mechanism while loading the model, for cases where the user does not want to use flash attention (or a similar implementation), e.g. on older GPUs.
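A minimal sketch of the kind of load-time toggle being proposed. The PR's actual argument name is not visible in this thread; as an illustration, Hugging Face transformers exposes a comparable switch through the `attn_implementation` argument:

```python
# Illustrative only: the PR's real parameter name may differ.
from transformers import AutoModelForCausalLM

# "eager" selects the plain PyTorch attention path, which is useful on older
# GPUs without FlashAttention support; "sdpa" / "flash_attention_2" pick the
# fused kernels instead.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",                        # example checkpoint
    attn_implementation="eager",   # avoid flash-style attention kernels
)
```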
@hello-11 added the triaged (Issue has been triaged by maintainers) and functionality issue labels on Oct 30, 2024
@nv-guomingz (Collaborator)

Hi @Aaryanverma, thanks for your contribution to the TRT-LLM project.

If you want to disable flash attention, another workaround is to set context_fmha to false when building the engine.
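For reference, the build-time workaround above amounts to disabling the fused context-phase attention kernel when compiling the engine. A hedged example using the trtllm-build CLI; the exact flag spelling may vary between TensorRT-LLM releases, so verify with trtllm-build --help:

```bash
# Build the engine without the fused context-phase attention (FMHA) kernel.
# Flag names may differ across TensorRT-LLM releases; check --help first.
trtllm-build --checkpoint_dir ./ckpt \
             --output_dir ./engine \
             --context_fmha disable
```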

Labels: functionality issue, triaged, waiting for feedback