```
ChatGLM:Traceback (most recent call last):
  File "/home/lin/work/code/DeepLearnling/LLM/ChatGLM3/basic_demo/cli_demo.py", line 62, in <module>
    main()
  File "/home/lin/work/code/DeepLearnling/LLM/ChatGLM3/basic_demo/cli_demo.py", line 48, in main
    for response, history, past_key_values in model.stream_chat(tokenizer, query, history=history, top_p=1,
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/home/lin/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 1072, in stream_chat
    for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "/home/lin/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 1159, in stream_generate
    outputs = self(
              ^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 937, in forward
    transformer_outputs = self.transformer(
                          ^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 830, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
                                                                      ^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 640, in forward
    layer_ret = layer(
                ^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 544, in forward
    attention_output, kv_cache = self.self_attention(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 441, in forward
    context_layer = self.core_attention(query_layer, key_layer, value_layer, attention_mask)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/software/anaconda3/envs/llm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lin/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 226, in forward
    context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
```
One of my cards is a Titan Xp. From what I've found, this is a FlashAttention problem: newer versions of transformers automatically invoke FlashAttention, but downgrading transformers conflicts with some other libraries. What should I do?
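A possible workaround sketch (my suggestion, not confirmed in this thread): the Titan Xp is Pascal (compute capability 6.1), and the traceback ends in `torch.nn.functional.scaled_dot_product_attention` selecting the FlashAttention kernel, which requires Ampere (compute capability 8.0) or newer. Assuming that is the failure path, you can keep your current transformers version and instead tell PyTorch not to use the flash backend before the model is loaded:

```python
import torch

# Titan Xp is Pascal (compute capability 6.1); FlashAttention needs
# Ampere (8.0) or newer, hence the RuntimeError above.
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")

if (major, minor) < (8, 0):
    # Forbid only the flash kernel of scaled_dot_product_attention;
    # PyTorch then falls back to the memory-efficient or plain math
    # implementations, both of which run on Pascal.
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(True)
    torch.backends.cuda.enable_math_sdp(True)

# ...then load the model and run the chat loop as in cli_demo.py.
```

Because modeling_chatglm.py (line 226 in the traceback) calls `scaled_dot_product_attention` directly, these global toggles should take effect without downgrading anything; place them at the top of cli_demo.py, before the model is created.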