Not enough SMs on RTX 2080 #117

Open · qub3s opened this issue Apr 25, 2024 · 4 comments

qub3s commented Apr 25, 2024

Hey,

I know you optimized this project for the A100, and I read that people got the 4090 and the 3090 running. I only have access to 2080s (university hardware).

When I try to run your code (amg_example.py), I'm getting the following errors:

torch._inductor.utils: [WARNING] not enough SMs to use max_autotune_gemm mode

followed by a long stretch of generated code, and then:
BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-76618e, line 149; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-76618e, line 149; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
(.....)
ptxas /tmp/compile-ptx-src-76618e, line 200; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-76618e, line 200; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to error

Is this just a limitation of my hardware, or am I doing something wrong?

PS: The original model runs fine, and your project also runs if I use "sam_model_registry" (I guess that is just the Meta implementation).

Thank you.

cpuhrsch (Contributor) commented

Hey @qub3s, in this case your GPU doesn't support bfloat16. You'd need to change the model to use float16 (potentially slightly worse accuracy) or float32 (much slower). sm_80 here refers to the architecture version; the 20 series uses the Turing architecture, which I think is sm_75.
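For illustration, a minimal sketch of what that switch might look like (assuming the fast registry mirrors the upstream segment_anything loading API; the checkpoint path is a placeholder):

```python
import torch
from segment_anything_fast import sam_model_fast_registry

# Sketch only: cast the model to float16 instead of bfloat16, since
# pre-sm_80 GPUs (Turing is sm_75) have no native bf16 support.
# The checkpoint filename below is a placeholder.
sam = sam_model_fast_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam = sam.to(device="cuda", dtype=torch.float16)  # or torch.float32 (much slower)
```

Inputs fed to the model would need the matching dtype, and the registry may handle dtype internally, so treat this as the general idea rather than an exact patch.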

qub3s (Author) commented Apr 26, 2024

Thank you for the quick response; the model runs fine with float32 or float16.

The only strange thing is that the execution time seems to increase with the batch size when using float32.

Units are ms per image at an image size of ~500x500 pixels, 10 images per run:

| batch size | float32 (ms/image) | float16 (ms/image) |
|-----------:|-------------------:|-------------------:|
| 1          | 1908.6171875       | 1040.30185546875   |
| 2          | 1849.7623046875    | 976.1693359375     |
| 3          | 4087.425390625     | 996.29150390625    |
| 4          | out of memory      | 961.133984375      |
| 5          | n/a                | 940.1287109375     |
| 6          | n/a                | 996.61171875       |
| 7          | n/a                | out of memory      |
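(For reference, a hypothetical sketch of how such per-image timings might be collected; this is not the measurement code actually used above:)

```python
import time
import torch

def ms_per_image(run_batch, images, batch_size):
    """Hypothetical helper: average wall-clock ms per image.

    `run_batch` is assumed to process a list of images in one forward pass.
    """
    torch.cuda.synchronize()  # make sure pending GPU work doesn't skew the start time
    start = time.perf_counter()
    for i in range(0, len(images), batch_size):
        run_batch(images[i:i + batch_size])
    torch.cuda.synchronize()  # wait for all GPU work before stopping the clock
    return (time.perf_counter() - start) * 1000 / len(images)
```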

Anyway, thanks again.

cpuhrsch (Contributor) commented

@qub3s - Ah, yes, that's expected: a bigger batch means you need to allocate more memory throughout model execution. Also see sam_model_fast_registry (but you'd still need to switch to float16). That said, I think torch.compile might not help that much on an RTX 2080. I'd be curious, though.
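One way to see that batch-size memory effect directly (a hypothetical check, not project code):

```python
import torch

# Hypothetical sketch: record peak GPU memory for a given batch size to
# see how allocation grows with the batch (and eventually runs out of
# memory on an 8 GB RTX 2080).
torch.cuda.reset_peak_memory_stats()
# ... run the mask generator on one batch of images here ...
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```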

JamesHOEEEE commented

Hi all,

I was trying to run amg_example.py on a 2080 Ti too. I know the Triton kernel only supports the A100, so according to the README you need to set the environment variable SEGMENT_ANYTHING_FAST_USE_FLASH_4=0.

Here is my code:

import os  # lowercase: `import OS` fails, since Python module names are case-sensitive
os.environ['SEGMENT_ANYTHING_FAST_USE_FLASH_4'] = '0'  # no leading space in the key

but it still fails with the missing Triton module error.
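(One possible culprit, stated as an assumption: if the library reads the flag at import time rather than at call time, it has to be set before segment_anything_fast is imported:)

```python
import os

# Assumption, not verified: set the flag before the library import,
# in case segment_anything_fast reads it at import time.
os.environ['SEGMENT_ANYTHING_FAST_USE_FLASH_4'] = '0'

from segment_anything_fast import sam_model_fast_registry  # imported after the flag is set
```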

Did I do something wrong? Or do you have any suggestions?

Thank you.
