Shared memory out of resource. Need 135200M memory! #63
Comments
Hello @DESEOUMAIGA, thank you for opening the issue. Do you mind sharing a bit more detail about the environment and GPU type you ran this on? Also, are you trying to reproduce the experiments, or are you trying to use this in an end-to-end context? Thank you!
After fixing some tensor type bugs (float32 to float16, etc.), I can confirm the same error: raise OutOfResources(self.shared, max_shared, "shared memory") on an RTX A6000 (48GB).
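For anyone hitting the dtype mismatch mentioned above, here is a minimal sketch of the kind of float32-to-float16 cast involved, assuming a plain PyTorch module; the actual tensors in this project may differ:

```python
import torch

# Illustrative only: make the input dtype match the half-precision weights,
# otherwise kernels tuned for float16 will reject or mis-handle float32 inputs.
model = torch.nn.Linear(256, 256).cuda().half()   # weights in float16

x = torch.randn(4, 256, device="cuda")            # float32 by default
x = x.to(torch.float16)                           # cast inputs to float16 as well

with torch.no_grad():
    out = model(x)
print(out.dtype)  # torch.float16
```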
I'm getting the same error, even after trying to reduce the block_size and num_stages in the kernel configs.
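For reference, block sizes and num_stages normally live in the @triton.autotune configs. Below is a minimal illustrative sketch of such a tuning space (a toy kernel, not the project's actual one); smaller BLOCK_* values and fewer pipeline stages generally require less shared memory:

```python
import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        # Smaller blocks / fewer stages -> lower shared-memory footprint.
        triton.Config({"BLOCK_SIZE": 1024}, num_stages=2, num_warps=4),
        triton.Config({"BLOCK_SIZE": 512}, num_stages=1, num_warps=4),
    ],
    key=["n_elements"],
)
@triton.jit
def double_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * 2.0, mask=mask)

x = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
double_kernel[grid](x, out, x.numel())
```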
I'm running this on an NVIDIA 3090 (24GB) and getting the same error when running the amg example. I set up a fresh conda environment with Python 3.10 and followed the install instructions for sam-fast with pip.
Hey all, thanks for giving this project a go! Since this project was optimized for A100s, the tuned kernel assumes more shared memory than is available on many other GPUs. I'll push a fix to rerun auto-tuning for non-A100 GPUs.
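To illustrate the problem, here is a rough sketch of gating a tuned config on the detected GPU; the config values and names are made up for illustration and are not the project's actual fix, which reruns auto-tuning instead:

```python
import torch

# Illustrative configs only. A100s expose roughly 164KB of shared memory per SM,
# while many consumer/workstation GPUs expose roughly 100KB, so an A100-tuned
# config can exceed the hardware limit elsewhere.
A100_TUNED_CONFIG = {"BLOCK_M": 128, "BLOCK_N": 128, "num_stages": 4}
FALLBACK_CONFIG = {"BLOCK_M": 64, "BLOCK_N": 64, "num_stages": 2}

def pick_config():
    name = torch.cuda.get_device_name(torch.cuda.current_device())
    return A100_TUNED_CONFIG if "A100" in name else FALLBACK_CONFIG

print(pick_config())
```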
Ok, I tried to address this in #67. Please try again and let me know if it doesn't work :)
@cpuhrsch Thanks. I tried it on an A10G. It works!
I tried this again on my 24GB NVIDIA 3090, and it looks like it works when disabling the custom flash attention kernel, thank you! I get an error with it enabled when running the amg example; I think this is because I haven't run the experiment in experiments/ to create a new kernel for my GPU, so I'm trying that now.
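In case it helps others on consumer GPUs: my understanding is that the custom Triton flash-attention kernel can be toggled off via an environment variable set before importing the package. The variable name below is an assumption on my part; please double-check the project README for the exact name:

```python
import os

# Assumption: segment-anything-fast reads this variable to decide whether to use
# its custom Triton flash-attention kernel (verify the exact name in the README).
# Set it before importing the package so the fallback attention path is used.
os.environ["SEGMENT_ANYTHING_FAST_USE_FLASH_4"] = "0"
```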
When I run the experiment script to (I think?) regenerate the Triton kernel, I get an out-of-memory error. It looks like there's no new file in configs/ after running the experiments script, just the A100 config.
Hello @rbavery - I just pushed #73; hope that resolves your issue. For the OOM error, it's entirely possible you don't have enough RAM for batch size 16. Can you try 8 or 4, or maybe even 1? As in, run with a smaller batch size. Just to make sure you won't think I forgot about this: I'm going to be on vacation until November 27th starting today. I'll review it again once I'm back, or in between if I find the time. Thank you!
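A generic sketch of the batch-size fallback suggested above (this is not the project's experiment script, whose exact flags aren't reproduced here; torch.cuda.OutOfMemoryError is available in recent PyTorch):

```python
import torch

def run_with_fallback(step_fn, batch, min_batch=1):
    """Try a step; on CUDA OOM, halve the batch and retry (generic pattern)."""
    while True:
        try:
            return step_fn(batch)
        except torch.cuda.OutOfMemoryError:
            if batch.shape[0] <= min_batch:
                raise
            torch.cuda.empty_cache()
            batch = batch[: batch.shape[0] // 2]  # e.g. 16 -> 8 -> 4 -> ... -> 1

model = torch.nn.Linear(1024, 1024).cuda()
data = torch.randn(16, 1024, device="cuda")
out = run_with_fallback(model, data)
```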
Thanks, I'm running this now with a smaller batch size and it's working! The BEST_CONFIGS issue fix works as well.
I'll close this issue now because we seem to have been able to address the issues here, but please reopen if that doesn't apply!
@cpuhrsch Hey there! I am encountering a similar problem with this error when running a Phi-3 instruct LLM-based model (from Hugging Face). I am running this on one GPU of an NVIDIA DGX node (80GB A100 GPUs). I need some help and can't work out how to rectify this issue. Any help from your end would be great!
I encountered the same problem. Just update the Triton version according to https://huggingface.co/microsoft/Phi-3-small-8k-instruct
A reminder for testers of this project: it requires 130GB of memory for the demo.
'triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 135200, Hardware limit: 101376. Reducing block sizes or num_stages may help.'
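Note that the numbers in this error are bytes of GPU shared memory per kernel launch (roughly 132KB requested against a ~99KB per-SM hardware limit), not main memory. A hedged sketch of catching the exception and falling back to PyTorch attention, using the exception path shown in the traceback above (newer Triton releases may expose it under triton.runtime.errors instead):

```python
import torch
from triton.runtime.autotuner import OutOfResources  # path as in the traceback above

def attention_with_fallback(q, k, v, custom_kernel):
    try:
        # custom_kernel is a placeholder for a Triton attention kernel wrapper.
        return custom_kernel(q, k, v)
    except OutOfResources:
        # The custom kernel needs more shared memory than this GPU offers;
        # fall back to PyTorch's built-in scaled dot-product attention.
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)
```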