CUDA out of memory. (RTX 4070 super) #195
Comments
Hi @KuShiro189 On the RAM & VRAM tab, have you checked the link to make sure the Nvidia Stable Diffusion memory settings aren't disabled? https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion This setting allows Windows machines to extend their VRAM into system RAM if needed. If it's been turned off, you can only use the 12GB of VRAM that you have. Thanks
Appreciated the quick response! I had not used Stable Diffusion before and have not changed anything about the memory settings, but the finetune script still gives the following error: On a side note, on my RAM & VRAM tab the GPU information and VRAM did not show up, only the system RAM, even though my GPU is still being utilized by the finetune script as far as I've seen in Task Manager. Perhaps the issue might be related to that? Sorry for taking so long to reply, my PC crashed during another attempt like the above.
Hi @KuShiro189 The article is called "Stable Diffusion memory fallback" by Nvidia, though the actual setting is "CUDA - Sysmem Fallback Policy" and it changes the way the Nvidia driver works with memory allocation. They should have called it something better and less confusing. If you haven't, I would 110% suggest you check if that setting has been changed by something else, as other applications can change it. It's the only setting that I know of for Windows that will have any effect on the VRAM memory allocation. I've just run through a finetuning process to confirm all is working. The default behaviour you should see when it runs is that as the 1st epoch comes to an end, with a 12GB GPU, the Shared memory should start to increase. This is the "CUDA - Sysmem Fallback Policy" in operation, allowing the GPU to use system RAM when it runs out of memory. It only uses that memory for maybe 30-60 seconds, as it stores 3 x 5GB copies of the AI model in there while it shifts things around before saving 2x of them off to disk. When it saves them off to disk, it releases that memory, so you see both your VRAM and Shared memory drop again. As you will note, those screenshots are both from an RTX 4070 with 12GB VRAM, and the only secret to making it run, that I know of, is to enable "CUDA - Sysmem Fallback Policy", as without that your Nvidia driver will limit CUDA operations to the 12GB VRAM built into your GPU. So can you confirm you have checked that setting to see if it's been disabled, OR indeed set it to "Prefer Sysmem Fallback" to see if that changes things for you? Thanks
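As a rough way to watch those numbers while training runs, here is a minimal sketch (assuming PyTorch is installed in the same environment the finetune script uses) that prints both the driver's view of GPU 0 and PyTorch's own allocator counters:

```python
# Minimal sketch: compare the driver's view of GPU 0 with PyTorch's allocator.
import torch

gib = 1024 ** 3
free_b, total_b = torch.cuda.mem_get_info(0)   # what the driver reports for GPU 0
print(f"VRAM total           : {total_b / gib:.2f} GiB")
print(f"VRAM free            : {free_b / gib:.2f} GiB")
print(f"Allocated by PyTorch : {torch.cuda.memory_allocated(0) / gib:.2f} GiB")
print(f"Reserved (cached)    : {torch.cuda.memory_reserved(0) / gib:.2f} GiB")
```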
Hi @KuShiro189 The only thing I can add as a thought is that I don't know how that value is activated/passed over to Python's PyTorch CUDA environment, meaning that, when you change the setting, you will probably have to open a new command prompt and load a fresh Python environment. Using an already open command prompt may not carry over the new setting, but I can't say for certain as I've not looked into its behaviour in that kind of detail.
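One rough way to confirm whether the setting has taken effect in a fresh prompt is to deliberately over-allocate from Python. This is only a sketch, and the 16 GiB target is an arbitrary figure above the card's 12GB: with the fallback enabled, the allocations should spill into Shared memory (visible in Task Manager) rather than raising an error.

```python
# Rough sanity check for the "CUDA - Sysmem Fallback Policy" setting: try to
# allocate more memory than the 12GB card physically has.
import torch

chunks = []
try:
    for _ in range(16):  # 16 x 1 GiB, deliberately more than 12GB of VRAM
        chunks.append(torch.empty(1024 ** 3, dtype=torch.uint8, device="cuda"))
        print(f"allocated {len(chunks)} GiB so far")
    print("Went past the physical VRAM - the driver is falling back to system RAM")
except torch.cuda.OutOfMemoryError:
    print("OOM - the driver is NOT falling back to system RAM")
finally:
    chunks.clear()
    torch.cuda.empty_cache()
```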
@KuShiro189 What I learned is that your System (C:) partition should have at least 20GB of space. If that runs out, the OOM error seems to occur as well.
Thank you both for the input! I did attempt to start a new CMD and Python environment after I set the Nvidia settings, and still no luck; my GPU seems to refuse to overflow the memory into RAM. Also, I have 121GB free on the SSD for now, so that shouldn't be the problem. My thought is that my GPU somehow refuses to overflow its memory to system RAM because either a factory setting, something in the BIOS, or something in the system prevents it. I'm going to check everything for a while. Once again, thank you both for your time!
No luck yet ;-; In case this keeps going on, perhaps there is another way to do this? Maybe someone with a good GPU can help me finetune the model with my dataset?
@KuShiro189 So as far as AllTalk's code goes, it just accesses the VRAM memory via Python. AllTalk sends requests to Python, and AllTalk has no concept of/access to control the memory fallback; Python itself doesn't have that level of control either, which is why it's back to the Nvidia driver to extend or not extend into System RAM. This setting is also ONLY available on Windows systems. The only suggestions I have at this point are:
I've had a general hunt of the internet and I can't think of or see any other routes to try to diagnose/resolve this. For various reasons I had to run about 8 finetuning sessions yesterday on the current finetune code from GitHub and I didn't encounter the out of memory issue once; everything behaved as expected. The only real difference between your system and mine was that you are on Windows 10 and I'm on 11, which shouldn't make the slightest bit of difference. And I was on an Nvidia driver two versions later than yours, but again, that shouldn't make a difference and there were no bug fixes relating to memory management between those driver versions. I can only suggest you try the above 3 things. Other than that, I am stumped for what else to try or further things to suggest. Thanks
Thank you so much for your time! Very much appreciated!
🔴 If you have installed AllTalk in a custom Python environment, I will only be able to provide limited assistance/support. AllTalk draws on a variety of scripts and libraries that are not written or managed by myself, and they may fail, error or give strange results in custom built python environments.
🔴 Please generate a diagnostics report and upload the "diagnostics.log" as this helps me understand your configuration.
diagnostics.log
Describe the bug
CUDA out of memory on any batch size even on batch size 1 (RTX 4070 super)
To Reproduce
Here are the parameters I attempted (every single one of them returned CUDA out of memory):
- the default settings
- 32 epochs, 16 batch size, 1 grad acc steps, 16 max permitted size of audio
- 24 epochs, 8 batch size, 1 grad acc steps, 8 max permitted size of audio
- 16 epochs, 2 batch size, 1 grad acc steps, 8 max permitted size of audio
- 8 epochs, 4 batch size, 2 grad acc steps, 8 max permitted size of audio
- 8 epochs, 4 batch size, 1 grad acc steps, 8 max permitted size of audio (the screenshot)
- 2 epochs, 1 batch size, 1 grad acc steps, 4 max permitted size of audio
Screenshots
Text/logs
Traceback (most recent call last):
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\trainer\trainer.py", line 1833, in fit
self._fit()
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\trainer\trainer.py", line 1785, in _fit
self.train_epoch()
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\trainer\trainer.py", line 1504, in train_epoch
outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\trainer\trainer.py", line 1360, in train_step
outputs, loss_dict_new, step_time = self.optimize(
^^^^^^^^^^^^^^
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\trainer\trainer.py", line 1288, in optimize
optimizer.step()
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\optim\lr_scheduler.py", line 75, in wrapper
return wrapped(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\optim\optimizer.py", line 385, in wrapper
out = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\optim\optimizer.py", line 76, in _use_grad
ret = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\optim\adamw.py", line 187, in step
adamw(
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\optim\adamw.py", line 339, in adamw
func(
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\optim\adamw.py", line 608, in _multi_tensor_adamw
exp_avg_sq_sqrt = torch._foreach_sqrt(device_exp_avg_sqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 11.99 GiB of which 1.57 GiB is free. Of the allocated memory 7.50 GiB is allocated by PyTorch, and 186.08 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\AI\text-generation-webui-main\extensions\alltalk_tts\finetune.py", line 1376, in train_model
config_path, original_xtts_checkpoint, vocab_file, exp_path, speaker_wav = train_gpt(language, num_epochs, batch_size, grad_acumm, train_csv, eval_csv, output_path=str(output_path), max_audio_length=max_audio_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\AI\text-generation-webui-main\extensions\alltalk_tts\finetune.py", line 617, in train_gpt
trainer.fit()
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\trainer\trainer.py", line 1860, in fit
remove_experiment_folder(self.output_path)
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\trainer\generic_utils.py", line 77, in remove_experiment_folder
fs.rm(experiment_path, recursive=True)
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\site-packages\fsspec\implementations\local.py", line 185, in rm
shutil.rmtree(p)
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\shutil.py", line 787, in rmtree
return _rmtree_unsafe(path, onerror)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\shutil.py", line 634, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "C:\AI\text-generation-webui-main\installer_files\env\Lib\shutil.py", line 632, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:/AI/text-generation-webui-main/extensions/alltalk_tts/finetune/tmp-trn/training/XTTS_FT-April-30-2024_10+11PM-ea551d3\trainer_0_log.txt'
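Worth noting, as a side experiment rather than a confirmed fix: the allocator hint at the end of the CUDA error above (PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True) only takes effect if the variable is set before CUDA is initialised. A minimal sketch of doing that from Python (running `set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the command prompt before launching works just as well):

```python
# Sketch only: the variable must be set before the first CUDA allocation,
# so set it before importing torch (or in the shell before launching Python).
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported only after the environment variable is in place
print(torch.cuda.is_available())
```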
Desktop (please complete the following information):
AllTalk was updated: 3/18/2024
Custom Python environment: text-generation-webui's Python environment, but I've also attempted it in my local Python environment and it returned the same error
Text-generation-webUI was updated: 3/11/2024
Additional context
Seems like regardless of what parameters I set, it will always try to utilize the entire 12GB of VRAM, ignoring the 0.5-1GB used by other programs.
I'd also like to know specifically how you pulled it off on your 4070, if possible. Thanks!
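One possible explanation for why the usage barely changes with batch size, offered as a rough back-of-the-envelope sketch with a purely hypothetical parameter count rather than measured XTTS numbers: the traceback fails inside AdamW's optimizer.step(), and AdamW keeps two extra fp32 state tensors (exp_avg and exp_avg_sq) per trainable parameter, so a large, fixed slice of VRAM scales with the model size rather than with the batch size or audio length.

```python
# Back-of-the-envelope sketch (illustrative numbers, not measured from XTTS):
# fp32 weights + gradients + two AdamW state tensors per trainable parameter.
def training_memory_gib(num_params: int, bytes_per_param: int = 4) -> float:
    weights = num_params * bytes_per_param
    grads = num_params * bytes_per_param
    adam_state = 2 * num_params * bytes_per_param  # exp_avg + exp_avg_sq
    return (weights + grads + adam_state) / 1024 ** 3

# A hypothetical 500M-parameter model already needs ~7.5 GiB before a single
# batch of activations is counted, regardless of the batch size chosen.
print(f"{training_memory_gib(500_000_000):.1f} GiB")
```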