I have noticed that genai uses a lot of RAM even in DML mode.
It appears to keep a copy of the model in both RAM and VRAM, and I am not sure why.
Previously, when using onnxruntime directly, I was able to bind all the weights to VRAM without using any RAM at all.
CPU mode is even worse: it uses so much RAM that it is not really usable.
If possible, it would be nice if genai freed the RAM once the model has been loaded onto the GPU. Even better, it could avoid loading the whole model into RAM at once and instead stream it onto the GPU in a more memory-efficient way.
Please be aware that most users have limited RAM and that the less memory it uses the better.
Having said that, the new onnxruntime library does seem to be a bit more efficient in avoiding memory spikes. But I think the RAM usage could still be improved greatly.
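To illustrate what I mean by streaming, here is a rough Python sketch of the idea. It is not how genai or onnxruntime actually load models. `FakeDevice` and its `upload` method are hypothetical stand-ins for a real GPU upload call, and the chunk size is tiny only so the demo is easy to follow. The point is that host memory only ever holds one chunk at a time instead of the whole weight file.

```python
import io
from typing import BinaryIO, Callable, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_to_device(
    stream: BinaryIO,
    upload: Callable[[bytes], None],
    chunk_size: int,
) -> int:
    """Copy a weight stream to a device one chunk at a time.

    Only one chunk is held on the host side at any moment.
    Returns the total number of bytes copied.
    """
    total = 0
    for chunk in iter_chunks(stream, chunk_size):
        upload(chunk)  # hypothetical device upload; the chunk can be freed afterwards
        total += len(chunk)
    return total


class FakeDevice:
    """Hypothetical stand-in for GPU memory, used only for this demo."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        self.largest_chunk = 0

    def upload(self, chunk: bytes) -> None:
        # Track the biggest host-side chunk we were ever handed.
        self.largest_chunk = max(self.largest_chunk, len(chunk))
        self.buffer.extend(chunk)


# Demo: 10 bytes of "weights" streamed in 4-byte chunks.
weights = bytes(range(10))
device = FakeDevice()
copied = stream_to_device(io.BytesIO(weights), device.upload, chunk_size=4)
```

With a real model file, `io.BytesIO(weights)` would be replaced by an opened file handle, and the chunk size would be something like a few tens of megabytes, so peak RAM usage would be bounded by the chunk size rather than the model size.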