The idea is that gpu0 would do all the compute, but once gpu0's VRAM is full, anything above that gets offloaded to gpu1. gpu1 wouldn't be used to sample any images; its only purpose would be to hold the overflow data, the same way shared memory (i.e. system RAM) already does, while gpu0 still processes it. Or would that end up even slower than having system RAM hold it, given say the ~8GB/s transfer rate of a PCIe gen 4 x4 bifurcated slot?
Hmm, interesting idea, but I'm unsure how you'd force comfy to manage that internally. Like, you can set where the model goes when it's not on the main GPU via the "offload_device" param of the model patcher, but that only allows you to specify one device. For this you'd have to make the actual backend multi-GPU aware so it doesn't try to unload the entire model onto your second GPU (which would just OOM unless the second GPU can fit the entire model). Maybe if you mess with the load function, but yeah, no clue.
I guess if you have 2x3090s you could try it by editing the end of the UnetLoaderGGUF class in nodes.py to look like this:
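Roughly something along these lines (untested sketch, not the supported way to do it; it assumes your copy of nodes.py ends `load_unet()` by cloning the model into a `GGUFModelPatcher`, and that ComfyUI's `ModelPatcher` exposes `load_device` / `offload_device` attributes — adjust if your version differs):

```python
# torch is already imported at the top of nodes.py
import torch

        # ...end of UnetLoaderGGUF.load_unet(), everything above the patcher
        # clone stays exactly as it is in your current nodes.py:
        model = GGUFModelPatcher.clone(model)

        # Hack: keep sampling on cuda:0, but point the offload target at the
        # second GPU instead of system RAM. Whenever comfy decides to offload
        # the model, the weights land on cuda:1. With 2x3090s the whole model
        # fits on the second card, which is the only reason this is worth
        # trying at all (see the OOM caveat above).
        model.load_device = torch.device("cuda:0")
        model.offload_device = torch.device("cuda:1")
        return (model,)
```

No idea whether comfy's memory management actually respects that without also touching the load function mentioned above, so treat it as an experiment rather than a real fix.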