
Would it be feasible to use only VRAM as an extension of the main GPU? #33

Open

DuckersMcQuack opened this issue Nov 15, 2024 · 1 comment

@DuckersMcQuack
Where gpu0 would do all the compute, but once gpu0's VRAM is full, anything above that gets offloaded to gpu1. gpu1 wouldn't be used to sample any images; its only purpose would be to hold the data, the same way shared memory (i.e. system RAM) already does, while gpu0 processes it. Or would that end up even slower than having system RAM hold it, given, say, an ~8 GB/s transfer rate over a bifurcated PCIe Gen 4 x4 link?
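As a rough back-of-the-envelope estimate: at ~8 GB/s, streaming a 10 GB set of weights over the link takes around 1.25 s per full pass, so any scheme that re-reads the whole model for every sampling step is going to be bandwidth-bound.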

@city96 (Owner) commented Nov 15, 2024

Hmm, interesting idea, but I'm unsure how you'd force comfy to manage that internally. You can set where the model goes when it's not on the main GPU via the "offload_device" param of the model patcher, but that only lets you specify one device. For this you'd have to make the actual backend multi-GPU aware so it doesn't try to unload the entire model onto your second GPU (which would just OOM unless the second GPU can fit the entire model). Maybe if you mess with the load function, but yeah, no clue.

I guess if you have 2x3090s you could try it by editing the end of the UnetLoaderGGUF class in nodes.py to look like this:

        model = GGUFModelPatcher.clone(model)
        # Offload to the second GPU instead of the default offload device (CPU/system RAM)
        model.offload_device = torch.device("cuda:1")
        # Move the weights there up front so they occupy gpu1's VRAM, not system RAM
        model.model.to(model.offload_device)
        model.patch_on_device = patch_on_device
        return (model,)

My second GPU is a P40 connected via PCIe 3.0 x1, so as expected it's awful lol. It'd be faster if I were reading from my M.2 drive.

CPU offload: 2.43s/it
CUDA1 offload: 8.32s/it
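If you want to sanity-check the link speed itself, here's a minimal sketch (not part of the node code) that measures raw copy bandwidth with PyTorch. It assumes two CUDA devices are visible, and the exact numbers will depend on your PCIe link width:

    import time
    import torch

    def copy_bandwidth(src, dst, size_mb=512, iters=10):
        # Allocate a buffer on the source device and time repeated copies to dst.
        # Note: a "cpu" source here is pageable memory; pinned memory would be faster.
        x = torch.empty(size_mb * 1024 * 1024, dtype=torch.uint8, device=src)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            x.to(dst)
        torch.cuda.synchronize()
        return size_mb / 1024 / ((time.perf_counter() - t0) / iters)  # GiB/s

    print("cpu    -> cuda:0:", copy_bandwidth("cpu", "cuda:0"))
    print("cuda:1 -> cuda:0:", copy_bandwidth("cuda:1", "cuda:0"))

On a PCIe 3.0 x1 link like the P40's, the second number should come out far below the first, which lines up with the timings above.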
