
torch.OutOfMemoryError: CUDA out of memory. #110

Closed
whk6688 opened this issue Dec 24, 2024 · 4 comments

whk6688 commented Dec 24, 2024

When I run the following in HunyuanVideo, video generation works fine:

python -u gradio_server.py --video-size 544 960 --video-length 129 --infer-steps 50 --flow-reverse --use-cpu-offload

But when I run the following in FastVideo, I cannot generate a video:

python demo/gradio_web_demo.py \
    --model_path data/FastMochi-diffusers \
    --num_frames 163 \
    --height 480 \
    --width 848 \
    --num_inference_steps 8 \
    --guidance_scale 1.5 \
    --seed 1024 \
    --scheduler_type "pcm_linear_quadratic" \
    --linear_threshold 0.1 \
    --linear_range 0.75

How can I reduce memory usage here? Thanks!


whk6688 commented Dec 24, 2024

[Screenshot attached]


whk6688 commented Dec 24, 2024

Thanks. I found a clue: when I run

export MODEL_BASE=data/FastHunyuan
python fastvideo/sample/sample_t2v_hunyuan.py \
    --height 544 \
    --width 960 \
    --num_frames 125 \
    --num_inference_steps 6 \
    --guidance_scale 1 \
    --embedded_cfg_scale 6 \
    --flow_shift 17 \
    --flow-reverse \
    --prompt "A group of people are reading books in the library" \
    --seed 1024 \
    --output_path outputs_video/hunyuan/cfg6/ \
    --use-cpu-offload \
    --model_path $MODEL_BASE \
    --dit-weight $MODEL_BASE/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt

it works. Only the Mochi model cannot run inference on my server.


whk6688 commented Dec 24, 2024

Hello, we tried to solve the issue.

This is what we did:

We'll update the gradio_web_demo.py file to include memory optimization techniques such as gradient checkpointing, CPU offloading, and attention slicing. These changes should help reduce memory consumption and allow video generation on systems with limited GPU memory.

You can review changes in this commit: jacks-sam1010@e38e2b0.
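For reference, the inference-side options mentioned above (CPU offloading and tiled decoding) typically look like the following in diffusers. This is a minimal sketch, assuming the FastMochi-diffusers checkpoint loads with the standard MochiPipeline; it is not the code from the linked commit.

```python
# Minimal sketch (illustrative only): diffusers memory optimizations for a
# Mochi-style checkpoint. Assumes data/FastMochi-diffusers is compatible with
# the standard MochiPipeline.
import torch
from diffusers import MochiPipeline

pipe = MochiPipeline.from_pretrained(
    "data/FastMochi-diffusers", torch_dtype=torch.bfloat16
)

# Keep submodules on the CPU and move each one to the GPU only while it runs.
pipe.enable_model_cpu_offload()
# Decode the latent video in tiles so the VAE's peak memory stays bounded.
pipe.enable_vae_tiling()
# If memory is still tight, sequential offload trades more speed for lower memory:
# pipe.enable_sequential_cpu_offload()
```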

Caution

Disclaimer: This solution concept was generated by AI. Never copy-paste the code without checking its correctness; the solution may be incomplete, and you should use it as inspiration only.

Latta AI seeks to solve problems in open source projects as part of its mission to support developers around the world. Learn more about our mission at https://latta.ai/ourmission. If you no longer want Latta AI to attempt solving issues on your repository, you can block this account.

I have tried it. It does not work; I get the same error.


BrianChen1129 commented Dec 27, 2024

I think one way is to reduce num_frames, or you can try this for running FastHunyuan. Currently, FastMochi cannot support 163 frames on a single 48GB GPU. We will support a quantized version of FastMochi soon.
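For illustration, a reduced-frame run might look like the sketch below, again assuming the demo loads the checkpoint through diffusers' MochiPipeline; 85 frames is only an example value, and the largest count that actually fits depends on your GPU.

```python
# Illustrative only: the same sampling settings with fewer frames to lower peak VRAM.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained(
    "data/FastMochi-diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

frames = pipe(
    "A group of people are reading books in the library",
    num_frames=85,              # example value, reduced from 163
    height=480,
    width=848,
    num_inference_steps=8,
    guidance_scale=1.5,
).frames[0]
export_to_video(frames, "output.mp4", fps=30)
```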
