
torch.OutOfMemoryError: CUDA out of memory. #110

Closed
whk6688 opened this issue Dec 24, 2024 · 4 comments

whk6688 commented Dec 24, 2024

When I run the following in HunyuanVideo, video generation works fine:

python -u gradio_server.py --video-size 544 960 --video-length 129 --infer-steps 50 --flow-reverse --use-cpu-offload

But when I run the following in FastVideo, I cannot generate a video:

python demo/gradio_web_demo.py \
    --model_path data/FastMochi-diffusers \
    --num_frames 163 \
    --height 480 \
    --width 848 \
    --num_inference_steps 8 \
    --guidance_scale 1.5 \
    --seed 1024 \
    --scheduler_type "pcm_linear_quadratic" \
    --linear_threshold 0.1 \
    --linear_range 0.75

How can I reduce memory usage here? Thanks!


whk6688 commented Dec 24, 2024

[Screenshot attached]


whk6688 commented Dec 24, 2024

Thanks. I found a clue: when I run

export MODEL_BASE=data/FastHunyuan
python fastvideo/sample/sample_t2v_hunyuan.py \
    --height 544 \
    --width 960 \
    --num_frames 125 \
    --num_inference_steps 6 \
    --guidance_scale 1 \
    --embedded_cfg_scale 6 \
    --flow_shift 17 \
    --flow-reverse \
    --prompt "A group of people are reading books in the library" \
    --seed 1024 \
    --output_path outputs_video/hunyuan/cfg6/ \
    --use-cpu-offload \
    --model_path $MODEL_BASE \
    --dit-weight $MODEL_BASE/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt

it works. Only the Mochi model cannot run inference on my server.


whk6688 commented Dec 24, 2024

Hello, we tried to solve the issue.

This is what we did:

We'll update the gradio_web_demo.py file to include memory optimization techniques such as gradient checkpointing, CPU offloading, and attention slicing. These changes should help reduce memory consumption and allow video generation on systems with limited GPU memory.

You can review changes in this commit: jacks-sam1010@e38e2b0.
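For reference, the inference-side options mentioned above (CPU offloading and tiled decoding) typically look like the following in diffusers. This is a minimal sketch, assuming the FastMochi-diffusers checkpoint loads with the standard MochiPipeline; it is not the code from the linked commit.

```python
# Minimal sketch (illustrative only): diffusers memory optimizations for a
# Mochi-style checkpoint. Assumes data/FastMochi-diffusers is compatible with
# the standard MochiPipeline.
import torch
from diffusers import MochiPipeline

pipe = MochiPipeline.from_pretrained(
    "data/FastMochi-diffusers", torch_dtype=torch.bfloat16
)

# Keep submodules on the CPU and move each one to the GPU only while it runs.
pipe.enable_model_cpu_offload()
# Decode the latent video in tiles so the VAE's peak memory stays bounded.
pipe.enable_vae_tiling()
# If memory is still tight, sequential offload trades more speed for lower memory:
# pipe.enable_sequential_cpu_offload()
```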

Caution

Disclaimer: This solution concept was generated by AI. Never copy-paste the code without checking its correctness; the solution may be incomplete, and you should use it as inspiration only.

Latta AI seeks to solve problems in open source projects as part of its mission to support developers around the world. Learn more about our mission at https://latta.ai/ourmission. If you no longer want Latta AI to attempt solving issues on your repository, you can block this account.

I have tried it. It does not work; I get the same error.


BrianChen1129 commented Dec 27, 2024

I think one way is to reduce num_frames, or you can try this for running FastHunyuan. Currently, FastMochi cannot support 163 frames on a single 48GB GPU. We will support a quantized version of FastMochi soon.
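For illustration, a reduced-frame run might look like the sketch below, again assuming the demo loads the checkpoint through diffusers' MochiPipeline; 85 frames is only an example value, and the largest count that actually fits depends on your GPU.

```python
# Illustrative only: the same sampling settings with fewer frames to lower peak VRAM.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained(
    "data/FastMochi-diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

frames = pipe(
    "A group of people are reading books in the library",
    num_frames=85,              # example value, reduced from 163
    height=480,
    width=848,
    num_inference_steps=8,
    guidance_scale=1.5,
).frames[0]
export_to_video(frames, "output.mp4", fps=30)
```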
