
Question about gradient checkpointing? #3

Closed
311dada opened this issue Jun 24, 2024 · 2 comments · May be fixed by #4

311dada commented Jun 24, 2024

Congratulations on the excellent work!

When training large language models, we generally adopt the gradient checkpointing technique. Could you please help me turn on this technique in your code?

Thanks a lot!

why-in-Shanghaitech (Member) commented

Hi! Thank you for your question. I haven't used gradient checkpointing before, so I cannot guarantee the correctness of this solution:

  1. Comment out Hugging Face's block that disables `use_cache` when gradient checkpointing is active, since we still need the cache for further iterations:

    # In the Hugging Face model's forward(): comment out this block so that
    # `use_cache` is not forced to False when gradient checkpointing is on.
    if self.gradient_checkpointing and self.training:
        if use_cache:
            logger.warning_once(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
            use_cache = False

  2. Add `--gradient_checkpointing` to the bash script (replace this line or add it just below; see the transformers-level sketch after these two steps):

    --gradient_accumulation_steps 1 \
    --gradient_checkpointing \
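
For reference, the same effect can usually be obtained through the standard transformers API rather than the bash flag. Below is a minimal sketch only; the checkpoint path is a placeholder and I am assuming the training loop is built on transformers' `PreTrainedModel`:

    from transformers import AutoModelForCausalLM

    # Placeholder checkpoint path -- substitute whatever model this repo actually loads.
    model = AutoModelForCausalLM.from_pretrained("path/to/your-model")

    # Standard transformers call that enables gradient checkpointing on all
    # supported submodules; passing --gradient_checkpointing to the Trainer
    # calls this under the hood.
    model.gradient_checkpointing_enable()

    # Note: during training, transformers' forward() still forces use_cache=False
    # whenever checkpointing is enabled, which is exactly why step 1 comments
    # out that block.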

I tested the modified code on an RTX 3090 with the config tinyllama_opt.json, batch size 16, and gradient checkpointing enabled. The training loss over the first 20 steps is consistent with the run without gradient checkpointing that uses batch size 4 and gradient accumulation 4 (both runs see an effective batch of 16 samples per optimizer step).
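
To make the comparison concrete, the two runs correspond roughly to the following Trainer settings (a sketch only: I am assuming the bash flags map onto Hugging Face `TrainingArguments`, and the output directories are placeholders):

    from transformers import TrainingArguments

    # Run with gradient checkpointing: batch size 16, no accumulation.
    args_ckpt = TrainingArguments(
        output_dir="out_ckpt",             # placeholder
        per_device_train_batch_size=16,
        gradient_accumulation_steps=1,
        gradient_checkpointing=True,
    )

    # Baseline without checkpointing: batch size 4, accumulation 4.
    # Both runs take an optimizer step every 16 samples, so the first-20-step
    # loss curves are expected to match up to numerical noise.
    args_base = TrainingArguments(
        output_dir="out_base",             # placeholder
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        gradient_checkpointing=False,
    )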

I hope this helps. If it works, I'd appreciate it if you could open a simple PR so that more people can benefit from the gradient checkpointing feature.

why-in-Shanghaitech self-assigned this Jun 25, 2024

311dada commented Jun 27, 2024

Sorry for my late response. I will try it. Thanks for your suggestion.
