Reduce peak memory when using FSDP on 2+ GPUs #2
Labels: bug, help wanted, question
Figure out the peak memory issue with FSDP when running the 615M-parameter model on 2 GPUs, as described in the issue I linked here:
facebookresearch/fairseq#5318
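Before profiling, a rough baseline helps show how much memory per GPU full sharding should save, so that any excess peak stands out. The sketch below is only a back-of-envelope estimate. It assumes fp32 parameters and gradients, an Adam-style optimizer with two fp32 states per parameter, and even sharding across ranks. It ignores activations, FSDP prefetching, and allocator fragmentation. The helper names and the `largest_unit_params` argument are hypothetical illustrations. They are not part of the fairseq or PyTorch FSDP APIs.

```python
# Back-of-envelope per-GPU memory estimate for FSDP with full sharding.
# Hypothetical assumptions (not taken from the issue): fp32 params and grads,
# Adam-style optimizer with two fp32 states per param, even sharding,
# activations / prefetching / allocator fragmentation ignored.

BYTES_FP32 = 4


def model_state_bytes(num_params: int) -> int:
    """Unsharded bytes for param + grad + two optimizer moments, all fp32."""
    return num_params * BYTES_FP32 * 4


def sharded_state_bytes_per_gpu(num_params: int, world_size: int) -> float:
    """Persistent model state per GPU when every tensor is sharded evenly."""
    return model_state_bytes(num_params) / world_size


def peak_estimate_bytes(num_params: int, world_size: int, largest_unit_params: int) -> float:
    """Sharded state plus the transient full copy of the largest FSDP unit.

    During backward, a unit's params are all-gathered and its full-size grads
    exist until reduce-scatter, so both are counted (2 x fp32 per param).
    """
    transient = largest_unit_params * BYTES_FP32 * 2
    return sharded_state_bytes_per_gpu(num_params, world_size) + transient


def gib(n: float) -> float:
    """Convert bytes to GiB."""
    return n / 2**30


if __name__ == "__main__":
    params = 615_000_000
    for world in (1, 2, 4):
        state = gib(sharded_state_bytes_per_gpu(params, world))
        print(f"{world} GPU(s): {state:.2f} GiB sharded model state")
```

As an illustration under these assumptions, if only the root module were wrapped, `largest_unit_params` would equal the full 615M and the transient term alone would be about 4.6 GiB. That would cancel most of the savings from sharding across 2 GPUs, so checking the FSDP wrapping granularity seems like a reasonable first step.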