Simple PyTorch cases for understanding parallelism in AI training and inference.
Unless otherwise specified, all code is run in the following environment: Linux + DGX A100-40GB + nvcr.io/nvidia/pytorch:23.04-py3 (PyTorch 2.0).
Please refer to the corresponding installation tutorial for the above environment configuration.
Unless otherwise specified, all code is written by shh2000@github; no code is copied from other repos.
Some simple cases in train_basic_model have an xx_forward.py, which contains only the forward pass (no training) to make the case easier to understand.
Cases:
| category | task | case | parallel type | api | manual with readme |
|---|---|---|---|---|---|
| train | simple | matmul | None | / | see code |
| | | | data | torch.DDP() | see code |
| | | | 1D Tensor | / | see code |
| | | | Pipeline | torch Pipe() | / |
| | | C=A*B | 2D-Tensor | / | see code |
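
To make the "1D Tensor" row more concrete, here is a minimal single-process sketch of column-wise tensor parallelism for a matmul. It is not the code from this repo: it uses plain Python lists instead of PyTorch tensors so it runs without a GPU, the "ranks" are simulated by a loop, and the helper names (`matmul`, `split_columns`, `tensor_parallel_matmul_1d`) are hypothetical. It also assumes the column count divides evenly by the number of ranks.

```python
# Single-process sketch of 1D (column-wise) tensor parallelism for C = A @ B.
# Plain Python lists stand in for tensors; no real devices or communication.

def matmul(a, b):
    """Naive dense matmul on nested lists: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(b, world_size):
    """Split b column-wise into world_size equal shards (hypothetical helper)."""
    n = len(b[0])
    assert n % world_size == 0, "columns must divide evenly across ranks"
    step = n // world_size
    return [[row[r * step:(r + 1) * step] for row in b] for r in range(world_size)]

def tensor_parallel_matmul_1d(a, b, world_size):
    """Each simulated rank holds all of A and one column shard of B."""
    shards = split_columns(b, world_size)
    # One local matmul per simulated rank; each yields a column slice of C.
    partials = [matmul(a, shard) for shard in shards]
    # Emulate an all-gather along the column dimension by concatenating rows.
    return [sum((p[i] for p in partials), []) for i in range(len(a))]

A = [[1, 2], [3, 4]]
B = [[5, 6, 7, 8], [9, 10, 11, 12]]

C_ref = matmul(A, B)                                   # unsharded reference
C_tp = tensor_parallel_matmul_1d(A, B, world_size=2)   # two simulated ranks
print(C_tp == C_ref)
```

In a real multi-GPU run, each shard of B would live on a different device and the final concatenation would be a collective operation rather than a list join; the arithmetic per rank is the same.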