Benchmark sequence parallelism in test_transformer_engine (#3546)
```
$ nvidia-smi -L
GPU 0: NVIDIA H100 80GB HBM3
GPU 1: NVIDIA H100 80GB HBM3
GPU 2: NVIDIA H100 80GB HBM3
GPU 3: NVIDIA H100 80GB HBM3
GPU 4: NVIDIA H100 80GB HBM3
GPU 5: NVIDIA H100 80GB HBM3
GPU 6: NVIDIA H100 80GB HBM3
GPU 7: NVIDIA H100 80GB HBM3

$ mpirun -np 8 --output-filename /tmp/test_transformer_engine pytest tests/python/test_transformer_engine.py --only-mpi

$ cat /tmp/test_transformer_engine/1/rank.0/stdout
------------------------------------------------------------------------------- benchmark: 4 tests ---------------------------------------------------------------------------------
Name (time in ms)                              Min              Max            Mean            StdDev         Median              IQR  Outliers              OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_transformer_layer[sp-forward]    2.2564 (1.0)  55.7794 (11.73)  13.2931 (3.01)  23.7547 (125.77)  2.6707 (1.05)  14.1577 (88.73)       1;1   75.2268 (0.33)       5           1
test_transformer_layer[tp-forward]   2.3941 (1.06)   18.6497 (3.92)   6.7947 (1.54)    7.0469 (37.31)   2.5476 (1.0)   8.2456 (51.68)       1;0  147.1742 (0.65)       5           1
test_transformer_layer[tp-backward]  4.2568 (1.89)    4.8231 (1.01)   4.4578 (1.01)     0.2570 (1.36)  4.2940 (1.69)    0.4091 (2.56)       1;0  224.3258 (0.99)       5           1
test_transformer_layer[sp-backward]  4.3135 (1.91)     4.7558 (1.0)    4.4221 (1.0)      0.1889 (1.0)  4.3292 (1.70)     0.1596 (1.0)       1;1   226.1393 (1.0)       5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

Latency is neutral as expected: the median times for sequence parallelism (sp) and tensor parallelism (tp) are within about 5% of each other in both the forward and backward passes.
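As a quick sanity check on the "neutral" reading, the sketch below recomputes the sp/tp ratio of the Median column from the benchmark table above. The names `medians_ms` and `sp_over_tp` are hypothetical helpers introduced only for this illustration and are not part of the test suite. Median is used rather than Mean because, with only 5 rounds, the forward runs contain a single slow outlier each (for example, Max 55.7794 ms against Min 2.2564 ms for sp-forward) that dominates the mean.

```python
# Median latencies in ms, copied from the Median column of the benchmark table.
medians_ms = {
    ("sp", "forward"): 2.6707,
    ("tp", "forward"): 2.5476,
    ("tp", "backward"): 4.2940,
    ("sp", "backward"): 4.3292,
}


def sp_over_tp(direction: str) -> float:
    """Return the sp/tp median latency ratio for one pass direction."""
    return medians_ms[("sp", direction)] / medians_ms[("tp", direction)]


for direction in ("forward", "backward"):
    # A ratio close to 1.0 means the two modes have similar latency.
    print(f"{direction}: sp/tp median ratio = {sp_over_tp(direction):.3f}")
```

Both ratios come out close to 1 (roughly 1.05 forward and 1.01 backward), which is consistent with the relative figures shown in parentheses in the Median column.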