Add flops benchmark #169

Merged: 11 commits, Nov 6, 2023

Delaunay
Collaborator

These FLOPS benchmarks serve as sanity checks and help diagnose bottlenecks.
New hardware might not be used efficiently by today's models, because those models were built for different hardware.
Matrix multiplication is everywhere in AI, so these benchmarks should show whether the hardware itself is slow or is simply not being used efficiently by the models.
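
To illustrate the idea of measuring raw matrix-multiplication throughput, here is a minimal sketch. It is not the benchmark added by this PR: the function names (`matmul`, `matmul_flop_count`, `measure_flops`) are hypothetical, the matmul is a naive pure-Python one so that the example has no dependencies, and the operation count uses the conventional 2·n·k·m estimate for an (n×k) by (k×m) product. On real hardware one would pass the accelerator library's matmul as `matmul_fn` instead.

```python
import random
import time


def matmul(a, b):
    """Naive dense product of a (n x k) and b (k x m), in pure Python."""
    columns = list(zip(*b))  # transpose b once so columns are easy to iterate
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def matmul_flop_count(n, k, m):
    """Conventional estimate: one multiply and one add per inner-product term."""
    return 2 * n * k * m


def measure_flops(matmul_fn, n=64, k=64, m=64, repeat=5):
    """Return the best observed throughput of matmul_fn in FLOP per second."""
    a = [[random.random() for _ in range(k)] for _ in range(n)]
    b = [[random.random() for _ in range(m)] for _ in range(k)]
    matmul_fn(a, b)  # warm-up call, excluded from timing
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        matmul_fn(a, b)
        best = min(best, time.perf_counter() - start)
    return matmul_flop_count(n, k, m) / best


if __name__ == "__main__":
    print(f"{measure_flops(matmul):.3e} FLOP/s")
```

Comparing a number like this against the hardware's advertised peak, and against the throughput of full models, is what separates "the hardware is slow" from "the model does not use the hardware well".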

@Delaunay Delaunay force-pushed the add_flops branch 4 times, most recently from d36760c to c194ffe, on October 26, 2023 17:33
@Delaunay
Collaborator Author

Source: /Tmp/slurm.3790816.0/base/runs/nilafimu.2023-10-31_18:41:14.852395
=================
Benchmark results
=================
                         fail   n       perf   sem%   std% peak_memory          score weight
bert-fp16                   0   1      49.79   0.0%   0.2%       23952      49.790125   0.00
bert-fp32                   0   1      19.45   0.1%   0.3%       30922      19.452139   0.00
bert-tf32                   0   1      19.47   0.1%   0.3%       30922      19.465893   0.00
bert-tf32-fp16              0   1      49.78   0.0%   0.2%       23952      49.780634   3.00
bf16                        0   1       7.51   0.0%   0.2%        1140       7.507708   0.00
convnext_large-fp16         0   1     121.26   2.6%  13.8%       26632     121.255449   0.00
convnext_large-fp32         0   1      30.83   0.4%   2.2%       45356      30.828629   0.00
convnext_large-tf32         0   1      30.85   0.5%   2.9%       45356      30.845418   0.00
convnext_large-tf32-fp16    0   1     120.86   2.5%  13.6%       26632     120.861461   3.00
davit_large                 0   1     126.33   0.7%   5.1%       32130     126.325192   1.00
davit_large-multi           0   1     126.63   0.6%   4.6%       32374     126.628733   5.00
dlrm                        0   1  258228.32   0.4%   3.2%        6354  258228.316970   1.00
focalnet                    0   1     147.90   1.7%  12.8%       24000     147.901628   2.00
fp16                        0   1      93.70   0.2%   1.6%        1142      93.703274   0.00
fp32                        0   1      13.50   0.0%   0.1%        1524      13.495270   0.00
opt-1_3b                    1   1        NaN    NaN    NaN          -1            NaN   5.00
opt-1_3b-multinode        NaN NaN        NaN    NaN    NaN         NaN            NaN  10.00
opt-6_7b                    1   1        NaN    NaN    NaN       13622            NaN   5.00
opt-6_7b-multinode        NaN NaN        NaN    NaN    NaN         NaN            NaN  10.00
reformer                    0   1      10.21   0.0%   0.1%       24756      10.213739   1.00
regnet_y_128gf              0   1      29.60   0.0%   0.2%       30748      29.596963   2.00
resnet152                   0   1     226.73   1.0%   7.6%       29604     226.730824   1.00
resnet152-multi             0   1     228.92   1.0%   7.9%       29878     228.916266   5.00
resnet50                    0   1     436.92   2.9%  22.1%        4166     436.919842   1.00
rwkv                        0   1     104.91   0.1%   1.1%        4944     104.905045   1.00
stargan                     0   1      11.47   4.5%  34.5%       35648      11.467530   1.00
super-slomo                 0   1      11.28   0.1%   0.4%       36364      11.275021   1.00
t5                          0   1      14.12   0.5%   3.7%       34794      14.124875   2.00
tf32                        0   1      13.50   0.0%   0.2%        1524      13.501287   0.00
whisper                     0   1      81.20   0.1%   0.7%       35968      81.195167   1.00

Scores
------
Failure rate:       7.14% (FAIL)
Score:              10.70

Errors
------
2 errors, details in HTML report.
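
The failure rate in the report above is consistent with dividing the number of failed runs by the number of runs, counting only benchmarks that actually ran (rows whose `fail` and `n` are not NaN). This is inferred from the numbers, not taken from the tool's source; the helper name `failure_rate` is hypothetical. A small sketch of that arithmetic:

```python
import math


def failure_rate(rows):
    """rows: iterable of (fail, n) pairs; NaN marks a benchmark that did not run."""
    ran = [(f, n) for f, n in rows if not (math.isnan(f) or math.isnan(n))]
    total_runs = sum(n for _, n in ran)
    total_failures = sum(f for f, _ in ran)
    return total_failures / total_runs if total_runs else 0.0


# The report above: 26 rows with fail=0, 2 rows with fail=1, 2 rows that did not run.
nan = float("nan")
rows = [(0, 1)] * 26 + [(1, 1)] * 2 + [(nan, nan)] * 2
print(f"{failure_rate(rows):.2%}")  # 7.14%
```

Under this reading, the two multinode rows do not count as failures because they never ran.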

@Delaunay
Collaborator Author

Delaunay commented Nov 1, 2023

Source: /Tmp/slurm.3792181.0/base/runs/kizovota.2023-11-01_14:10:01.514247
=================
Benchmark results
=================
                         fail   n       perf   sem%   std% peak_memory          score weight
bert-fp16                 NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
bert-fp32                 NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
bert-tf32                 NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
bert-tf32-fp16            NaN NaN        NaN    NaN    NaN         NaN            NaN   3.00
bf16                      NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
convnext_large-fp16       NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
convnext_large-fp32       NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
convnext_large-tf32       NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
convnext_large-tf32-fp16  NaN NaN        NaN    NaN    NaN         NaN            NaN   3.00
davit_large               NaN NaN        NaN    NaN    NaN         NaN            NaN   1.00
davit_large-multi           0   1     163.91   0.5%   3.7%          -1     163.912403   5.00
dlrm                        0   1  246579.75   0.6%   4.3%          -1  246579.749664   1.00
focalnet                  NaN NaN        NaN    NaN    NaN         NaN            NaN   2.00
fp16                      NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
fp32                      NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
opt-1_3b                  NaN NaN        NaN    NaN    NaN         NaN            NaN   5.00
opt-1_3b-multinode        NaN NaN        NaN    NaN    NaN         NaN            NaN  10.00
opt-6_7b                  NaN NaN        NaN    NaN    NaN         NaN            NaN   5.00
opt-6_7b-multinode        NaN NaN        NaN    NaN    NaN         NaN            NaN  10.00
reformer                  NaN NaN        NaN    NaN    NaN         NaN            NaN   1.00
regnet_y_128gf            NaN NaN        NaN    NaN    NaN         NaN            NaN   2.00
resnet152                 NaN NaN        NaN    NaN    NaN         NaN            NaN   1.00
resnet152-multi             0   1     356.95   1.0%   7.7%          -1     356.954961   5.00
resnet50                  NaN NaN        NaN    NaN    NaN         NaN            NaN   1.00
rwkv                      NaN NaN        NaN    NaN    NaN         NaN            NaN   1.00
stargan                   NaN NaN        NaN    NaN    NaN         NaN            NaN   1.00
super-slomo               NaN NaN        NaN    NaN    NaN         NaN            NaN   1.00
t5                        NaN NaN        NaN    NaN    NaN         NaN            NaN   2.00
tf32                      NaN NaN        NaN    NaN    NaN         NaN            NaN   0.00
whisper                   NaN NaN        NaN    NaN    NaN         NaN            NaN   1.00

Scores
------
Failure rate:       0.00% (PASS)
Score:               3.01

@Delaunay
Collaborator Author

Delaunay commented Nov 1, 2023

Source: /Tmp/slurm.3793160.0/base/runs/rejerebu.2023-11-01_14:20:35.760834
=================
Benchmark results
=================
                         fail   n       perf   sem%   std% peak_memory          score weight
bert-fp16                   0   1     152.11   0.9%   4.8%       24423     152.109606   0.00
bert-fp32                   0   1      29.50   0.1%   0.3%       31387      29.503697   0.00
bert-tf32                   0   1     114.31   0.2%   1.0%       31389     114.310077   0.00
bert-tf32-fp16              0   1     152.44   0.9%   4.7%       24423     152.444346   3.00
bf16                        0   1     294.36   0.2%   1.8%        1611     294.362907   0.00
convnext_large-fp16         0   1     334.82   3.8%  20.3%       27285     334.819560   0.00
convnext_large-fp32         0   1      45.03   0.2%   1.0%       49405      45.026259   0.00
convnext_large-tf32         0   1     124.00   1.8%   9.4%       49405     123.996143   0.00
convnext_large-tf32-fp16    0   1     322.06   3.9%  20.8%       27285     322.056504   3.00
davit_large                 0   1     305.12   0.8%   6.5%       34067     305.119898   1.00
davit_large-multi           0   1     307.03   0.7%   5.6%       34067     307.033831   5.00
dlrm                        0   1  357004.59   1.2%   9.4%        6927  357004.590028   1.00
focalnet                    0   1     384.33   0.6%   4.5%       26165     384.331351   2.00
fp16                        0   1     294.19   0.0%   0.3%        1611     294.192666   0.00
fp32                        0   1      19.13   0.0%   0.1%        1989      19.127785   0.00
opt-1_3b                    1   1        NaN    NaN    NaN          -1            NaN   5.00
opt-1_3b-multinode        NaN NaN        NaN    NaN    NaN         NaN            NaN  10.00
opt-6_7b                    1   1        NaN    NaN    NaN       14041            NaN   5.00
opt-6_7b-multinode        NaN NaN        NaN    NaN    NaN         NaN            NaN  10.00
reformer                    0   1      61.86   0.0%   0.1%       25227      61.857388   1.00
regnet_y_128gf              0   1      87.18   0.5%   4.0%       31377      87.175186   2.00
resnet152                   0   1     670.30   0.8%   6.5%       34277     670.298059   1.00
resnet152-multi             0   1     666.32   1.1%   8.2%       33793     666.319142   5.00
resnet50                    0   1     671.66   5.2%  40.3%        4553     671.657510   1.00
rwkv                        0   1     473.50   0.2%   1.6%        5413     473.502226   1.00
stargan                     0   1      46.71   3.8%  29.2%       37249      46.714831   1.00
super-slomo                 0   1      42.10   0.9%   6.9%       33623      42.101922   1.00
t5                          0   1      48.27   0.7%   5.2%       35267      48.267302   2.00
tf32                        0   1     148.05   0.1%   0.9%        1989     148.048218   0.00
whisper                     0   1     241.36   0.3%   2.7%       36547     241.364055   1.00

Scores
------
Failure rate:       7.14% (FAIL)
Score:              18.21

Errors
------
2 errors, details in HTML report.

@Delaunay
Collaborator Author

Delaunay commented Nov 1, 2023

Source: /Tmp/slurm.3792172.0/base/runs/podovepu.2023-11-01_14:48:57.018371
=================
Benchmark results
=================
                         fail   n       perf   sem%   std% peak_memory          score weight
bert-fp16                   0   1     134.01   0.0%   0.3%       24356     134.005684   0.00
bert-fp32                   0   1      28.35   0.1%   0.3%       31320      28.354964   0.00
bert-tf32                   0   1     108.71   0.1%   0.6%       31322     108.705248   0.00
bert-tf32-fp16              0   1     134.01   0.1%   0.4%       24356     134.007693   3.00
bf16                        0   1     294.46   0.2%   1.6%        1544     294.457774   0.00
convnext_large-fp16         0   1     316.19   2.7%  14.8%       27218     316.187095   0.00
convnext_large-fp32         1   1        NaN    NaN    NaN       40852            NaN   0.00
convnext_large-tf32         1   1        NaN    NaN    NaN       40852            NaN   0.00
convnext_large-tf32-fp16    0   1     310.22   3.6%  19.5%       27218     310.218148   3.00
davit_large                 0   1     297.82   0.6%   4.6%       34000     297.823959   1.00
davit_large-multi           0   1     298.50   0.6%   4.8%       34000     298.503954   5.00
dlrm                        0   1  319430.15   0.6%   4.6%        6860  319430.147148   1.00
focalnet                    0   1     376.86   0.5%   4.1%       25664     376.861197   2.00
fp16                        0   1     289.99   0.1%   0.5%        1544     289.985696   0.00
fp32                        0   1      19.18   0.0%   0.0%        1922      19.176587   0.00
opt-1_3b                    1   1        NaN    NaN    NaN          -1            NaN   5.00
opt-1_3b-multinode        NaN NaN        NaN    NaN    NaN         NaN            NaN  10.00
opt-6_7b                    1   1        NaN    NaN    NaN       13974            NaN   5.00
opt-6_7b-multinode        NaN NaN        NaN    NaN    NaN         NaN            NaN  10.00
reformer                    0   1      47.58   0.2%   1.2%       25160      47.581238   1.00
regnet_y_128gf              0   1      78.37   0.5%   3.7%       31310      78.371645   2.00
resnet152                   0   1     608.47   1.1%   8.3%       34218     608.473292   1.00
resnet152-multi             0   1     611.36   0.9%   6.6%       34722     611.362747   5.00
resnet50                    0   1     763.73   4.1%  31.2%        4486     763.728071   1.00
rwkv                        0   1     418.74   0.2%   1.6%        5346     418.743522   1.00
stargan                     0   1      37.88   2.7%  20.8%       37172      37.882975   1.00
super-slomo                 0   1      41.66   1.1%   8.4%       33556      41.662153   1.00
t5                          0   1      42.10   0.0%   0.4%       35200      42.095743   2.00
tf32                        0   1     145.85   0.1%   0.8%        1922     145.846341   0.00
whisper                     0   1     200.34   0.1%   0.5%       36480     200.344352   1.00

Scores
------
Failure rate:      14.29% (FAIL)
Score:              17.48

Errors
------
4 errors, details in HTML report.

@Delaunay Delaunay merged commit 8e62d37 into master Nov 6, 2023
1 of 2 checks passed
@Delaunay Delaunay deleted the add_flops branch November 6, 2023 14:37