5200f58
Lux Benchmarks

| Benchmark suite | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)` | 411895.5 | 412583 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)` | 322166 | 322979.5 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)` | 322583 | 243687.5 | 1.32 |
| `Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s)` | 741833.5 | 738583 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA` | 43398 | 43341 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s)` | 1330145.5 | 1331541.5 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)` | 2429292 | 2407792 | 1.01 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s)` | 14164208 | 16439208.5 | 0.86 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s)` | 2195542 | 2194563 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA` | 204689 | 205021 | 1.00 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s)` | 1414708 | 1426479.5 | 0.99 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s)` | 903146 | 895625 | 1.01 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s)` | 1562333 | 1543917 | 1.01 |
| `Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s)` | 2259500 | 2206208.5 | 1.02 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)` | 1774625 | 1777646 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)` | 1098875 | 1078167 | 1.02 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)` | 1541229.5 | 1530958 | 1.01 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)` | 2958833 | 3007208 | 0.98 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA` | 207635.5 | 207880 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)` | 12148354.5 | 12178459 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)` | 8822458 | 8815750 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)` | 9188791 | 9208125 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)` | 18597916 | 18565750 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA` | 1492767 | 1492124 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)` | 17261541 | 17280708 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)` | 13975000 | 13973458 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)` | 14512917 | 14487354.5 | 1.00 |
| `Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)` | 21852791 | 21838146 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)` | 250272292 | 249950041 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)` | 148297917 | 148180333 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)` | 115932000 | 116724083 | 0.99 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)` | 447763458 | 447231500 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA` | 5475849 | 5449958 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)` | 1224211792 | 1223396291 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)` | 930629834 | 929732416 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)` | 831800229 | 832528750 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)` | 1632010250 | 1633536917 | 1.00 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA` | 31656181.5 | 31232077 | 1.01 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)` | 1135741875 | 1127648666 | 1.01 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)` | 989475979.5 | 1002243833.5 | 0.99 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)` | 1306310687.5 | 1330111333.5 | 0.98 |
| `Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)` | 1731093166.5 | 1732141146 | 1.00 |
| `lenet(28, 28, 1, 32)/forward/CPU/2 thread(s)` | 1096084 | 1037166 | 1.06 |
| `lenet(28, 28, 1, 32)/forward/CPU/4 thread(s)` | 1621125 | 1626521 | 1.00 |
| `lenet(28, 28, 1, 32)/forward/CPU/8 thread(s)` | 3736208 | 3777708 | 0.99 |
| `lenet(28, 28, 1, 32)/forward/CPU/1 thread(s)` | 780896 | 781583 | 1.00 |
| `lenet(28, 28, 1, 32)/forward/GPU/CUDA` | 264547.5 | 262013.5 | 1.01 |
| `lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s)` | 2990250 | 3044041.5 | 0.98 |
| `lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s)` | 4101000 | 4097042 | 1.00 |
| `lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s)` | 11120083 | 11116208 | 1.00 |
| `lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s)` | 3243937 | 3145292 | 1.03 |
| `lenet(28, 28, 1, 32)/zygote/GPU/CUDA` | 1096407 | 1092823.5 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)` | 2316625 | 2334666.5 | 0.99 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)` | 1427312.5 | 1419459 | 1.01 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)` | 1669042 | 1571666 | 1.06 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)` | 4212291 | 4190125 | 1.01 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA` | 208250.5 | 208094.5 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)` | 19418541 | 19396000 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)` | 16073458 | 16089229 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)` | 17186250 | 17212895.5 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)` | 25916083 | 25854646 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA` | 1595514 | 1594895 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)` | 34080500.5 | 34213875 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)` | 30855979 | 31031917 | 0.99 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)` | 31220895.5 | 31076583 | 1.00 |
| `Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)` | 36553708 | 36708292 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)` | 4532459 | 4528125 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)` | 2782125 | 2779625 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)` | 2915416 | 2669750 | 1.09 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)` | 8380354 | 8386375 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA` | 423404 | 420922 | 1.01 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)` | 39032375 | 38785750 | 1.01 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)` | 32155750 | 32129375 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)` | 32313208 | 32277375 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)` | 51945292 | 51825125 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA` | 2634739 | 2627381 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)` | 88517062.5 | 88427125 | 1.00 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)` | 114525875 | 113463375 | 1.01 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)` | 223799104.5 | 227295542 | 0.98 |
| `Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)` | 74994375 | 74279083 | 1.01 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)` | 268673709 | 268165959 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)` | 159281875 | 158761895.5 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)` | 126759937.5 | 123751562.5 | 1.02 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)` | 485627875 | 487404084 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA` | 6994649 | 6976350 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)` | 1470701416.5 | 1472349770.5 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)` | 1172207375 | 1171105895.5 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)` | 1063985125 | 1066855854.5 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)` | 2005181229 | 2006883229.5 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA` | 34667442 | 34648809 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)` | 1715584583 | 1717718833 | 1.00 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)` | 1547782708 | 1535202521 | 1.01 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)` | 1856138292 | 1878248417 | 0.99 |
| `Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)` | 2207623729.5 | 2205912250 | 1.00 |
| `lenet(28, 28, 1, 128)/forward/CPU/2 thread(s)` | 2030167 | 2014792 | 1.01 |
| `lenet(28, 28, 1, 128)/forward/CPU/4 thread(s)` | 2979333 | 3001875 | 0.99 |
| `lenet(28, 28, 1, 128)/forward/CPU/8 thread(s)` | 8195625 | 8115875 | 1.01 |
| `lenet(28, 28, 1, 128)/forward/CPU/1 thread(s)` | 2492167 | 2432666 | 1.02 |
| `lenet(28, 28, 1, 128)/forward/GPU/CUDA` | 275800.5 | 266572 | 1.03 |
| `lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s)` | 9616458.5 | 9287083.5 | 1.04 |
| `lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s)` | 12040541 | 12062125 | 1.00 |
| `lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s)` | 24281208 | 25629896 | 0.95 |
| `lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s)` | 11743000 | 11743625.5 | 1.00 |
| `lenet(28, 28, 1, 128)/zygote/GPU/CUDA` | 1191646.5 | 1194959.5 | 1.00 |
| `vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s)` | 381538333 | 385030062.5 | 0.99 |
| `vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s)` | 284123771.5 | 288360083 | 0.99 |
| `vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s)` | 239280375 | 255891729 | 0.94 |
| `vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s)` | 452582312.5 | 454057438 | 1.00 |
| `vgg16(32, 32, 3, 32)/forward/GPU/CUDA` | 4856388 | 4834603 | 1.00 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s)` | 1153618084 | 1159548125 | 0.99 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s)` | 934081083 | 928639750 | 1.01 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s)` | 921635667 | 1041025834 | 0.89 |
| `vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s)` | 1402852084 | 1400301917 | 1.00 |
| `vgg16(32, 32, 3, 32)/zygote/GPU/CUDA` | 17820477 | 17860514 | 1.00 |
| `lenet(28, 28, 1, 64)/forward/CPU/2 thread(s)` | 1049334 | 1063417 | 0.99 |
| `lenet(28, 28, 1, 64)/forward/CPU/4 thread(s)` | 2035667 | 2031167 | 1.00 |
| `lenet(28, 28, 1, 64)/forward/CPU/8 thread(s)` | 5341791 | 6215979 | 0.86 |
| `lenet(28, 28, 1, 64)/forward/CPU/1 thread(s)` | 1401833 | 1289687.5 | 1.09 |
| `lenet(28, 28, 1, 64)/forward/GPU/CUDA` | 273502.5 | 274190 | 1.00 |
| `lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s)` | 6484854.5 | 6281917 | 1.03 |
| `lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s)` | 12404917 | 12390375 | 1.00 |
| `lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s)` | 19763708 | 21407437 | 0.92 |
| `lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s)` | 6066500 | 6091125 | 1.00 |
| `lenet(28, 28, 1, 64)/zygote/GPU/CUDA` | 1262508 | 1242182 | 1.02 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)` | 70493208 | 70428792 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)` | 43600229 | 43611000 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)` | 39622541 | 39667125 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)` | 132698750 | 132442542 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA` | 1869551 | 1932803.5 | 0.97 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)` | 356454771 | 355601563 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)` | 270112625 | 270115458 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)` | 254625084 | 253762937.5 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)` | 534884312.5 | 535226770.5 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA` | 12284081 | 12278246 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)` | 394586709 | 399133667 | 0.99 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)` | 408404250 | 394300875 | 1.04 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)` | 678487209 | 679147666.5 | 1.00 |
| `Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)` | 710785791 | 710284375 | 1.00 |
| `vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s)` | 1185827084 | 1194640667 | 0.99 |
| `vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s)` | 695281125 | 689604666 | 1.01 |
| `vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s)` | 626587375 | 645000312 | 0.97 |
| `vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s)` | 1770240250.5 | 1774100021 | 1.00 |
| `vgg16(32, 32, 3, 128)/forward/GPU/CUDA` | 12316473 | 12543961 | 0.98 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s)` | 3682510624.5 | 3679223834 | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s)` | 2820068709 | 2826435875 | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s)` | 2719913792 | 2707509167 | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s)` | 5042856917 | 5055415292 | 1.00 |
| `vgg16(32, 32, 3, 128)/zygote/GPU/CUDA` | 49349519 | 49466162 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)` | 3418708 | 3397917 | 1.01 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)` | 2082958 | 2077646 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)` | 2540500 | 2508042 | 1.01 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)` | 6009666 | 6018583.5 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA` | 326984.5 | 330593.5 | 0.99 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)` | 26222333 | 25938625.5 | 1.01 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)` | 18914875 | 18943166.5 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)` | 19338354.5 | 19554000 | 0.99 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)` | 39331500 | 39254458 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA` | 2462554 | 2474870 | 1.00 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)` | 56235458 | 55544542 | 1.01 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)` | 80835708 | 82396771 | 0.98 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)` | 169939541.5 | 173728042 | 0.98 |
| `Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)` | 45337209 | 45516333 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)` | 1784584 | 1779250 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)` | 1103604.5 | 1102875 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)` | 1566667 | 1584416 | 0.99 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)` | 3032666 | 3020250 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA` | 213488 | 214596.5 | 0.99 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)` | 12541520.5 | 12529625 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)` | 9203729.5 | 9208687.5 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)` | 9646562.5 | 9658437 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)` | 18999125 | 18982292 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA` | 1533148 | 1539988 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)` | 17653167 | 17638625 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)` | 14324854.5 | 14351250 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)` | 14581208.5 | 14600292 | 1.00 |
| `Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)` | 22196729.5 | 22151625 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)` | 70444166.5 | 70460291 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)` | 43349417 | 43523750 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)` | 39635792 | 39539833 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)` | 132857146 | 132477750 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA` | 1867040.5 | 1881252.5 | 0.99 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)` | 361132208 | 359873333.5 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)` | 345972771 | 345894479 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)` | 304480083 | 305415375 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)` | 724393166 | 725056958 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA` | 13316474.5 | 13380753 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)` | 419098563 | 421101688 | 1.00 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)` | 427616083 | 421140000 | 1.02 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)` | 709012833 | 748051292 | 0.95 |
| `Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)` | 714833666 | 715116625 | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s)` | 1667417 | 1695875 | 0.98 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s)` | 1348521 | 1355249.5 | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s)` | 1328125 | 1159396 | 1.15 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s)` | 2425812.5 | 2409792 | 1.01 |
| `mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA` | 582318.5 | 588933.5 | 0.99 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s)` | 9008145.5 | 8963979.5 | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s)` | 12958667 | 13000041 | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s)` | 31132854 | 33199749.5 | 0.94 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s)` | 9869291.5 | 9835041 | 1.00 |
| `mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA` | 1441009.5 | 1481667.5 | 0.97 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s)` | 18356416 | 17747854.5 | 1.03 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s)` | 17371542 | 17252604.5 | 1.01 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s)` | 29956292 | 31103916.5 | 0.96 |
| `mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s)` | 14108833.5 | 14355042 | 0.98 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s)` | 696500 | 669500.5 | 1.04 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s)` | 500812.5 | 556667 | 0.90 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s)` | 1033458 | 1058916.5 | 0.98 |
| `Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s)` | 725500 | 725958 | 1.00 |
| `Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA` | 47684 | 48545 | 0.98 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s)` | 1577042 | 1480334 | 1.07 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s)` | 1051000 | 1045958 | 1.00 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s)` | 1370125 | 1649187 | 0.83 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s)` | 2303687 | 2243312 | 1.03 |
| `Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA` | 238125.5 | 241138.5 | 0.99 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s)` | 1558687.5 | 1523125 | 1.02 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s)` | 1045916.5 | 1079125 | 0.97 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s)` | 1461916 | 1484062.5 | 0.99 |
| `Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s)` | 2228229 | 2259292 | 0.99 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)` | 3409125 | 3401000.5 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)` | 2066270.5 | 2066583.5 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)` | 2525500 | 2508167 | 1.01 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)` | 6013791 | 6010687.5 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA` | 289794 | 286726 | 1.01 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)` | 24055187.5 | 24062625 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)` | 17202000 | 17164833 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)` | 17133562.5 | 17141292 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)` | 37566146 | 37492334 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA` | 2407993 | 2402302 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)` | 54216541.5 | 53536916 | 1.01 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)` | 83772249.5 | 83532937.5 | 1.00 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)` | 166696854 | 171297229.5 | 0.97 |
| `Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)` | 44480500 | 44566417 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)` | 250124083 | 249899167 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)` | 148262625 | 147932291 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)` | 115870833.5 | 116470479.5 | 0.99 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)` | 448008979 | 450194500 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA` | 5471191 | 5441310 | 1.01 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)` | 1103626125 | 1101316834 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)` | 857579145.5 | 855731854 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)` | 827690125 | 827979937.5 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)` | 1753891459 | 1754256625 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA` | 29184619 | 28790457 | 1.01 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)` | 1020122687.5 | 1014882938 | 1.01 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)` | 963543750 | 964872792 | 1.00 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)` | 1321081709 | 1306006041 | 1.01 |
| `Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)` | 1722584666.5 | 1738222270.5 | 0.99 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s)` | 1339041 | 1311167 | 1.02 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s)` | 970521 | 957750 | 1.01 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s)` | 971250 | 703375 | 1.38 |
| `mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s)` | 1954229 | 1939625 | 1.01 |
| `mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA` | 575255 | 571080 | 1.01 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s)` | 6044667 | 6006770.5 | 1.01 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s)` | 6367750 | 6329521 | 1.01 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s)` | 25225645.5 | 25490292 | 0.99 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s)` | 7126291.5 | 7082792 | 1.01 |
| `mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA` | 1413961 | 1359778.5 | 1.04 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s)` | 11543792 | 11590167 | 1.00 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s)` | 9916958 | 10239645.5 | 0.97 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s)` | 18299375 | 18128834 | 1.01 |
| `mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s)` | 8684625 | 8225354.5 | 1.06 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s)` | 389458 | 362875 | 1.07 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s)` | 359500 | 363208 | 0.99 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s)` | 2193750 | 3032958 | 0.72 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s)` | 88042 | 87687.5 | 1.00 |
| `Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA` | 28321 | 28003 | 1.01 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s)` | 396895.5 | 388708 | 1.02 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s)` | 440458 | 440375 | 1.00 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s)` | 4619792 | 4703354 | 0.98 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s)` | 265375 | 259209 | 1.02 |
| `Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA` | 227034 | 220116.5 | 1.03 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s)` | 429500 | 419708.5 | 1.02 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s)` | 470792 | 470625 | 1.00 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s)` | 4863667 | 4962583 | 0.98 |
| `Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s)` | 271041.5 | 270875 | 1.00 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s)` | 333667 | 307959 | 1.08 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s)` | 294437 | 298292 | 0.99 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s)` | 740833.5 | 764833 | 0.97 |
| `Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s)` | 52854.5 | 54917 | 0.96 |
| `Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA` | 28719 | 27854 | 1.03 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s)` | 366250 | 352750 | 1.04 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s)` | 335625 | 336000 | 1.00 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s)` | 813354 | 887229.5 | 0.92 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s)` | 155312 | 151687.5 | 1.02 |
| `Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA` | 212500 | 205379 | 1.03 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s)` | 377292 | 366770.5 | 1.03 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s)` | 351833 | 350334 | 1.00 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s)` | 908146 | 470875 | 1.93 |
| `Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s)` | 150959 | 151188 | 1.00 |
| `vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s)` | 602630000 | 606032542 | 0.99 |
| `vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s)` | 425446812.5 | 429552104 | 0.99 |
| `vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s)` | 373310250 | 384048291 | 0.97 |
| `vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s)` | 874089917 | 874614500 | 1.00 |
| `vgg16(32, 32, 3, 64)/forward/GPU/CUDA` | 7031776 | 7024831 | 1.00 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s)` | 2002661937.5 | 2012811729 | 0.99 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s)` | 1606159333.5 | 1612557354 | 1.00 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s)` | 1596550354 | 1572362875 | 1.02 |
| `vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s)` | 2634530667 | 2635509917 | 1.00 |
| `vgg16(32, 32, 3, 64)/zygote/GPU/CUDA` | 25871825 | 25977885 | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s)` | 520500 | 521667 | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s)` | 438417 | 439708 | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s)` | 1781667 | 2731999.5 | 0.65 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s)` | 875958 | 865959 | 1.01 |
| `Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA` | 47831 | 47570 | 1.01 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s)` | 1845875 | 1889812.5 | 0.98 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s)` | 2803167 | 2797000 | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s)` | 14421478.5 | 16236125 | 0.89 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s)` | 2762500 | 2650416 | 1.04 |
| `Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA` | 254912 | 249198.5 | 1.02 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s)` | 1952312.5 | 1925354.5 | 1.01 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s)` | 5063375 | 5070458 | 1.00 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s)` | 14644042 | 16364250 | 0.89 |
| `Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s)` | 2791250 | 2752417 | 1.01 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s)` | 1556042 | 1507000 | 1.03 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s)` | 1242250 | 1228583 | 1.01 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s)` | 1188250 | 1068166.5 | 1.11 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s)` | 2351542 | 2208000 | 1.07 |
| `mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA` | 587337 | 589459 | 1.00 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s)` | 5962500.5 | 5951958 | 1.00 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s)` | 4728083 | 4655854.5 | 1.02 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s)` | 25706291.5 | 27127042 | 0.95 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s)` | 7342917 | 6596583 | 1.11 |
| `mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA` | 1398774.5 | 1347970 | 1.04 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s)` | 13281166.5 | 12790458 | 1.04 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s)` | 11246833 | 12034667 | 0.93 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s)` | 20840520.5 | 22409521.5 | 0.93 |
| `mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s)` | 10611542 | 10615250 | 1.00 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s)` | 2770.5 | 2292 | 1.21 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s)` | 4875 | 2500 | 1.95 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s)` | 2875 | 3125 | 0.92 |
| `Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s)` | 2500 | 2292 | 1.09 |
| `Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA` | 25042 | 24734 | 1.01 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s)` | 7084 | 7416 | 0.96 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s)` | 7166 | 7416.5 | 0.97 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s)` | 7333 | 7666 | 0.96 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s)` | 7208 | 7167 | 1.01 |
| `Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA` | 216255 | 210121.5 | 1.03 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s)` | 8500 | 8250 | 1.03 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s)` | 8125 | 8250 | 0.98 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s)` | 8291 | 8417 | 0.99 |
| `Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s)` | 5958 | 5958 | 1 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s)` | 11416 | 10500 | 1.09 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s)` | 14208 | 12896 | 1.10 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s)` | 10958 | 10666 | 1.03 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s)` | 7167 | 7083 | 1.01 |
| `Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA` | 25461 | 24767 | 1.03 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s)` | 19792 | 20208 | 0.98 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s)` | 20084 | 20041 | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s)` | 20041 | 20209 | 0.99 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s)` | 19875 | 20125 | 0.99 |
| `Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA` | 236659.5 | 230521 | 1.03 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s)` | 23500 | 23625 | 0.99 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s)` | 23500 | 23500 | 1 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s)` | 23791 | 23750 | 1.00 |
| `Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s)` | 21375 | 21125 | 1.01 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s)` | 28479.5 | 28292 | 1.01 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s)` | 28917 | 28500 | 1.01 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s)` | 28625 | 28708 | 1.00 |
| `Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s)` | 46083 | 47396 | 0.97 |
| `Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA` | 26494 | 25741 | 1.03 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s)` | 222354.5 | 220542 | 1.01 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s)` | 275541.5 | 281604.5 | 0.98 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s)` | 4186187.5 | 4282416 | 0.98 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s)` | 145834 | 146208.5 | 1.00 |
| `Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA` | 209501 | 210161.5 | 1.00 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s)` | 333708 | 330833 | 1.01 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s)` | 313334 | 321584 | 0.97 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s)` | 519937.5 | 763437.5 | 0.68 |
| `Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s)` | 161041 | 161895.5 | 0.99 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s)` | 1729.5 | 2000 | 0.86 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s)` | 4584 | 1875 | 2.44 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s)` | 2417 | 2541 | 0.95 |
| `Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s)` | 2084 | 1666 | 1.25 |
| `Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA` | 23313.5 | 23005 | 1.01 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s)` | 5500 | 5250 | 1.05 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s)` | 5250 | 5458 | 0.96 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s)` | 5209 | 5541 | 0.94 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s)` | 5209 | 5500 | 0.95 |
| `Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA` | 243931.5 | 256932 | 0.95 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s)` | 11250 | 11250 | 1 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s)` | 11291 | 11250 | 1.00 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s)` | 11417 | 11458 | 1.00 |
| `Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s)` | 6917 | 7000 | 0.99 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)` | 79906375 | 79995437.5 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)` | 49046500 | 49006333 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)` | 44994417 | 43270416.5 | 1.04 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)` | 151607583 | 151307500 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA` | 2724784.5 | 2675260 | 1.02 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)` | 664456250 | 673208958 | 0.99 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)` | 410418958 | 413629250 | 0.99 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)` | 401447083 | 396318917 | 1.01 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)` | 682646167 | 684603375 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA` | 14573408 | 14606131 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)` | 711193604.5 | 713812625 | 1.00 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)` | 688899625 | 676256333 | 1.02 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)` | 1017383917 | 1061049792 | 0.96 |
| `Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)` | 998024084 | 1000454667 | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.