699c8d8
Lux Benchmarks
| Benchmark suite | Current | Previous | Ratio |
|-|-|-|-|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 411375 ns | 412541 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 323084 ns | 242250 ns | 1.33 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 241750 ns | 322416.5 ns | 0.75 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 742584 ns | 740041 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43670.5 ns | 43783 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 638083 ns | 641833 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 521459 ns | 443458 ns | 1.18 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 403792 ns | 478625 ns | 0.84 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 908000 ns | 958167 ns | 0.95 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 188991 ns | 190648 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 744083.5 ns | 744500 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 624667 ns | 516875 ns | 1.21 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 521562.5 ns | 622959 ns | 0.84 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 1006750 ns | 971917 ns | 1.04 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1618667 ns | 1626709 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1189854.5 ns | 1164083 ns | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1358375 ns | 1354209 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2360458 ns | 2382500 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 211422.5 ns | 212090 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12284958.5 ns | 12241875 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9550979.5 ns | 9584437.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9390791 ns | 9294250 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18060041.5 ns | 18002396 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1906624.5 ns | 1909620 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17280916 ns | 17364333 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14329167 ns | 14442750 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14463083 ns | 14311833 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21088375 ns | 21053667 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 121038500 ns | 119842562.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 174268209 ns | 182159229.5 ns | 0.96 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 155647417 ns | 147780729 ns | 1.05 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 103289458 ns | 104816708 ns | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5459016 ns | 5472644 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 592681937.5 ns | 592166875.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 540116125 ns | 563821542 ns | 0.96 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 460022146 ns | 442430104 ns | 1.04 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 623412250 ns | 625737792 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 38146652 ns | 34972882 ns | 1.09 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 751859749.5 ns | 713130770.5 ns | 1.05 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 667614542 ns | 691544250 ns | 0.97 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 606980437.5 ns | 603916250 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 744028250 ns | 742687041 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 861145.5 ns | 870167 ns | 0.99 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 826334 ns | 801000.5 ns | 1.03 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 1164604.5 ns | 1220750 ns | 0.95 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 959395.5 ns | 949688 ns | 1.01 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 263975.5 ns | 265193.5 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2730708 ns | 2750250 ns | 0.99 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 2455708.5 ns | 2457125 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 3317604.5 ns | 3329208 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3286521.5 ns | 3289292 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1038213 ns | 1041014 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 6779291.5 ns | 6810916 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 6365500 ns | 6350125 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 6531583 ns | 6502792 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 7635875 ns | 7507833.5 ns | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210025 ns | 210394 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 24055375 ns | 24022917 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 21237625 ns | 21319833 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 21535792 ns | 21303250 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 29721771 ns | 29727625 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1973993 ns | 1967511.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 37426416 ns | 37215750 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 34385895.5 ns | 45637834 ns | 0.75 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 45888792 ns | 45715979 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 49367041.5 ns | 49236354 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 13355875 ns | 13381562.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 12430958.5 ns | 12433834 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 12600937.5 ns | 12537792 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 15122729 ns | 15142500 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 518849 ns | 513457.5 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 47134500 ns | 47126417 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 41671875 ns | 41878854 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 41125499.5 ns | 40616375 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 58336333 ns | 58178583 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3218047 ns | 3235162 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 74376750 ns | 74629396 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 68965000 ns | 91798958 ns | 0.75 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 91496292 ns | 91292209 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 98399104 ns | 98432125 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 286107083.5 ns | 287875563 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 339607208 ns | 347603000 ns | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 321183396 ns | 314078500 ns | 1.02 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 268796333 ns | 269776750 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7107764 ns | 7105513 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 971792250 ns | 971474250 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 922480542 ns | 943762542 ns | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 835684104 ns | 823005791 ns | 1.02 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1117474583 ns | 1118023937.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33742759 ns | 33868329.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1448964667 ns | 1427625104.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1371326875 ns | 1702832375 ns | 0.81 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1656412041 ns | 1637231875 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1663889000 ns | 1670864792 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 1528208 ns | 1542584 ns | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 1277937.5 ns | 1241333 ns | 1.03 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 1635937.5 ns | 1613625 ns | 1.01 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2136917 ns | 2155812.5 ns | 0.99 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 277390.5 ns | 272755 ns | 1.02 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 7872250 ns | 7887667 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 6588000 ns | 6453729 ns | 1.02 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 7229396.5 ns | 7173500 ns | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 10478041 ns | 10450125 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1130644 ns | 1114904 ns | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 177405459 ns | 177697708 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 132546709 ns | 183127729.5 ns | 0.72 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 130053917 ns | 108153771 ns | 1.20 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 165568083 ns | 165745583 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4878153.5 ns | 4846033.5 ns | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 643663333 ns | 638012042 ns | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 496969000 ns | 679029417 ns | 0.73 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 558568375 ns | 519025667 ns | 1.08 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 654929750 ns | 643337167 ns | 1.02 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 18110009 ns | 16351975 ns | 1.11 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1068292 ns | 1081646 ns | 0.99 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 983291 ns | 954500 ns | 1.03 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 1327542 ns | 1344104 ns | 0.99 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1373792 ns | 1347771 ns | 1.02 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 281111 ns | 275416 ns | 1.02 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6002271 ns | 5770208 ns | 1.04 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 4660958.5 ns | 4655458.5 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 5006354 ns | 4980792 ns | 1.01 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 5624708 ns | 5735812 ns | 0.98 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1151478.5 ns | 1152086 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23602937.5 ns | 23668250 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 34462041.5 ns | 43049541.5 ns | 0.80 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 41206708 ns | 37347521 ns | 1.10 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 34998812.5 ns | 34921604 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1861561 ns | 1832427 ns | 1.02 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 184955020.5 ns | 183460916.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 159249771 ns | 172265542 ns | 0.92 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 150499917 ns | 144281125 ns | 1.04 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 390550250 ns | 390189959 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 16472871 ns | 16494935.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 286689500 ns | 284961708 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 244388646 ns | 258063041 ns | 0.95 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 296120917 ns | 285191292 ns | 1.04 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 440533417 ns | 439855709 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 624998521 ns | 619813333.5 ns | 1.01 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 477642917 ns | 578294708 ns | 0.83 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 411867812.5 ns | 376301604.5 ns | 1.09 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 656030104 ns | 655157813 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12477905 ns | 12474713 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 1873735437.5 ns | 1799700979 ns | 1.04 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 1636021583 ns | 1657435625 ns | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 1558895000 ns | 1521911875 ns | 1.02 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 2103890062.5 ns | 2098240625 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49609571 ns | 49823657 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3064313 ns | 3073437.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2106875 ns | 2095291 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2301542 ns | 2281125 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 4944708.5 ns | 4821625 ns | 1.03 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 586671 ns | 585401 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25694166 ns | 25431958 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 20092625.5 ns | 20342750 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19545895.5 ns | 18922583 ns | 1.03 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 36568812 ns | 36574542 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3200820 ns | 3196535 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 35138250 ns | 35041958.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 28420084 ns | 28788125 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 30280062.5 ns | 29576167 ns | 1.02 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 42544854.5 ns | 42034167 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1650167 ns | 1646125 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1195708 ns | 1175250 ns | 1.02 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1388458 ns | 1363396 ns | 1.02 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2498125 ns | 2504083 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 218867 ns | 216709 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12700771 ns | 12715000.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9962124.5 ns | 9998250 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9800459 ns | 9683354 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18403354 ns | 18453354 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1957280 ns | 1955756 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17702708 ns | 17696667 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14737000 ns | 14806937.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14865041 ns | 14557708 ns | 1.02 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21477333.5 ns | 21432729.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23644021 ns | 23752041 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 34568146 ns | 43099166 ns | 0.80 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 41693959 ns | 37397812.5 ns | 1.11 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 34878583 ns | 34904958.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1840287 ns | 1842817 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 188357375 ns | 190852833 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 233488333 ns | 251191084 ns | 0.93 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 202742250 ns | 193659750 ns | 1.05 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 429823895.5 ns | 429893688 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13939550 ns | 13924800 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 291377187.5 ns | 289369042 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 249397167 ns | 265637979 ns | 0.94 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 300701042 ns | 292122354 ns | 1.03 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 446062833 ns | 445323208 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 3387083 ns | 3394917 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 3112854 ns | 2913791 ns | 1.07 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 2905708 ns | 3035709 ns | 0.96 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 3940000 ns | 4098958 ns | 0.96 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 570283 ns | 578446 ns | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 7636021 ns | 7619333 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 7442000 ns | 7367750 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 7380521 ns | 7464166.5 ns | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 8212750 ns | 8211250 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1364212 ns | 1384858.5 ns | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 13685833.5 ns | 13690021 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 19094334 ns | 19212042 ns | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 19126041 ns | 19131458 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 15649500.5 ns | 15652916 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 69459 ns | 69062.5 ns | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 69875 ns | 67604 ns | 1.03 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 72083 ns | 70458 ns | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 68812.5 ns | 69562 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47850 ns | 48441 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 318833.5 ns | 324458 ns | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 285875.5 ns | 326292 ns | 0.88 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 326000 ns | 236625 ns | 1.38 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 319625 ns | 377708 ns | 0.85 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 210144 ns | 214194.5 ns | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 447500 ns | 424083.5 ns | 1.06 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 437791 ns | 458041.5 ns | 0.96 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 413375 ns | 356041 ns | 1.16 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 328959 ns | 375854.5 ns | 0.88 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3055292 ns | 3032834 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2092833 ns | 2078062.5 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2283687.5 ns | 2268541 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 4895416.5 ns | 4511375 ns | 1.09 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 585359 ns | 583753.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 23561833 ns | 23595458 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18085229 ns | 18331416 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18562458 ns | 16965625 ns | 1.09 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 35017833 ns | 35767042 ns | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3105298.5 ns | 3121440 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 33378229 ns | 33311458 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 27662145.5 ns | 28023083 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 27887458 ns | 27412334 ns | 1.02 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 41809854.5 ns | 41849604 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 120765334 ns | 121058542 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 174275666 ns | 181255520.5 ns | 0.96 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 156098417 ns | 147913792 ns | 1.06 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 103997770.5 ns | 108516083 ns | 0.96 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5461795.5 ns | 5463863.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 471697125 ns | 469339812.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 468205208 ns | 485979041 ns | 0.96 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 455789333 ns | 435101416.5 ns | 1.05 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 728998166 ns | 729625458 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 35173763 ns | 32277729 ns | 1.09 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 640412562.5 ns | 644292937.5 ns | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 655505917 ns | 675559041.5 ns | 0.97 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 590476187.5 ns | 576973396 ns | 1.02 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 732032000 ns | 726825250 ns | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1249541 ns | 1313500 ns | 0.95 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 949958.5 ns | 757354.5 ns | 1.25 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 764125 ns | 902583 ns | 0.85 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2000458 ns | 1989917 ns | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 568299.5 ns | 572494 ns | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 2960792 ns | 2956604.5 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 2611021 ns | 2531750 ns | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 2513020.5 ns | 2470208 ns | 1.02 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 3690271 ns | 3689875 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1319857 ns | 1358421.5 ns | 0.97 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 6641791 ns | 6625021 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 6504791 ns | 6484250 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 6489375 ns | 6454333 ns | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 4443166 ns | 4442292 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 104249.5 ns | 102875 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 105166 ns | 104104 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 105250 ns | 103209 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 105625 ns | 104917 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28456 ns | 28741 ns | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 236750 ns | 237083 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 236541 ns | 237542 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 237667 ns | 236500 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 249625 ns | 250125 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 217310.5 ns | 220363.5 ns | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 330167 ns | 330167 ns | 1 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 742062.5 ns | 744959 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 748209 ns | 741959 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 721792 ns | 722000 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 13583 ns | 13250 ns | 1.03 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 14250 ns | 13416.5 ns | 1.06 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 14354 ns | 13875 ns | 1.03 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 13791 ns | 13750 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28098 ns | 28904 ns | 0.97 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 25333.5 ns | 25625 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 25750 ns | 25833 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 25667 ns | 25750 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 25750 ns | 25833 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 206637.5 ns | 212085 ns | 0.97 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 45583.5 ns | 45645.5 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 45875 ns | 46208 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 46000 ns | 46208 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 28209 ns | 26875 ns | 1.05 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 309099062.5 ns | 305618500 ns | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 232469666.5 ns | 277901833 ns | 0.84 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 216377833 ns | 188750583 ns | 1.15 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 308762583 ns | 309520834 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7672114 ns | 7623749.5 ns | 1.01 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1103432604 ns | 1091097062.5 ns | 1.01 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1001458208 ns | 1062816167 ns | 0.94 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 901919771 ns | 898810583 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 1293921625 ns | 1292104708 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 27115979 ns | 27087435 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 414208.5 ns | 417084 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 415583 ns | 419167 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 416958 ns | 415667 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 418375.5 ns | 431375 ns | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 48086 ns | 48303 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1344667 ns | 1452104.5 ns | 0.93 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1315687 ns | 1275562 ns | 1.03 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 1294125 ns | 1264708.5 ns | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 1745083.5 ns | 1725333 ns | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 221906 ns | 227769 ns | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1836104.5 ns | 1850667 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 3473770.5 ns | 3421562.5 ns | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 3450771 ns | 3399499.5 ns | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 3660083 ns | 3662312.5 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1396583.5 ns | 1486791 ns | 0.94 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1097333 ns | 911292 ns | 1.20 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 939062.5 ns | 1056417 ns | 0.89 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2231792 ns | 2195084 ns | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 574483.5 ns | 580986.5 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 2873417 ns | 3080583 ns | 0.93 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 2715208 ns | 2660084 ns | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 2626645.5 ns | 2573166 ns | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 3813542 ns | 3820687 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1401203 ns | 1362672 ns | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 8821895.5 ns | 8819292 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 8770604 ns | 8745333 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 8763666.5 ns | 8773625 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 6350229.5 ns | 6434292 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2250 ns | 2542 ns | 0.89 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2583 ns | 2792 ns | 0.93 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3333 ns | 2937.5 ns | 1.13 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2583 ns | 2375 ns | 1.09 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24886 ns | 25652 ns | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7292 ns | 7041 ns | 1.04 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7042 ns | 7208 ns | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7375 ns | 7209 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 6959 ns | 7125 ns | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 184871.5 ns | 192398.5 ns | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8479.5 ns | 8583 ns | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8667 ns | 8667 ns | 1 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8625 ns | 8542 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6000 ns | 6000 ns | 1 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 13291 ns | 13375 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 13750 ns | 13417 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 14521 ns | 14084 ns | 1.03 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 13458 ns | 13583 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25102 ns | 25588 ns | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 29250 ns | 29125 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 28959 ns | 29209 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 29167 ns | 29292 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 29208.5 ns | 29187.5 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 194866.5 ns | 200255.5 ns | 0.97 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 43333 ns | 42959 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 94750 ns | 93000 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 93687.5 ns | 92959 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 90834 ns | 91250 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 27916 ns | 28333 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28500 ns | 27979.5 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 27166 ns | 28291 ns | 0.96 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46166 ns | 45916.5 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 26285 ns | 27099 ns | 0.97 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 44541 ns | 44375 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 44250 ns | 45208 ns | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 44666 ns | 43708 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 63625 ns | 63187 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 167275 ns | 172626.5 ns | 0.97 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 68458 ns | 68833 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 68125 ns | 69041 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 68708 ns | 68416 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 68208 ns | 68542 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1834 ns | 1916 ns | 0.96 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 2042 ns | 1875 ns | 1.09 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2250 ns | 2333 ns | 0.96 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1958 ns | 1625 ns | 1.20 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23492 ns | 24015 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5416 ns | 5042 ns | 1.07 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5333 ns | 5291 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5375 ns | 5416 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5291.5 ns | 5542 ns | 0.95 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 171557 ns | 176394.5 ns | 0.97 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 8312.5 ns | 8208 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 8250 ns | 8187.5 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 8208 ns | 8125 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5667 ns | 5625 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 106272125 ns | 106620958 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 117220895.5 ns | 125627166 ns | 0.93 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 123891541 ns | 120144521 ns | 1.03 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 117462292 ns | 117625187.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2638590.5 ns | 2655445 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 390984854 ns | 389249875 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 370181584 ns | 378229083 ns | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 344393625 ns | 354732875 ns | 0.97 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 481330584 ns | 489409292 ns | 0.98 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 15192721.5 ns | 15161397.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 619409458 ns | 618241875 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 668415479 ns | 859950833 ns | 0.78 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 816519375 ns | 803956646 ns | 1.02 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 916595917 ns | 914357292 ns | 1.00 |
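
Each benchmark entry above gives two timings in nanoseconds followed by their ratio. The figures are consistent with the ratio being the first timing divided by the second, rounded to two decimals. A minimal Python sketch of that arithmetic follows. It assumes the first timing is the current run and the second the previous run, which is how github-action-benchmark usually lays out its comments. The names `ratio` and `is_regression` and the `1.2` threshold are hypothetical, chosen only for illustration, and are not taken from the workflow configuration.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current/previous rounded to two decimals (lower is better)."""
    return round(current_ns / previous_ns, 2)


def is_regression(current_ns: float, previous_ns: float,
                  threshold: float = 1.2) -> bool:
    """Flag a slowdown when the unrounded ratio exceeds `threshold`.

    The default threshold is an illustrative assumption, not the value
    used by the workflow that produced this comment.
    """
    return current_ns / previous_ns > threshold


if __name__ == "__main__":
    # Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)
    print(ratio(323084, 242250), is_regression(323084, 242250))
    # Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)
    print(ratio(241750, 322416.5), is_regression(241750, 322416.5))
```

A ratio above 1 means the first timing is slower than the second, and a ratio below 1 means it is faster. Several of the larger swdivergences appear in opposite directions on the 4-thread and 8-thread rows of the same benchmark, so they may reflect measurement noise rather than a real change.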
This comment was automatically generated by workflow using github-action-benchmark.