Lux Benchmarks (commit cb92a56)

| Benchmark suite | Current (ns) | Previous (ns) | Ratio (current / previous) |
|---|---:|---:|---:|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 412541 | 411750 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 242250 | 241583 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 322416.5 | 322167 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 740041 | 740459 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43783 | 44353 | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 641833 | 655917 | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 443458 | 464833 | 0.95 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 478625 | 468833 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 958167 | 953354.5 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 190648 | 191177 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 744500 | 762708 | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 516875 | 569834 | 0.91 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 622959 | 633417 | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 971917 | 954041 | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1626709 | 1594000 | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1164083 | 1160208 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1354209 | 1349792 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2382500 | 2338000 | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 212090 | 213116.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12241875 | 12315208 | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9584437.5 | 9597708 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9294250 | 9293542 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18002396 | 17930125 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1909620 | 1907223.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17364333 | 17341834 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14442750 | 14420792 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14311833 | 14304834 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21053667 | 21034709 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 119842562.5 | 120916625 | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 182159229.5 | 182214542 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 147780729 | 148302000 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 104816708 | 108170625 | 0.97 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5472644 | 5472288 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 592166875.5 | 591646750.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 563821542 | 563684334 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 442430104 | 441224584 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 625737792 | 624792917 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34972882 | 34975276 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 713130770.5 | 711934541.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 691544250 | 694318791 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 603916250 | 615736125 | 0.98 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 742687041 | 746029625 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 870167 | 864625 | 1.01 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 801000.5 | 801041.5 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 1220750 | 1219979.5 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 949688 | 954250 | 1.00 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 265193.5 | 271063.5 | 0.98 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2750250 | 2719646 | 1.01 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 2457125 | 2462708.5 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 3329208 | 3306125 | 1.01 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3289292 | 3385271 | 0.97 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1041014 | 1061902 | 0.98 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 6810916 | 6794666 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 6350125 | 6363417 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 6502792 | 6537167 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 7507833.5 | 7529437.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210394 | 211586 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 24022917 | 23988417 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 21319833 | 21318917 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 21303250 | 21539417 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 29727625 | 29676416.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1967511.5 | 1981095 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 37215750 | 37358667 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 45637834 | 45576125 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 45715979 | 34606625 | 1.32 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 49236354 | 49443917 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 13381562.5 | 13334188 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 12433834 | 12465000 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 12537792 | 12598625 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 15142500 | 15188833 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 513457.5 | 512594 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 47126417 | 47191979 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 41878854 | 41899021 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 40616375 | 40856541 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 58178583 | 58133083 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3235162 | 3233025.5 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 74629396 | 75271104 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 91798958 | 91865084 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 91292209 | 68909500 | 1.32 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 98432125 | 98569708 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 287875563 | 285118604 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 347603000 | 347578167 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 314078500 | 315462416 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 269776750 | 275551125 | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7105513 | 7112475.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 971474250 | 973431500 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 943762542 | 941010291 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 823005791 | 826121208 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1118023937.5 | 1128469333.5 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33868329.5 | 33864812.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1427625104.5 | 1435266167 | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1702832375 | 1709541500 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1637231875 | 1266814792 | 1.29 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1670864792 | 1672710667 | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 1542584 | 1549875 | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 1241333 | 1256625.5 | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 1613625 | 1620708 | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2155812.5 | 2159125 | 1.00 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 272755 | 276844 | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 7887667 | 7894542 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 6453729 | 6659583.5 | 0.97 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 7173500 | 7112458 | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 10450125 | 10466625 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1114904 | 1131295 | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 177697708 | 178664375 | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 183127729.5 | 183079667 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 108153771 | 110239854.5 | 0.98 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 165745583 | 165902562.5 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4846033.5 | 4850474.5 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 638012042 | 637338916 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 679029417 | 688777459 | 0.99 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 519025667 | 453913416 | 1.14 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 643337167 | 656584541 | 0.98 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 16351975 | 16410189 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1081646 | 1076312.5 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 954500 | 957271 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 1344104 | 1343625 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1347771 | 1344604 | 1.00 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 275416 | 279240 | 0.99 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 5770208 | 6007729 | 0.96 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 4655458.5 | 4675125 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 4980792 | 4946791 | 1.01 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 5735812 | 5677084 | 1.01 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1152086 | 1154307 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23668250 | 23587458 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43049541.5 | 44837937.5 | 0.96 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 37347521 | 37828166 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 34921604 | 34890312 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1832427 | 1835859 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 183460916.5 | 184849458 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 172265542 | 173020292 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 144281125 | 145743417 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 390189959 | 391585708 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 16494935.5 | 16488184.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 284961708 | 284020041 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 258063041 | 257804083.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 285191292 | 289035959 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 439855709 | 440921375 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 619813333.5 | 623252354.5 | 0.99 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 578294708 | 578021459 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 376301604.5 | 377542125 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 655157813 | 659480437.5 | 0.99 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12474713 | 12471152 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 1799700979 | 1819111104.5 | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 1657435625 | 1660737208 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 1521911875 | 1556316104 | 0.98 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 2098240625 | 2162942771 | 0.97 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49823657 | 49790206 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3073437.5 | 3047666 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2095291 | 2113395.5 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2281125 | 2276187.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 4821625 | 4615062.5 | 1.04 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 585401 | 580103 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25431958 | 25551000 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 20342750 | 20372104.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18922583 | 18984250 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 36574542 | 36475396 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3196535 | 3197534 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 35041958.5 | 35362875 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 28788125 | 28780458 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 29576167 | 29749292 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 42034167 | 42359958 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1646125 | 1644375 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1175250 | 1184292 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1363396 | 1380959 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2504083 | 2490125 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 216709 | 217958.5 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12715000.5 | 12687000 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9998250 | 10006167 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9683354 | 9643084 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18453354 | 18392479 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1955756 | 1945166.5 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17696667 | 17715708 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14806937.5 | 14807375 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14557708 | 14584104 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21432729.5 | 21450895.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23752041 | 23273292 | 1.02 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43099166 | 43934833 | 0.98 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 37397812.5 | 37907334 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 34904958.5 | 34857583 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1842817 | 1854216 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 190852833 | 189714750 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 251191084 | 252758813 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 193659750 | 196003395.5 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 429893688 | 431014896 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13924800 | 13876633.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 289369042 | 288996271 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 265637979 | 265619583 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 292122354 | 294970666.5 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 445323208 | 447715041 | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 3394917 | 3400249.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 2913791 | 2883458 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 3035709 | 3083459 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 4098958 | 4098667 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 578446 | 585962 | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 7619333 | 7635041 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 7367750 | 7317312.5 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 7464166.5 | 7452208.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 8211250 | 8215479 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1384858.5 | 1410316 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 13690021 | 18791292 | 0.73 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 19212042 | 19172041.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 19131458 | 19131167 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 15652916 | 10737041.5 | 1.46 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 69062.5 | 68687.5 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 67604 | 67375 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 70458 | 70250 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 69562 | 68250 | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 48441 | 49195 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 324458 | 323229 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 326292 | 332709 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 236625 | 316083 | 0.75 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 377708 | 318042 | 1.19 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 214194.5 | 218315 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 424083.5 | 444708 | 0.95 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 458041.5 | 400125 | 1.14 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 356041 | 414833 | 0.86 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 375854.5 | 356271 | 1.05 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3032834 | 3032708 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2078062.5 | 2089166.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2268541 | 2260084 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 4511375 | 4570896 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 583753.5 | 585123 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 23595458 | 23581083 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18331416 | 18324312.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 16965625 | 16907208 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 35767042 | 36054896 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3121440 | 3100409.5 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 33311458 | 33410625 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 28023083 | 27996292 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 27412334 | 27450166 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 41849604 | 41964333 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 121058542 | 118848041.5 | 1.02 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 181255520.5 | 181989437.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 147913792 | 147982042 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 108516083 | 103352208 | 1.05 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5463863.5 | 5461107 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 469339812.5 | 467760646 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 485979041 | 486582500 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 435101416.5 | 432022166.5 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 729625458 | 731293667 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 32277729 | 32285579 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 644292937.5 | 635989438 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 675559041.5 | 672695645.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 576973396 | 574471979.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 726825250 | 732788375 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1313500 | 1223604 | 1.07 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 757354.5 | 730458.5 | 1.04 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 902583 | 937229 | 0.96 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 1989917 | 2093708 | 0.95 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 572494 | 576262 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 2956604.5 | 2962583 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 2531750 | 2501167 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 2470208 | 2629000 | 0.94 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 3689875 | 3697500 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1358421.5 | 1333074.5 | 1.02 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 6625021 | 6827375 | 0.97 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 6484250 | 6481666.5 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 6454333 | 6494979.5 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 4442292 | 4456292 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 102875 | 103500 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 104104 | 103895.5 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 103209 | 104750 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 104917 | 103438 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 28741 | 28118 | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 237083 | 236334 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 237542 | 237208 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 236500 | 236958 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 250125 | 249583 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 220363.5 | 218894 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 330167 | 742125 | 0.44 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 744959 | 754375 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 741959 | 742375 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 722000 | 733396 | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 13250 | 13584 | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 13416.5 | 13458 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 13875 | 14417 | 0.96 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 13750 | 13500 | 1.02 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28904 | 28346 | 1.02 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 25625 | 25937.5 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 25833 | 25812.5 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 25750 | 26167 | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 25833 | 25541.5 | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 212085 | 208547.5 | 1.02 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 45645.5 | 45562 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 46208 | 46000 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 46208 | 46500 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 26875 | 27041.5 | 0.99 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 305618500 | 306135083.5 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 277901833 | 279280917 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 188750583 | 187541479 | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 309520834 | 311366041 | 0.99 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7623749.5 | 7673100.5 | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1091097062.5 | 1102399979.5 | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1062816167 | 1066256459 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 898810583 | 890231458 | 1.01 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 1292104708 | 1297546000 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 27087435 | 27302775.5 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 417084 | 416791.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 419167 | 413292 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 415667 | 416667 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 431375 | 414084 | 1.04 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 48303 | 48087 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1452104.5 | 1365729 | 1.06 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1275562 | 1233625 | 1.03 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 1264708.5 | 1273021 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 1725333 | 1719854.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 227769 | 225988.5 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1850667 | 3499416 | 0.53 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 3421562.5 | 3462146 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 3399499.5 | 3426187.5 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 3662312.5 | 3641166.5 | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1486791 | 1471125 | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 911292 | 940083 | 0.97 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1056417 | 1055666 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2195084 | 2211917 | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 580986.5 | 580807.5 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 3080583 | 3085291 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 2660084 | 2648667 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 2573166 | 2683771 | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 3820687 | 3833875 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1362672 | 1348747 | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 8819292 | 8817354.5 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 8745333 | 8751792 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 8773625 | 9138166.5 | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 6434292 | 6346145.5 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2542 | 2167 | 1.17 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2792 | 2584 | 1.08 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 2937.5 | 3083 | 0.95 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2375 | 2500 | 0.95 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 25652 | 25068 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7041 | 7333 | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7208 | 7292 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7209 | 7375 | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7125 | 6916.5 | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 192398.5 | 189664 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8583 | 8541 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8667 | 8708 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8542 | 8542 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6000 | 5708 | 1.05 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 13375 | 13166 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 13417 | 13792 | 0.97 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 14084 | 14750 | 0.95 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 13583 | 13562.5 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25588 | 25030 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 29125 | 29375 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 29209 | 29000 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 29292 | 29167 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 29187.5 | 29000 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 200255.5 | 199600 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 42959 | 93042 | 0.46 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 93000 | 94458 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 92959 | 93125 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 91250 | 91166 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28333 | 28291 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 27979.5 | 27666.5 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28291 | 28417 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 45916.5 | 48583 | 0.95 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 27099 | 26505 | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 44375 | 43792 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 45208 | 48666 | 0.93 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 43708 | 44125 | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 63187 | 63417 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 172626.5 | 171477 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 68833 | 68333 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 69041 | 69000 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 68416 | 67958 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 68542 | 68500 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1916 | 1792 | 1.07 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1875 | 1875 | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2333 | 2167 | 1.08 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1625 | 1875 | 0.87 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 24015 | 23390 | 1.03 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5042 | 5250 | 0.96 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5291 | 5250 | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5416 | 5291 | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5542 | 5166 | 1.07 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 176394.5 | 175020 | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 8208 | 7958 | 1.03 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 8187.5 | 8250 | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 8125 | 8250 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5625 | 5584 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 106620958 | 106844041 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 125627166 | 126822625.5 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 120144521 | 121529708.5 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 117625187.5 | 118057875 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2655445 | 2630102 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 389249875 | 389182625 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 378229083 | 453684062.5 | 0.83 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 354732875 | 353315042 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 489409292 | 481224125 | 1.02 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 15161397.5 | 15198787 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 618241875 | 613025667 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 859950833 | 864422958 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 803956646 | 626775958 | 1.28 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 914357292 | 915632125 | 1.00 |

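The Ratio column is consistent with current time divided by previous time (for example, 45715979 / 34606625 ≈ 1.32 for the 8-thread Enzyme `Conv((3, 3), 2 => 2, gelu)` row), so a value above 1.00 means the benchmark ran slower on this commit and a value below 1.00 means it ran faster. The following is a minimal Python sketch of that arithmetic on four rows copied from these results. The 1.25 cutoff used for flagging is a hypothetical value chosen for illustration, not the alert threshold configured in this repository's benchmark workflow, and the helper names are likewise illustrative.

```python
# Recompute the Ratio column for a few rows copied from the Lux benchmark results.
# Assumption: ratio = current / previous, which matches every row checked by hand.

ROWS = [
    # (benchmark name, current time in ns, previous time in ns)
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)", 412541, 411750),
    ("Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)", 45715979, 34606625),
    ("mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s)", 15652916, 10737041.5),
    ("Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s)", 330167, 742125),
]

# Hypothetical cutoff for illustration only; not taken from the workflow config.
SLOWDOWN_CUTOFF = 1.25


def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current/previous rounded to two decimals, as displayed in the table."""
    return round(current_ns / previous_ns, 2)


def flag_slowdowns(rows, cutoff=SLOWDOWN_CUTOFF):
    """Return (name, ratio) pairs whose displayed ratio exceeds the cutoff."""
    flagged = []
    for name, cur, prev in rows:
        r = ratio(cur, prev)
        if r > cutoff:
            flagged.append((name, r))
    return flagged


if __name__ == "__main__":
    for name, cur, prev in ROWS:
        print(f"{ratio(cur, prev):.2f}  {name}")
    for name, r in flag_slowdowns(ROWS):
        print(f"slower than cutoff: {name} ({r:.2f})")
```

Comparing the rounded value against the cutoff mirrors what a reader sees in the table; comparing the unrounded quotient instead would only change the outcome for ratios within 0.005 of the cutoff.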
This comment was automatically generated by workflow using github-action-benchmark.