Commit

docs: fix images.jl link (#997)
NeroBlackstone authored Oct 30, 2024
1 parent 4379ec3 commit cb92a56
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/src/ecosystem.md
@@ -183,7 +183,7 @@ const dataload = [
 name: 'Images.jl',
 desc: 'An image library for Julia',
 links: [
-{ icon: 'github', link: 'ttps://github.com/JuliaImages/Images.jl' }
+{ icon: 'github', link: 'https://github.com/JuliaImages/Images.jl' }
 ]
 },
 {
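
The change restores the missing `h` in the `https://` scheme. As a minimal sketch of how a typo like this could be caught automatically, the snippet below checks each link's scheme and host. It assumes the links have already been extracted as plain strings; the `find_bad_links` helper is hypothetical and is not part of the Lux repository.

```python
from urllib.parse import urlparse


def find_bad_links(links):
    """Return links whose scheme is not http/https or whose host is empty."""
    bad = []
    for link in links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(link)
    return bad


links = [
    "ttps://github.com/JuliaImages/Images.jl",   # value before this commit
    "https://github.com/JuliaImages/Images.jl",  # value after this commit
]
print(find_bad_links(links))
```

Only the pre-commit value is reported, because `urlparse` reads its scheme as `ttps`.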

1 comment on commit cb92a56

@github-actions
Contributor

Lux Benchmarks

Benchmark suite Current: cb92a56 Previous: 4379ec3 Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 412541 ns 411750 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 242250 ns 241583 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 322416.5 ns 322167 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 740041 ns 740459 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 43783 ns 44353 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 641833 ns 655917 ns 0.98
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 443458 ns 464833 ns 0.95
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 478625 ns 468833 ns 1.02
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 958167 ns 953354.5 ns 1.01
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 190648 ns 191177 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 744500 ns 762708 ns 0.98
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 516875 ns 569834 ns 0.91
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 622959 ns 633417 ns 0.98
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 971917 ns 954041 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1626709 ns 1594000 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1164083 ns 1160208 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1354209 ns 1349792 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2382500 ns 2338000 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 212090 ns 213116.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12241875 ns 12315208 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9584437.5 ns 9597708 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9294250 ns 9293542 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18002396 ns 17930125 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1909620 ns 1907223.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17364333 ns 17341834 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14442750 ns 14420792 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14311833 ns 14304834 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21053667 ns 21034709 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 119842562.5 ns 120916625 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 182159229.5 ns 182214542 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 147780729 ns 148302000 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 104816708 ns 108170625 ns 0.97
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5472644 ns 5472288 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 592166875.5 ns 591646750.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 563821542 ns 563684334 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 442430104 ns 441224584 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 625737792 ns 624792917 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34972882 ns 34975276 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 713130770.5 ns 711934541.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 691544250 ns 694318791 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 603916250 ns 615736125 ns 0.98
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 742687041 ns 746029625 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 870167 ns 864625 ns 1.01
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 801000.5 ns 801041.5 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 1220750 ns 1219979.5 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 949688 ns 954250 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 265193.5 ns 271063.5 ns 0.98
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2750250 ns 2719646 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 2457125 ns 2462708.5 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 3329208 ns 3306125 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3289292 ns 3385271 ns 0.97
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1041014 ns 1061902 ns 0.98
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 6810916 ns 6794666 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 6350125 ns 6363417 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 6502792 ns 6537167 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 7507833.5 ns 7529437.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 210394 ns 211586 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 24022917 ns 23988417 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 21319833 ns 21318917 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 21303250 ns 21539417 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 29727625 ns 29676416.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1967511.5 ns 1981095 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 37215750 ns 37358667 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 45637834 ns 45576125 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 45715979 ns 34606625 ns 1.32
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 49236354 ns 49443917 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 13381562.5 ns 13334188 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 12433834 ns 12465000 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 12537792 ns 12598625 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 15142500 ns 15188833 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 513457.5 ns 512594 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 47126417 ns 47191979 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 41878854 ns 41899021 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 40616375 ns 40856541 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 58178583 ns 58133083 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3235162 ns 3233025.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 74629396 ns 75271104 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 91798958 ns 91865084 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 91292209 ns 68909500 ns 1.32
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 98432125 ns 98569708 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 287875563 ns 285118604 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 347603000 ns 347578167 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 314078500 ns 315462416 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 269776750 ns 275551125 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7105513 ns 7112475.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 971474250 ns 973431500 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 943762542 ns 941010291 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 823005791 ns 826121208 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1118023937.5 ns 1128469333.5 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 33868329.5 ns 33864812.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1427625104.5 ns 1435266167 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1702832375 ns 1709541500 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1637231875 ns 1266814792 ns 1.29
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1670864792 ns 1672710667 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 1542584 ns 1549875 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 1241333 ns 1256625.5 ns 0.99
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 1613625 ns 1620708 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2155812.5 ns 2159125 ns 1.00
lenet(28, 28, 1, 128)/forward/GPU/CUDA 272755 ns 276844 ns 0.99
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 7887667 ns 7894542 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 6453729 ns 6659583.5 ns 0.97
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 7173500 ns 7112458 ns 1.01
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 10450125 ns 10466625 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1114904 ns 1131295 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 177697708 ns 178664375 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 183127729.5 ns 183079667 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 108153771 ns 110239854.5 ns 0.98
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 165745583 ns 165902562.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4846033.5 ns 4850474.5 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 638012042 ns 637338916 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 679029417 ns 688777459 ns 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 519025667 ns 453913416 ns 1.14
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 643337167 ns 656584541 ns 0.98
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 16351975 ns 16410189 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1081646 ns 1076312.5 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 954500 ns 957271 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 1344104 ns 1343625 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1347771 ns 1344604 ns 1.00
lenet(28, 28, 1, 64)/forward/GPU/CUDA 275416 ns 279240 ns 0.99
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 5770208 ns 6007729 ns 0.96
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 4655458.5 ns 4675125 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 4980792 ns 4946791 ns 1.01
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 5735812 ns 5677084 ns 1.01
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1152086 ns 1154307 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 23668250 ns 23587458 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43049541.5 ns 44837937.5 ns 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 37347521 ns 37828166 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 34921604 ns 34890312 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1832427 ns 1835859 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 183460916.5 ns 184849458 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 172265542 ns 173020292 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 144281125 ns 145743417 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 390189959 ns 391585708 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 16494935.5 ns 16488184.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 284961708 ns 284020041 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 258063041 ns 257804083.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 285191292 ns 289035959 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 439855709 ns 440921375 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 619813333.5 ns 623252354.5 ns 0.99
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 578294708 ns 578021459 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 376301604.5 ns 377542125 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 655157813 ns 659480437.5 ns 0.99
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12474713 ns 12471152 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 1799700979 ns 1819111104.5 ns 0.99
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 1657435625 ns 1660737208 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 1521911875 ns 1556316104 ns 0.98
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 2098240625 ns 2162942771 ns 0.97
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49823657 ns 49790206 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3073437.5 ns 3047666 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2095291 ns 2113395.5 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2281125 ns 2276187.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 4821625 ns 4615062.5 ns 1.04
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 585401 ns 580103 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 25431958 ns 25551000 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 20342750 ns 20372104.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 18922583 ns 18984250 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 36574542 ns 36475396 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3196535 ns 3197534 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 35041958.5 ns 35362875 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 28788125 ns 28780458 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 29576167 ns 29749292 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 42034167 ns 42359958 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1646125 ns 1644375 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1175250 ns 1184292 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1363396 ns 1380959 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2504083 ns 2490125 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 216709 ns 217958.5 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12715000.5 ns 12687000 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9998250 ns 10006167 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9683354 ns 9643084 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18453354 ns 18392479 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1955756 ns 1945166.5 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17696667 ns 17715708 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14806937.5 ns 14807375 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14557708 ns 14584104 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21432729.5 ns 21450895.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 23752041 ns 23273292 ns 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43099166 ns 43934833 ns 0.98
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 37397812.5 ns 37907334 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 34904958.5 ns 34857583 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1842817 ns 1854216 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 190852833 ns 189714750 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 251191084 ns 252758813 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 193659750 ns 196003395.5 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 429893688 ns 431014896 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13924800 ns 13876633.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 289369042 ns 288996271 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 265637979 ns 265619583 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 292122354 ns 294970666.5 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 445323208 ns 447715041 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 3394917 ns 3400249.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 2913791 ns 2883458 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 3035709 ns 3083459 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 4098958 ns 4098667 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 578446 ns 585962 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 7619333 ns 7635041 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 7367750 ns 7317312.5 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 7464166.5 ns 7452208.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 8211250 ns 8215479 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1384858.5 ns 1410316 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 13690021 ns 18791292 ns 0.73
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 19212042 ns 19172041.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 19131458 ns 19131167 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 15652916 ns 10737041.5 ns 1.46
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 69062.5 ns 68687.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 67604 ns 67375 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 70458 ns 70250 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 69562 ns 68250 ns 1.02
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 48441 ns 49195 ns 0.98
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 324458 ns 323229 ns 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 326292 ns 332709 ns 0.98
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 236625 ns 316083 ns 0.75
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 377708 ns 318042 ns 1.19
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 214194.5 ns 218315 ns 0.98
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 424083.5 ns 444708 ns 0.95
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 458041.5 ns 400125 ns 1.14
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 356041 ns 414833 ns 0.86
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 375854.5 ns 356271 ns 1.05
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3032834 ns 3032708 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2078062.5 ns 2089166.5 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2268541 ns 2260084 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 4511375 ns 4570896 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 583753.5 ns 585123 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 23595458 ns 23581083 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18331416 ns 18324312.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 16965625 ns 16907208 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 35767042 ns 36054896 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3121440 ns 3100409.5 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 33311458 ns 33410625 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 28023083 ns 27996292 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 27412334 ns 27450166 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 41849604 ns 41964333 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 121058542 ns 118848041.5 ns 1.02
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 181255520.5 ns 181989437.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 147913792 ns 147982042 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 108516083 ns 103352208 ns 1.05
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5463863.5 ns 5461107 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 469339812.5 ns 467760646 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 485979041 ns 486582500 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 435101416.5 ns 432022166.5 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 729625458 ns 731293667 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 32277729 ns 32285579 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 644292937.5 ns 635989438 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 675559041.5 ns 672695645.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 576973396 ns 574471979.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 726825250 ns 732788375 ns 0.99
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1313500 ns 1223604 ns 1.07
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 757354.5 ns 730458.5 ns 1.04
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 902583 ns 937229 ns 0.96
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 1989917 ns 2093708 ns 0.95
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 572494 ns 576262 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 2956604.5 ns 2962583 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 2531750 ns 2501167 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 2470208 ns 2629000 ns 0.94
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 3689875 ns 3697500 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1358421.5 ns 1333074.5 ns 1.02
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 6625021 ns 6827375 ns 0.97
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 6484250 ns 6481666.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 6454333 ns 6494979.5 ns 0.99
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 4442292 ns 4456292 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 102875 ns 103500 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 104104 ns 103895.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 103209 ns 104750 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 104917 ns 103438 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28741 ns 28118 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 237083 ns 236334 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 237542 ns 237208 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 236500 ns 236958 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 250125 ns 249583 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 220363.5 ns 218894 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 330167 ns 742125 ns 0.44
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 744959 ns 754375 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 741959 ns 742375 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 722000 ns 733396 ns 0.98
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 13250 ns 13584 ns 0.98
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 13416.5 ns 13458 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 13875 ns 14417 ns 0.96
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 13750 ns 13500 ns 1.02
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 28904 ns 28346 ns 1.02
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 25625 ns 25937.5 ns 0.99
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 25833 ns 25812.5 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 25750 ns 26167 ns 0.98
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 25833 ns 25541.5 ns 1.01
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 212085 ns 208547.5 ns 1.02
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 45645.5 ns 45562 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 46208 ns 46000 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 46208 ns 46500 ns 0.99
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 26875 ns 27041.5 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 305618500 ns 306135083.5 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 277901833 ns 279280917 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 188750583 ns 187541479 ns 1.01
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 309520834 ns 311366041 ns 0.99
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7623749.5 ns 7673100.5 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 1091097062.5 ns 1102399979.5 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1062816167 ns 1066256459 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 898810583 ns 890231458 ns 1.01
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 1292104708 ns 1297546000 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 27087435 ns 27302775.5 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 417084 ns 416791.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 419167 ns 413292 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 415667 ns 416667 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 431375 ns 414084 ns 1.04
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 48303 ns 48087 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1452104.5 ns 1365729 ns 1.06
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 1275562 ns 1233625 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 1264708.5 ns 1273021 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 1725333 ns 1719854.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 227769 ns 225988.5 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 1850667 ns 3499416 ns 0.53
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 3421562.5 ns 3462146 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 3399499.5 ns 3426187.5 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 3662312.5 ns 3641166.5 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1486791 ns 1471125 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 911292 ns 940083 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1056417 ns 1055666 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2195084 ns 2211917 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 580986.5 ns 580807.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 3080583 ns 3085291 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 2660084 ns 2648667 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 2573166 ns 2683771 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 3820687 ns 3833875 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1362672 ns 1348747 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 8819292 ns 8817354.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 8745333 ns 8751792 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 8773625 ns 9138166.5 ns 0.96
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 6434292 ns 6346145.5 ns 1.01
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2542 ns 2167 ns 1.17
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2792 ns 2584 ns 1.08
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 2937.5 ns 3083 ns 0.95
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2375 ns 2500 ns 0.95
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 25652 ns 25068 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7041 ns 7333 ns 0.96
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7208 ns 7292 ns 0.99
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7209 ns 7375 ns 0.98
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7125 ns 6916.5 ns 1.03
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 192398.5 ns 189664 ns 1.01
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8583 ns 8541 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8667 ns 8708 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8542 ns 8542 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 6000 ns 5708 ns 1.05
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 13375 ns 13166 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 13417 ns 13792 ns 0.97
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 14084 ns 14750 ns 0.95
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 13583 ns 13562.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 25588 ns 25030 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 29125 ns 29375 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 29209 ns 29000 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 29292 ns 29167 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 29187.5 ns 29000 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 200255.5 ns 199600 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 42959 ns 93042 ns 0.46
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 93000 ns 94458 ns 0.98
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 92959 ns 93125 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 91250 ns 91166 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 28333 ns 28291 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 27979.5 ns 27666.5 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28291 ns 28417 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 45916.5 ns 48583 ns 0.95
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 27099 ns 26505 ns 1.02
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 44375 ns 43792 ns 1.01
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 45208 ns 48666 ns 0.93
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 43708 ns 44125 ns 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 63187 ns 63417 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 172626.5 ns 171477 ns 1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 68833 ns 68333 ns 1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 69041 ns 69000 ns 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 68416 ns 67958 ns 1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 68542 ns 68500 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 1916 ns 1792 ns 1.07
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 1875 ns 1875 ns 1
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2333 ns 2167 ns 1.08
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1625 ns 1875 ns 0.87
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 24015 ns 23390 ns 1.03
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5042 ns 5250 ns 0.96
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5291 ns 5250 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5416 ns 5291 ns 1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5542 ns 5166 ns 1.07
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 176394.5 ns 175020 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 8208 ns 7958 ns 1.03
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 8187.5 ns 8250 ns 0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 8125 ns 8250 ns 0.98
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 5625 ns 5584 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 106620958 ns 106844041 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 125627166 ns 126822625.5 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 120144521 ns 121529708.5 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 117625187.5 ns 118057875 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2655445 ns 2630102 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 389249875 ns 389182625 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 378229083 ns 453684062.5 ns 0.83
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 354732875 ns 353315042 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 489409292 ns 481224125 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 15161397.5 ns 15198787 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 618241875 ns 613025667 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 859950833 ns 864422958 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 803956646 ns 626775958 ns 1.28
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 914357292 ns 915632125 ns 1.00
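
The Ratio column in the rows above appears consistent with the current timing divided by the previous timing, rounded to two decimals, so values below 1 mean the current commit ran faster. A minimal sketch of that arithmetic, using three rows copied from the table; the function name `ratio` and the 10% tolerance are illustrative assumptions, not part of github-action-benchmark:

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current/previous timing ratio, rounded to two decimals (assumed convention)."""
    return round(current_ns / previous_ns, 2)


# (benchmark name, current ns, previous ns) copied from the table above.
rows = [
    ("Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s)", 2542, 2167),
    ("Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s)", 42959, 93042),
    ("Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)",
     803956646, 626775958),
]

# Hypothetical tolerance: report rows whose ratio differs from 1.0 by more than 10%.
TOLERANCE = 0.10

for name, current, previous in rows:
    r = ratio(current, previous)
    if abs(r - 1.0) > TOLERANCE:
        direction = "slower" if r > 1.0 else "faster"
        print(f"{name}: ratio {r} ({direction})")
```

Single-run swings on microsecond-scale benchmarks, such as the 0.46 ratio for the gelu Enzyme row, may reflect measurement noise rather than a real change, which is plausible here given that the commit only edits a documentation link.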

This comment was automatically generated by workflow using github-action-benchmark.
