docs: fix broken link in Recurrence docs (#1001)
MartinuzziFrancesco authored Nov 2, 2024
1 parent cb92a56 commit 699c8d8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/layers/recurrent.jl
@@ -86,7 +86,7 @@ automatically operate over a sequence of inputs.
!!! tip
Frameworks like Tensorflow have special implementation of
-[`MultiRNNCell`](https://www.tensorflow.org/api_docs/python/tf/compat/v1/nn/rnn_cell/MultiRNNCell)
+[`StackedRNNCells`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/StackedRNNCells)
to handle sequentially composed RNN Cells. In Lux, one can simple stack multiple
`Recurrence` blocks in a `Chain` to achieve the same.
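For context, the stacking that the tip in this docstring describes might look like the sketch below. It is not part of the commit. It assumes a recent Lux release in which `Recurrence` accepts the `return_sequence` keyword and also accepts a vector of per-timestep arrays as input; the layer sizes and the input shape are made up for illustration.

```julia
using Lux, Random

# Two stacked recurrent layers. The first returns its output at every
# timestep so that the second has a full sequence to iterate over.
# Sizes (3 => 8 => 4) are arbitrary, illustrative choices.
model = Chain(
    Recurrence(LSTMCell(3 => 8); return_sequence=true),
    Recurrence(LSTMCell(8 => 4)),
)

rng = Random.default_rng()
ps, st = Lux.setup(rng, model)

# Assumed layout: features x sequence length x batch (default ordering).
x = rand(rng, Float32, 3, 10, 2)

# Only the last timestep of the second layer is returned here.
y, st_new = model(x, ps, st)
@show size(y)
```

Setting `return_sequence=true` on every layer except the last is the usual choice when only the final output is needed.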

1 comment on commit 699c8d8

@github-actions

Lux Benchmarks

Benchmark suite Current: 699c8d8 Previous: cb92a56 Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 411375 ns 412541 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 323084 ns 242250 ns 1.33
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 241750 ns 322416.5 ns 0.75
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 742584 ns 740041 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 43670.5 ns 43783 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 638083 ns 641833 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 521459 ns 443458 ns 1.18
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 403792 ns 478625 ns 0.84
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 908000 ns 958167 ns 0.95
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 188991 ns 190648 ns 0.99
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 744083.5 ns 744500 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 624667 ns 516875 ns 1.21
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 521562.5 ns 622959 ns 0.84
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 1006750 ns 971917 ns 1.04
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1618667 ns 1626709 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1189854.5 ns 1164083 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1358375 ns 1354209 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2360458 ns 2382500 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 211422.5 ns 212090 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12284958.5 ns 12241875 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9550979.5 ns 9584437.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9390791 ns 9294250 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18060041.5 ns 18002396 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1906624.5 ns 1909620 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17280916 ns 17364333 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14329167 ns 14442750 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14463083 ns 14311833 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21088375 ns 21053667 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 121038500 ns 119842562.5 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 174268209 ns 182159229.5 ns 0.96
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 155647417 ns 147780729 ns 1.05
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 103289458 ns 104816708 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5459016 ns 5472644 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 592681937.5 ns 592166875.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 540116125 ns 563821542 ns 0.96
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 460022146 ns 442430104 ns 1.04
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 623412250 ns 625737792 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 38146652 ns 34972882 ns 1.09
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 751859749.5 ns 713130770.5 ns 1.05
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 667614542 ns 691544250 ns 0.97
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 606980437.5 ns 603916250 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 744028250 ns 742687041 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 861145.5 ns 870167 ns 0.99
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 826334 ns 801000.5 ns 1.03
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 1164604.5 ns 1220750 ns 0.95
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 959395.5 ns 949688 ns 1.01
lenet(28, 28, 1, 32)/forward/GPU/CUDA 263975.5 ns 265193.5 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2730708 ns 2750250 ns 0.99
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 2455708.5 ns 2457125 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 3317604.5 ns 3329208 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3286521.5 ns 3289292 ns 1.00
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1038213 ns 1041014 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 6779291.5 ns 6810916 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 6365500 ns 6350125 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 6531583 ns 6502792 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 7635875 ns 7507833.5 ns 1.02
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 210025 ns 210394 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 24055375 ns 24022917 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 21237625 ns 21319833 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 21535792 ns 21303250 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 29721771 ns 29727625 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1973993 ns 1967511.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 37426416 ns 37215750 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 34385895.5 ns 45637834 ns 0.75
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 45888792 ns 45715979 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 49367041.5 ns 49236354 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 13355875 ns 13381562.5 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 12430958.5 ns 12433834 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 12600937.5 ns 12537792 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 15122729 ns 15142500 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 518849 ns 513457.5 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 47134500 ns 47126417 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 41671875 ns 41878854 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 41125499.5 ns 40616375 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 58336333 ns 58178583 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3218047 ns 3235162 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 74376750 ns 74629396 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 68965000 ns 91798958 ns 0.75
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 91496292 ns 91292209 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 98399104 ns 98432125 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 286107083.5 ns 287875563 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 339607208 ns 347603000 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 321183396 ns 314078500 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 268796333 ns 269776750 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7107764 ns 7105513 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 971792250 ns 971474250 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 922480542 ns 943762542 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 835684104 ns 823005791 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1117474583 ns 1118023937.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 33742759 ns 33868329.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1448964667 ns 1427625104.5 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1371326875 ns 1702832375 ns 0.81
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1656412041 ns 1637231875 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1663889000 ns 1670864792 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 1528208 ns 1542584 ns 0.99
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 1277937.5 ns 1241333 ns 1.03
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 1635937.5 ns 1613625 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2136917 ns 2155812.5 ns 0.99
lenet(28, 28, 1, 128)/forward/GPU/CUDA 277390.5 ns 272755 ns 1.02
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 7872250 ns 7887667 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 6588000 ns 6453729 ns 1.02
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 7229396.5 ns 7173500 ns 1.01
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 10478041 ns 10450125 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1130644 ns 1114904 ns 1.01
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 177405459 ns 177697708 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 132546709 ns 183127729.5 ns 0.72
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 130053917 ns 108153771 ns 1.20
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 165568083 ns 165745583 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4878153.5 ns 4846033.5 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 643663333 ns 638012042 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 496969000 ns 679029417 ns 0.73
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 558568375 ns 519025667 ns 1.08
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 654929750 ns 643337167 ns 1.02
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 18110009 ns 16351975 ns 1.11
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1068292 ns 1081646 ns 0.99
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 983291 ns 954500 ns 1.03
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 1327542 ns 1344104 ns 0.99
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1373792 ns 1347771 ns 1.02
lenet(28, 28, 1, 64)/forward/GPU/CUDA 281111 ns 275416 ns 1.02
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6002271 ns 5770208 ns 1.04
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 4660958.5 ns 4655458.5 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 5006354 ns 4980792 ns 1.01
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 5624708 ns 5735812 ns 0.98
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1151478.5 ns 1152086 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 23602937.5 ns 23668250 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 34462041.5 ns 43049541.5 ns 0.80
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 41206708 ns 37347521 ns 1.10
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 34998812.5 ns 34921604 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1861561 ns 1832427 ns 1.02
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 184955020.5 ns 183460916.5 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 159249771 ns 172265542 ns 0.92
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 150499917 ns 144281125 ns 1.04
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 390550250 ns 390189959 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 16472871 ns 16494935.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 286689500 ns 284961708 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 244388646 ns 258063041 ns 0.95
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 296120917 ns 285191292 ns 1.04
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 440533417 ns 439855709 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 624998521 ns 619813333.5 ns 1.01
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 477642917 ns 578294708 ns 0.83
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 411867812.5 ns 376301604.5 ns 1.09
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 656030104 ns 655157813 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12477905 ns 12474713 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 1873735437.5 ns 1799700979 ns 1.04
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 1636021583 ns 1657435625 ns 0.99
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 1558895000 ns 1521911875 ns 1.02
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 2103890062.5 ns 2098240625 ns 1.00
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49609571 ns 49823657 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3064313 ns 3073437.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2106875 ns 2095291 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2301542 ns 2281125 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 4944708.5 ns 4821625 ns 1.03
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 586671 ns 585401 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 25694166 ns 25431958 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 20092625.5 ns 20342750 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19545895.5 ns 18922583 ns 1.03
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 36568812 ns 36574542 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3200820 ns 3196535 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 35138250 ns 35041958.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 28420084 ns 28788125 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 30280062.5 ns 29576167 ns 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 42544854.5 ns 42034167 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1650167 ns 1646125 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1195708 ns 1175250 ns 1.02
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1388458 ns 1363396 ns 1.02
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2498125 ns 2504083 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 218867 ns 216709 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12700771 ns 12715000.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9962124.5 ns 9998250 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9800459 ns 9683354 ns 1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18403354 ns 18453354 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1957280 ns 1955756 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17702708 ns 17696667 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14737000 ns 14806937.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14865041 ns 14557708 ns 1.02
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21477333.5 ns 21432729.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 23644021 ns 23752041 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 34568146 ns 43099166 ns 0.80
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 41693959 ns 37397812.5 ns 1.11
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 34878583 ns 34904958.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1840287 ns 1842817 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 188357375 ns 190852833 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 233488333 ns 251191084 ns 0.93
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 202742250 ns 193659750 ns 1.05
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 429823895.5 ns 429893688 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13939550 ns 13924800 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 291377187.5 ns 289369042 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 249397167 ns 265637979 ns 0.94
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 300701042 ns 292122354 ns 1.03
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 446062833 ns 445323208 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 3387083 ns 3394917 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 3112854 ns 2913791 ns 1.07
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 2905708 ns 3035709 ns 0.96
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 3940000 ns 4098958 ns 0.96
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 570283 ns 578446 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 7636021 ns 7619333 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 7442000 ns 7367750 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 7380521 ns 7464166.5 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 8212750 ns 8211250 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1364212 ns 1384858.5 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 13685833.5 ns 13690021 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 19094334 ns 19212042 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 19126041 ns 19131458 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 15649500.5 ns 15652916 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 69459 ns 69062.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 69875 ns 67604 ns 1.03
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 72083 ns 70458 ns 1.02
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 68812.5 ns 69562 ns 0.99
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 47850 ns 48441 ns 0.99
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 318833.5 ns 324458 ns 0.98
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 285875.5 ns 326292 ns 0.88
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 326000 ns 236625 ns 1.38
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 319625 ns 377708 ns 0.85
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 210144 ns 214194.5 ns 0.98
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 447500 ns 424083.5 ns 1.06
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 437791 ns 458041.5 ns 0.96
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 413375 ns 356041 ns 1.16
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 328959 ns 375854.5 ns 0.88
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3055292 ns 3032834 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2092833 ns 2078062.5 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2283687.5 ns 2268541 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 4895416.5 ns 4511375 ns 1.09
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 585359 ns 583753.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 23561833 ns 23595458 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18085229 ns 18331416 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 18562458 ns 16965625 ns 1.09
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 35017833 ns 35767042 ns 0.98
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 3105298.5 ns 3121440 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 33378229 ns 33311458 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 27662145.5 ns 28023083 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 27887458 ns 27412334 ns 1.02
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 41809854.5 ns 41849604 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 120765334 ns 121058542 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 174275666 ns 181255520.5 ns 0.96
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 156098417 ns 147913792 ns 1.06
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 103997770.5 ns 108516083 ns 0.96
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5461795.5 ns 5463863.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 471697125 ns 469339812.5 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 468205208 ns 485979041 ns 0.96
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 455789333 ns 435101416.5 ns 1.05
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 728998166 ns 729625458 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 35173763 ns 32277729 ns 1.09
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 640412562.5 ns 644292937.5 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 655505917 ns 675559041.5 ns 0.97
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 590476187.5 ns 576973396 ns 1.02
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 732032000 ns 726825250 ns 1.01
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1249541 ns 1313500 ns 0.95
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 949958.5 ns 757354.5 ns 1.25
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 764125 ns 902583 ns 0.85
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 2000458 ns 1989917 ns 1.01
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 568299.5 ns 572494 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 2960792 ns 2956604.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 2611021 ns 2531750 ns 1.03
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 2513020.5 ns 2470208 ns 1.02
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 3690271 ns 3689875 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1319857 ns 1358421.5 ns 0.97
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 6641791 ns 6625021 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 6504791 ns 6484250 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 6489375 ns 6454333 ns 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 4443166 ns 4442292 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 104249.5 ns 102875 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 105166 ns 104104 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 105250 ns 103209 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 105625 ns 104917 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28456 ns 28741 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 236750 ns 237083 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 236541 ns 237542 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 237667 ns 236500 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 249625 ns 250125 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 217310.5 ns 220363.5 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 330167 ns 330167 ns 1
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 742062.5 ns 744959 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 748209 ns 741959 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 721792 ns 722000 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 13583 ns 13250 ns 1.03
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 14250 ns 13416.5 ns 1.06
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 14354 ns 13875 ns 1.03
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 13791 ns 13750 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 28098 ns 28904 ns 0.97
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 25333.5 ns 25625 ns 0.99
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 25750 ns 25833 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 25667 ns 25750 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 25750 ns 25833 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 206637.5 ns 212085 ns 0.97
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 45583.5 ns 45645.5 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 45875 ns 46208 ns 0.99
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 46000 ns 46208 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 28209 ns 26875 ns 1.05
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 309099062.5 ns 305618500 ns 1.01
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 232469666.5 ns 277901833 ns 0.84
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 216377833 ns 188750583 ns 1.15
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 308762583 ns 309520834 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7672114 ns 7623749.5 ns 1.01
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 1103432604 ns 1091097062.5 ns 1.01
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1001458208 ns 1062816167 ns 0.94
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 901919771 ns 898810583 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 1293921625 ns 1292104708 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 27115979 ns 27087435 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 414208.5 ns 417084 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 415583 ns 419167 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 416958 ns 415667 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 418375.5 ns 431375 ns 0.97
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 48086 ns 48303 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1344667 ns 1452104.5 ns 0.93
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 1315687 ns 1275562 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 1294125 ns 1264708.5 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 1745083.5 ns 1725333 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 221906 ns 227769 ns 0.97
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 1836104.5 ns 1850667 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 3473770.5 ns 3421562.5 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 3450771 ns 3399499.5 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 3660083 ns 3662312.5 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1396583.5 ns 1486791 ns 0.94
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1097333 ns 911292 ns 1.20
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 939062.5 ns 1056417 ns 0.89
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2231792 ns 2195084 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 574483.5 ns 580986.5 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 2873417 ns 3080583 ns 0.93
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 2715208 ns 2660084 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 2626645.5 ns 2573166 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 3813542 ns 3820687 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1401203 ns 1362672 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 8821895.5 ns 8819292 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 8770604 ns 8745333 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 8763666.5 ns 8773625 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 6350229.5 ns 6434292 ns 0.99
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2250 ns 2542 ns 0.89
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 2583 ns 2792 ns 0.93
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3333 ns 2937.5 ns 1.13
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2583 ns 2375 ns 1.09
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 24886 ns 25652 ns 0.97
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7292 ns 7041 ns 1.04
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7042 ns 7208 ns 0.98
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7375 ns 7209 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 6959 ns 7125 ns 0.98
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 184871.5 ns 192398.5 ns 0.96
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8479.5 ns 8583 ns 0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8667 ns 8667 ns 1
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8625 ns 8542 ns 1.01
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 6000 ns 6000 ns 1
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 13291 ns 13375 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 13750 ns 13417 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 14521 ns 14084 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 13458 ns 13583 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 25102 ns 25588 ns 0.98
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 29250 ns 29125 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 28959 ns 29209 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 29167 ns 29292 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 29208.5 ns 29187.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 194866.5 ns 200255.5 ns 0.97
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 43333 ns 42959 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 94750 ns 93000 ns 1.02
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 93687.5 ns 92959 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 90834 ns 91250 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 27916 ns 28333 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28500 ns 27979.5 ns 1.02
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 27166 ns 28291 ns 0.96
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46166 ns 45916.5 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 26285 ns 27099 ns 0.97
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 44541 ns 44375 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 44250 ns 45208 ns 0.98
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 44666 ns 43708 ns 1.02
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 63625 ns 63187 ns 1.01
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 167275 ns 172626.5 ns 0.97
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 68458 ns 68833 ns 0.99
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 68125 ns 69041 ns 0.99
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 68708 ns 68416 ns 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 68208 ns 68542 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 1834 ns 1916 ns 0.96
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 2042 ns 1875 ns 1.09
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2250 ns 2333 ns 0.96
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1958 ns 1625 ns 1.20
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 23492 ns 24015 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5416 ns 5042 ns 1.07
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5333 ns 5291 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5375 ns 5416 ns 0.99
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5291.5 ns 5542 ns 0.95
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 171557 ns 176394.5 ns 0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 8312.5 ns 8208 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 8250 ns 8187.5 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 8208 ns 8125 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 5667 ns 5625 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 106272125 ns 106620958 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 117220895.5 ns 125627166 ns 0.93
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 123891541 ns 120144521 ns 1.03
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 117462292 ns 117625187.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2638590.5 ns 2655445 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 390984854 ns 389249875 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 370181584 ns 378229083 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 344393625 ns 354732875 ns 0.97
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 481330584 ns 489409292 ns 0.98
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 15192721.5 ns 15161397.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 619409458 ns 618241875 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 668415479 ns 859950833 ns 0.78
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 816519375 ns 803956646 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 916595917 ns 914357292 ns 1.00

This comment was automatically generated by a workflow using github-action-benchmark.
