Commit

fix: compat entries for PINN2DPDE
avik-pal committed Sep 15, 2024
1 parent b257560 commit 5200f58
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions examples/PINN2DPDE/Project.toml
@@ -14,4 +14,16 @@ Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
ADTypes = "1.7.1"
CairoMakie = "0.12.10"
InteractiveUtils = "<0.0.1, 1"
Literate = "2"
Lux = "1"
LuxCUDA = "0.3.3"
MLUtils = "0.4.4"
OnlineStats = "1.7.1"
Optimisers = "0.3.3"
Printf = "1.10"
Random = "1.10"
Statistics = "1.10"
Zygote = "0.6.70"
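
For readers unfamiliar with `[compat]` entries: as far as I understand Julia's Pkg conventions, a bare version such as `"0.12.10"` is read as a caret specifier, so it allows updates up to (but not including) the next breaking release. The Python sketch below illustrates that rule only; `caret_range` is a hypothetical helper written for this note, not part of Pkg, and it does not handle comma-separated or inequality forms such as `"<0.0.1, 1"`.

```python
def caret_range(spec):
    """Return (lower, upper) version tuples for a bare caret-style specifier.

    Assumed rule: the upper bound bumps the leftmost non-zero component
    that was written; if every written component is zero, it bumps the
    last written one. The range is lower <= v < upper.
    """
    parts = [int(p) for p in spec.split(".")]
    lower = tuple(parts + [0] * (3 - len(parts)))

    # Find the component that defines "breaking" for this specifier.
    idx = len(parts) - 1
    for i, p in enumerate(parts):
        if p != 0:
            idx = i
            break

    upper = list(lower[: idx + 1]) + [0] * (2 - idx)
    upper[idx] += 1
    return lower, tuple(upper)


if __name__ == "__main__":
    for name, spec in [("CairoMakie", "0.12.10"), ("Lux", "1"), ("Random", "1.10")]:
        lo, hi = caret_range(spec)
        print(f"{name}: {lo} <= v < {hi}")
```

Under this reading, `CairoMakie = "0.12.10"` admits any 0.12.x release from 0.12.10 onward but not 0.13, while `Lux = "1"` admits any 1.x release.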

1 comment on commit 5200f58

@github-actions

Lux Benchmarks
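
The Ratio column in the table that follows appears to be the current timing divided by the previous one, rounded to two decimals, so values above 1 mean the current commit was slower. A minimal Python sketch of that arithmetic is given here; the function names and the 10% tolerance are assumptions made for illustration, not settings taken from the benchmark action.

```python
def ratio(current_ns, previous_ns):
    """Current / previous timing, rounded the way the table displays it."""
    return round(current_ns / previous_ns, 2)


def classify(current_ns, previous_ns, tolerance=0.10):
    """Label a row as slower, faster, or unchanged.

    `tolerance` is a hypothetical noise band; timings on shared CI
    runners can vary by several percent between runs.
    """
    r = current_ns / previous_ns
    if r > 1 + tolerance:
        return "slower"
    if r < 1 - tolerance:
        return "faster"
    return "unchanged"


if __name__ == "__main__":
    # Values copied from three Dense(512 => 512, identity) rows of the table.
    rows = [
        ("forward/CPU/2 thread(s)", 411895.5, 412583),
        ("forward/CPU/8 thread(s)", 322583, 243687.5),
        ("zygote/CPU/8 thread(s)", 14164208, 16439208.5),
    ]
    for name, cur, prev in rows:
        print(name, ratio(cur, prev), classify(cur, prev))
```

Since this commit only touches an example's `Project.toml`, large ratios on small or heavily threaded benchmarks are more plausibly run-to-run noise than real regressions.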

Benchmark suite Current: 5200f58 Previous: 3ca41c8 Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 411895.5 ns 412583 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 322166 ns 322979.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 322583 ns 243687.5 ns 1.32
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 741833.5 ns 738583 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 43398 ns 43341 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1330145.5 ns 1331541.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 2429292 ns 2407792 ns 1.01
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 14164208 ns 16439208.5 ns 0.86
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2195542 ns 2194563 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 204689 ns 205021 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1414708 ns 1426479.5 ns 0.99
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 903146 ns 895625 ns 1.01
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 1562333 ns 1543917 ns 1.01
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2259500 ns 2206208.5 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1774625 ns 1777646 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1098875 ns 1078167 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1541229.5 ns 1530958 ns 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 2958833 ns 3007208 ns 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 207635.5 ns 207880 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12148354.5 ns 12178459 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8822458 ns 8815750 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9188791 ns 9208125 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18597916 ns 18565750 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1492767 ns 1492124 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17261541 ns 17280708 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13975000 ns 13973458 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14512917 ns 14487354.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21852791 ns 21838146 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250272292 ns 249950041 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148297917 ns 148180333 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 115932000 ns 116724083 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 447763458 ns 447231500 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5475849 ns 5449958 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1224211792 ns 1223396291 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 930629834 ns 929732416 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 831800229 ns 832528750 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1632010250 ns 1633536917 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31656181.5 ns 31232077 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1135741875 ns 1127648666 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 989475979.5 ns 1002243833.5 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1306310687.5 ns 1330111333.5 ns 0.98
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1731093166.5 ns 1732141146 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1096084 ns 1037166 ns 1.06
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1621125 ns 1626521 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3736208 ns 3777708 ns 0.99
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 780896 ns 781583 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 264547.5 ns 262013.5 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 2990250 ns 3044041.5 ns 0.98
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4101000 ns 4097042 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 11120083 ns 11116208 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3243937 ns 3145292 ns 1.03
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1096407 ns 1092823.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2316625 ns 2334666.5 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1427312.5 ns 1419459 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1669042 ns 1571666 ns 1.06
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4212291 ns 4190125 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 208250.5 ns 208094.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19418541 ns 19396000 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16073458 ns 16089229 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17186250 ns 17212895.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25916083 ns 25854646 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1595514 ns 1594895 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 34080500.5 ns 34213875 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30855979 ns 31031917 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31220895.5 ns 31076583 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 36553708 ns 36708292 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4532459 ns 4528125 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2782125 ns 2779625 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2915416 ns 2669750 ns 1.09
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8380354 ns 8386375 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 423404 ns 420922 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 39032375 ns 38785750 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32155750 ns 32129375 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32313208 ns 32277375 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 51945292 ns 51825125 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2634739 ns 2627381 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 88517062.5 ns 88427125 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 114525875 ns 113463375 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 223799104.5 ns 227295542 ns 0.98
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 74994375 ns 74279083 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 268673709 ns 268165959 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 159281875 ns 158761895.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 126759937.5 ns 123751562.5 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 485627875 ns 487404084 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 6994649 ns 6976350 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1470701416.5 ns 1472349770.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1172207375 ns 1171105895.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1063985125 ns 1066855854.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2005181229 ns 2006883229.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34667442 ns 34648809 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1715584583 ns 1717718833 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1547782708 ns 1535202521 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1856138292 ns 1878248417 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2207623729.5 ns 2205912250 ns 1.00
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 2030167 ns 2014792 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 2979333 ns 3001875 ns 0.99
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 8195625 ns 8115875 ns 1.01
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2492167 ns 2432666 ns 1.02
lenet(28, 28, 1, 128)/forward/GPU/CUDA 275800.5 ns 266572 ns 1.03
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9616458.5 ns 9287083.5 ns 1.04
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 12040541 ns 12062125 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 24281208 ns 25629896 ns 0.95
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11743000 ns 11743625.5 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1191646.5 ns 1194959.5 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 381538333 ns 385030062.5 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 284123771.5 ns 288360083 ns 0.99
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 239280375 ns 255891729 ns 0.94
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 452582312.5 ns 454057438 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4856388 ns 4834603 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1153618084 ns 1159548125 ns 0.99
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 934081083 ns 928639750 ns 1.01
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 921635667 ns 1041025834 ns 0.89
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1402852084 ns 1400301917 ns 1.00
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 17820477 ns 17860514 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1049334 ns 1063417 ns 0.99
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 2035667 ns 2031167 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 5341791 ns 6215979 ns 0.86
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1401833 ns 1289687.5 ns 1.09
lenet(28, 28, 1, 64)/forward/GPU/CUDA 273502.5 ns 274190 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6484854.5 ns 6281917 ns 1.03
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 12404917 ns 12390375 ns 1.00
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 19763708 ns 21407437 ns 0.92
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6066500 ns 6091125 ns 1.00
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1262508 ns 1242182 ns 1.02
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70493208 ns 70428792 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43600229 ns 43611000 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39622541 ns 39667125 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132698750 ns 132442542 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1869551 ns 1932803.5 ns 0.97
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 356454771 ns 355601563 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 270112625 ns 270115458 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 254625084 ns 253762937.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 534884312.5 ns 535226770.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12284081 ns 12278246 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 394586709 ns 399133667 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 408404250 ns 394300875 ns 1.04
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 678487209 ns 679147666.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 710785791 ns 710284375 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1185827084 ns 1194640667 ns 0.99
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 695281125 ns 689604666 ns 1.01
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 626587375 ns 645000312 ns 0.97
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1770240250.5 ns 1774100021 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12316473 ns 12543961 ns 0.98
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3682510624.5 ns 3679223834 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2820068709 ns 2826435875 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2719913792 ns 2707509167 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 5042856917 ns 5055415292 ns 1.00
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49349519 ns 49466162 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3418708 ns 3397917 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2082958 ns 2077646 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2540500 ns 2508042 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6009666 ns 6018583.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 326984.5 ns 330593.5 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 26222333 ns 25938625.5 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18914875 ns 18943166.5 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19338354.5 ns 19554000 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39331500 ns 39254458 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2462554 ns 2474870 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 56235458 ns 55544542 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 80835708 ns 82396771 ns 0.98
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 169939541.5 ns 173728042 ns 0.98
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45337209 ns 45516333 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1784584 ns 1779250 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1103604.5 ns 1102875 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1566667 ns 1584416 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3032666 ns 3020250 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 213488 ns 214596.5 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12541520.5 ns 12529625 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9203729.5 ns 9208687.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9646562.5 ns 9658437 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18999125 ns 18982292 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1533148 ns 1539988 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17653167 ns 17638625 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14324854.5 ns 14351250 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14581208.5 ns 14600292 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22196729.5 ns 22151625 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70444166.5 ns 70460291 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43349417 ns 43523750 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39635792 ns 39539833 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132857146 ns 132477750 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1867040.5 ns 1881252.5 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 361132208 ns 359873333.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 345972771 ns 345894479 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 304480083 ns 305415375 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 724393166 ns 725056958 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13316474.5 ns 13380753 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 419098563 ns 421101688 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 427616083 ns 421140000 ns 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 709012833 ns 748051292 ns 0.95
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 714833666 ns 715116625 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1667417 ns 1695875 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1348521 ns 1355249.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1328125 ns 1159396 ns 1.15
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2425812.5 ns 2409792 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 582318.5 ns 588933.5 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 9008145.5 ns 8963979.5 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 12958667 ns 13000041 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 31132854 ns 33199749.5 ns 0.94
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9869291.5 ns 9835041 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1441009.5 ns 1481667.5 ns 0.97
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 18356416 ns 17747854.5 ns 1.03
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 17371542 ns 17252604.5 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 29956292 ns 31103916.5 ns 0.96
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 14108833.5 ns 14355042 ns 0.98
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 696500 ns 669500.5 ns 1.04
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 500812.5 ns 556667 ns 0.90
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 1033458 ns 1058916.5 ns 0.98
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 725500 ns 725958 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 47684 ns 48545 ns 0.98
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1577042 ns 1480334 ns 1.07
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 1051000 ns 1045958 ns 1.00
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1370125 ns 1649187 ns 0.83
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2303687 ns 2243312 ns 1.03
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 238125.5 ns 241138.5 ns 0.99
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1558687.5 ns 1523125 ns 1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1045916.5 ns 1079125 ns 0.97
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1461916 ns 1484062.5 ns 0.99
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2228229 ns 2259292 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3409125 ns 3401000.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2066270.5 ns 2066583.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2525500 ns 2508167 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6013791 ns 6010687.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 289794 ns 286726 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24055187.5 ns 24062625 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17202000 ns 17164833 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17133562.5 ns 17141292 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37566146 ns 37492334 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2407993 ns 2402302 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54216541.5 ns 53536916 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 83772249.5 ns 83532937.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 166696854 ns 171297229.5 ns 0.97
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44480500 ns 44566417 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250124083 ns 249899167 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148262625 ns 147932291 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 115870833.5 ns 116470479.5 ns 0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 448008979 ns 450194500 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5471191 ns 5441310 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1103626125 ns 1101316834 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 857579145.5 ns 855731854 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 827690125 ns 827979937.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1753891459 ns 1754256625 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 29184619 ns 28790457 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1020122687.5 ns 1014882938 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 963543750 ns 964872792 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1321081709 ns 1306006041 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1722584666.5 ns 1738222270.5 ns 0.99
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1339041 ns 1311167 ns 1.02
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 970521 ns 957750 ns 1.01
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 971250 ns 703375 ns 1.38
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 1954229 ns 1939625 ns 1.01
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 575255 ns 571080 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 6044667 ns 6006770.5 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 6367750 ns 6329521 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 25225645.5 ns 25490292 ns 0.99
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7126291.5 ns 7082792 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1413961 ns 1359778.5 ns 1.04
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 11543792 ns 11590167 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 9916958 ns 10239645.5 ns 0.97
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 18299375 ns 18128834 ns 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 8684625 ns 8225354.5 ns 1.06
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 389458 ns 362875 ns 1.07
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 359500 ns 363208 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 2193750 ns 3032958 ns 0.72
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 88042 ns 87687.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 28321 ns 28003 ns 1.01
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 396895.5 ns 388708 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 440458 ns 440375 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4619792 ns 4703354 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 265375 ns 259209 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 227034 ns 220116.5 ns 1.03
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 429500 ns 419708.5 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 470792 ns 470625 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 4863667 ns 4962583 ns 0.98
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 271041.5 ns 270875 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 333667 ns 307959 ns 1.08
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 294437 ns 298292 ns 0.99
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 740833.5 ns 764833 ns 0.97
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 52854.5 ns 54917 ns 0.96
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 28719 ns 27854 ns 1.03
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 366250 ns 352750 ns 1.04
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 335625 ns 336000 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 813354 ns 887229.5 ns 0.92
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 155312 ns 151687.5 ns 1.02
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 212500 ns 205379 ns 1.03
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 377292 ns 366770.5 ns 1.03
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 351833 ns 350334 ns 1.00
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 908146 ns 470875 ns 1.93
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 150959 ns 151188 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 602630000 ns 606032542 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 425446812.5 ns 429552104 ns 0.99
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 373310250 ns 384048291 ns 0.97
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 874089917 ns 874614500 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7031776 ns 7024831 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 2002661937.5 ns 2012811729 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1606159333.5 ns 1612557354 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1596550354 ns 1572362875 ns 1.02
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2634530667 ns 2635509917 ns 1.00
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 25871825 ns 25977885 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 520500 ns 521667 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 438417 ns 439708 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 1781667 ns 2731999.5 ns 0.65
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 875958 ns 865959 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47831 ns 47570 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1845875 ns 1889812.5 ns 0.98
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 2803167 ns 2797000 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 14421478.5 ns 16236125 ns 0.89
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2762500 ns 2650416 ns 1.04
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 254912 ns 249198.5 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 1952312.5 ns 1925354.5 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 5063375 ns 5070458 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 14644042 ns 16364250 ns 0.89
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 2791250 ns 2752417 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1556042 ns 1507000 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1242250 ns 1228583 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1188250 ns 1068166.5 ns 1.11
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2351542 ns 2208000 ns 1.07
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 587337 ns 589459 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5962500.5 ns 5951958 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 4728083 ns 4655854.5 ns 1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 25706291.5 ns 27127042 ns 0.95
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7342917 ns 6596583 ns 1.11
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1398774.5 ns 1347970 ns 1.04
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 13281166.5 ns 12790458 ns 1.04
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 11246833 ns 12034667 ns 0.93
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 20840520.5 ns 22409521.5 ns 0.93
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 10611542 ns 10615250 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2770.5 ns 2292 ns 1.21
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 4875 ns 2500 ns 1.95
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 2875 ns 3125 ns 0.92
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2500 ns 2292 ns 1.09
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 25042 ns 24734 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7084 ns 7416 ns 0.96
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7166 ns 7416.5 ns 0.97
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7333 ns 7666 ns 0.96
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7208 ns 7167 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 216255 ns 210121.5 ns 1.03
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8500 ns 8250 ns 1.03
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8125 ns 8250 ns 0.98
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8291 ns 8417 ns 0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 5958 ns 5958 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 11416 ns 10500 ns 1.09
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 14208 ns 12896 ns 1.10
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 10958 ns 10666 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7167 ns 7083 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 25461 ns 24767 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 19792 ns 20208 ns 0.98
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 20084 ns 20041 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 20041 ns 20209 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 19875 ns 20125 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 236659.5 ns 230521 ns 1.03
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 23500 ns 23625 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 23500 ns 23500 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 23791 ns 23750 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 21375 ns 21125 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 28479.5 ns 28292 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28917 ns 28500 ns 1.01
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 28625 ns 28708 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46083 ns 47396 ns 0.97
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 26494 ns 25741 ns 1.03
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 222354.5 ns 220542 ns 1.01
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 275541.5 ns 281604.5 ns 0.98
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4186187.5 ns 4282416 ns 0.98
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 145834 ns 146208.5 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 209501 ns 210161.5 ns 1.00
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 333708 ns 330833 ns 1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 313334 ns 321584 ns 0.97
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 519937.5 ns 763437.5 ns 0.68
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 161041 ns 161895.5 ns 0.99
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 1729.5 ns 2000 ns 0.86
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 4584 ns 1875 ns 2.44
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 2417 ns 2541 ns 0.95
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 2084 ns 1666 ns 1.25
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 23313.5 ns 23005 ns 1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5500 ns 5250 ns 1.05
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5250 ns 5458 ns 0.96
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5209 ns 5541 ns 0.94
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5209 ns 5500 ns 0.95
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 243931.5 ns 256932 ns 0.95
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 11250 ns 11250 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 11291 ns 11250 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 11417 ns 11458 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 6917 ns 7000 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 79906375 ns 79995437.5 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 49046500 ns 49006333 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 44994417 ns 43270416.5 ns 1.04
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151607583 ns 151307500 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2724784.5 ns 2675260 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 664456250 ns 673208958 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 410418958 ns 413629250 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 401447083 ns 396318917 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 682646167 ns 684603375 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14573408 ns 14606131 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 711193604.5 ns 713812625 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 688899625 ns 676256333 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1017383917 ns 1061049792 ns 0.96
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 998024084 ns 1000454667 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.
