Vectorize load instructions in dmmv f16 CUDA kernel (#9816) #29
| Job | Run time |
|---|---|
| | 11m 8s |
| | 12m 22s |
| | 1m 36s |
| | 1m 32s |
| | 3m 52s |
| | 18m 10s |
| | 4m 34s |
| | 2m 50s |
| | 4m 30s |
| | 2m 41s |
| | 2m 45s |
| | 2m 25s |
| | 3m 27s |
| | 3m 21s |
| | 7m 12s |
| | 5m 34s |
| | 3m 15s |
| | 2m 43s |
| | 6m 26s |
| | 1m 31s |
| | 2m 1s |
| | 3m 45s |
| | 3m 7s |
| | 3m 38s |
| | 3m 42s |
| | 4m 33s |
| | 47m 11s |
| | 4m 33s |
| | 43m 42s |
| | 22m 39s |
| | 4m 20s |
| | 6m 41s |
| | 4m 51s |
| | 28m 31s |
| | 11m 26s |
| | 5m 29s |
| | 25m 36s |
| | 4m 45s |
| | 27m 10s |
| | 13m 16s |
| | 3m 11s |
| | 9m 12s |
| | 2m 50s |
| | 1m 16s |
| Total | 6h 29m 19s |