forked from intel/Deep-learning-math-kernel-research
Wino:
- Tile-size: 6
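A rough sketch of what a tile size of 6 implies. It assumes "tile size" means the Winograd input tile edge A, related to the output tile edge m and kernel edge r by A = m + r - 1, so A = 6 with a 3x3 kernel gives a 4x4 output tile. The helper names wino_out_tile and wino_tiles are hypothetical and not taken from the code base.

```cpp
#include <cassert>

// Output tile edge m for input tile edge A and kernel edge r (A = m + r - 1).
// Assumption: "tile size" in this list refers to the input tile edge A.
constexpr int wino_out_tile(int A, int r) { return A - r + 1; }

// Number of tiles needed to cover `out` output pixels along one dimension.
// The last tile may be partial, hence the round-up.
constexpr int wino_tiles(int out, int A, int r) {
  return (out + wino_out_tile(A, r) - 1) / wino_out_tile(A, r);
}
```

Under this assumption, a 56-pixel output row needs wino_tiles(56, 6, 3) tiles, each producing 4 output pixels.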
Blocking:
- tail oc/ic handling
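One way to picture tail handling is to split a channel count into full blocks plus a leftover. The sketch below assumes a block width of 16, matching the 16x notation used further down; channel_split_t and split_channels are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Result of splitting a channel count into V-wide blocks.
struct channel_split_t {
  int full_blocks;  // number of complete V-wide blocks
  int tail;         // channels left in the final partial block (0 if none)
  int blocks;       // total blocks, counting the partial one
};

// Assumption: V = 16 is the channel block width.
inline channel_split_t split_channels(int channels, int V = 16) {
  assert(channels >= 0 && V > 0);
  channel_split_t s;
  s.full_blocks = channels / V;
  s.tail = channels % V;
  s.blocks = s.full_blocks + (s.tail ? 1 : 0);
  return s;
}
```

A kernel could then run the full-width path for full_blocks and a masked or scalar path for the tail.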
Build:
- arch-specific compile options
Fusion:
- relu
- sum
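A minimal scalar sketch of what fusing relu and sum into the output store could look like. It assumes "sum" means accumulating into the existing output and that relu is applied after the sum; store_fused is a hypothetical name, and a real kernel would do this on vector registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Store convolution results with optional fused post-ops.
//   with_sum:  add the value already in `out` (out = out + conv)
//   with_relu: clamp negatives to zero, applied after the optional sum
inline void store_fused(std::vector<float> &out, const std::vector<float> &conv,
                        bool with_sum, bool with_relu) {
  assert(conv.size() >= out.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    float v = conv[i];
    if (with_sum) v += out[i];
    if (with_relu && v < 0.0f) v = 0.0f;
    out[i] = v;
  }
}
```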
OMP:
- kmp_malloc(?): allocate from thread-local heap
- omp_in_final()
- omp_get_num_procs()
Modularity (2018-07-05): Done
- trans_input/trans_output: common function for offset computing
- code reorg: break transform kernels into small kernel files, e.g.
  elk_conv_wino_4x4_3x3_input.hxx
  ...
- Memory size calculation for different xopt
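As an illustration of a shared offset helper, the sketch below computes the linear offset of a logical element in the nChw16c blocked layout mentioned later in this list. The function name offset_nChw16c and its signature are assumptions for illustration, not the helper used by trans_input/trans_output.

```cpp
#include <cassert>
#include <cstddef>

// Linear offset of logical element (n, c, h, w) in an nChw16c tensor.
// C is the logical channel count; the layout stores ceil(C / 16) channel
// blocks, so a partial last block is padded to 16.
inline std::size_t offset_nChw16c(std::size_t n, std::size_t c, std::size_t h,
                                  std::size_t w, std::size_t C, std::size_t H,
                                  std::size_t W) {
  const std::size_t V = 16;                // channel block width
  const std::size_t CB = (C + V - 1) / V;  // number of channel blocks
  assert(c < C && h < H && w < W);
  return (((n * CB + c / V) * H + h) * W + w) * V + c % V;
}
```

Keeping the index arithmetic in one place means the input and output transforms cannot drift apart on how they address the same layout.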
Plain format (nchw/oihw) support (2018-07-11): Done
- fused reorder
- scatter/gather
- double buffer (dropped: no effect)
- enable plain format for 0xa0e0/0xa0e1
- enable plain format for 0xa073
- fix performance regression in block16: format as template parameter
- format-as-blocked (esp. to avoid false sharing in fused output reorder)
IC/OC != 16x (2018-07-16): Done
- Support DNN first layer (dropped: cannot achieve good performance with Winograd)
  IC < 16, OC = 16x, nchw + Oihw16o => nChw16o
- Support blocked format with padded tensor
  IC|OC != 16x, nChw16c + OIhw16i16o => nChw16c
- Support plain format in format-as path
  IC|OC != 16x, nchw + oihw => nchw
- Support plain format in format-is path
  IC|OC != 16x, nchw + oihw => nchw
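The padded blocked case can be sketched as a reference reorder from plain nchw to nChw16c, with the tail channels of the last block zero-filled when the channel count is not a multiple of 16. This is a plain scalar sketch under that assumption; reorder_nchw_to_nChw16c is a hypothetical name and not the fused reorder used in the kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder nchw -> nChw16c. Channels are padded up to a multiple of 16 and
// the padding lanes stay zero.
inline std::vector<float> reorder_nchw_to_nChw16c(const std::vector<float> &src,
                                                  std::size_t N, std::size_t C,
                                                  std::size_t H, std::size_t W) {
  const std::size_t V = 16;                // channel block width
  const std::size_t CB = (C + V - 1) / V;  // channel blocks after padding
  assert(src.size() == N * C * H * W);
  std::vector<float> dst(N * CB * H * W * V, 0.0f);
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t h = 0; h < H; ++h)
        for (std::size_t w = 0; w < W; ++w) {
          const std::size_t s = ((n * C + c) * H + h) * W + w;
          const std::size_t d =
              (((n * CB + c / V) * H + h) * W + w) * V + c % V;
          dst[d] = src[s];
        }
  return dst;
}
```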
MD-Array (2018-07-22): Done
- Cross platform/compiler MD-array
- Improve MD-Array performance for ICC
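For readers unfamiliar with the idea, a compiler-independent MD-array can be as small as a non-owning, row-major view over a flat buffer. The sketch below is an illustration only; md_view is a hypothetical name and is not the MD-Array type in this code base.

```cpp
#include <cassert>
#include <cstddef>

// Minimal non-owning N-dimensional, row-major view over a flat buffer.
template <typename T, int N>
class md_view {
public:
  // One extent per dimension, outermost first.
  template <typename... Dims>
  explicit md_view(T *data, Dims... dims)
      : data_(data), dims_{static_cast<std::size_t>(dims)...} {
    static_assert(sizeof...(Dims) == N, "one extent per dimension");
  }

  // Element access: a(i0, i1, ..., iN-1).
  template <typename... Idx>
  T &operator()(Idx... idx) const {
    static_assert(sizeof...(Idx) == N, "one index per dimension");
    const std::size_t i[N] = {static_cast<std::size_t>(idx)...};
    std::size_t off = 0;
    for (int d = 0; d < N; ++d) off = off * dims_[d] + i[d];  // Horner form
    return data_[off];
  }

private:
  T *data_;
  std::size_t dims_[N];
};
```

Because the extents and the index loop have compile-time length, a compiler can usually unroll the offset computation; whether that matches hand-written indexing on a given compiler has to be measured.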
Conv_1x1 (2018-08-01): Done
- Uni-stride: kernel=1x1, stride=1, padding=0
- Stride=2, padding=0
- Blocked format
- Plain format, IC/OC != 16x
- TODO: code cleanup and perf tuning
- TODO: padding support
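The uni-stride case (kernel 1x1, stride 1, padding 0) reduces to a matrix product per image: output[OC][H*W] = weights[OC][IC] * input[IC][H*W]. The reference below shows that reduction for plain layouts; conv1x1_ref is a hypothetical name and the loop is a correctness reference, not the tuned kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference 1x1 convolution for one image, stride 1, padding 0.
//   input   [IC][HW]   (plain chw, HW = H * W)
//   weights [OC][IC]   (plain oi, kernel is 1x1)
//   output  [OC][HW]
inline std::vector<float> conv1x1_ref(const std::vector<float> &input,
                                      const std::vector<float> &weights,
                                      std::size_t IC, std::size_t OC,
                                      std::size_t HW) {
  assert(input.size() == IC * HW);
  assert(weights.size() == OC * IC);
  std::vector<float> output(OC * HW, 0.0f);
  for (std::size_t oc = 0; oc < OC; ++oc)
    for (std::size_t ic = 0; ic < IC; ++ic) {
      const float w = weights[oc * IC + ic];
      for (std::size_t p = 0; p < HW; ++p)
        output[oc * HW + p] += w * input[ic * HW + p];
    }
  return output;
}
```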
GEMM kernel (2018-08-23): Done
- Rewrite GEMM kernel for better readability and modularity
- Apply new GEMM kernel to conv1x1
- Apply new GEMM kernel to Winograd
- Perf tuning for Winograd: xopt-A072
- Refactoring elx_conv_wino_t