How to match the performance of numpy.dot #1

wweic · 2017-12-09T06:30:45Z

http://docs.tvmlang.org/tutorials/optimize/opt_gemm.html#sphx-glr-tutorials-optimize-opt-gemm-py

In [27]: %paste
evaluator = func.time_evaluator(func.entry_name, tvm.cpu(0), number = 5)
c = tvm.nd.array(numpy.zeros((N, N), dtype = dtype), tvm.cpu(0))
print('Opt3: %f' % evaluator(a, b, c).mean)

## -- End pasted text --
Opt3: 0.118320

In [28]: %paste
evaluator = func.time_evaluator(func.entry_name, tvm.cpu(0), number = 5)
c = tvm.nd.array(numpy.zeros((N, N), dtype = dtype), tvm.cpu(0))
print('Opt3: %f' % evaluator(a, b, c).mean)

## -- End pasted text --
Opt3: 0.122024

In [29]: %paste
_a = a.asnumpy()
_b = b.asnumpy()
now = time.clock()
answer = numpy.dot(_a, _b)
print("Numpy: %f" % (time.clock() - now))

## -- End pasted text --
Numpy: 0.089359

In [30]: %paste
_a = a.asnumpy()
_b = b.asnumpy()
now = time.clock()
answer = numpy.dot(_a, _b)
print("Numpy: %f" % (time.clock() - now))

## -- End pasted text --
Numpy: 0.072706

The TVM kernel (Opt3) takes about 0.12 s, against 0.07 to 0.09 s for numpy.dot, so TVM is roughly 50% slower.
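The two measurements above are not taken the same way: `time_evaluator` reports a mean over 5 runs, while the numpy number is a single `time.clock()` reading. On Unix `time.clock()` returned processor time rather than wall-clock time, and it was removed in Python 3.8. A minimal sketch of a more comparable numpy timing follows. The helper names `time_numpy_dot` and `gflops` are made up for illustration, and `N = 1024` is an assumption, since the thread never states N.

```python
import time

import numpy as np


def time_numpy_dot(a, b, number=5):
    """Mean wall-clock seconds per np.dot call over `number` runs."""
    np.dot(a, b)  # warm-up run, excluded from the timing
    start = time.perf_counter()
    for _ in range(number):
        np.dot(a, b)
    return (time.perf_counter() - start) / number


def gflops(n, seconds):
    """Throughput of an n x n by n x n matmul, counted as 2*n^3 flops."""
    return 2.0 * n ** 3 / seconds / 1e9


if __name__ == "__main__":
    N = 1024  # assumption: not stated in the thread
    # float32, because the backtrace later in the thread stops in cblas_sgemm
    a = np.random.rand(N, N).astype("float32")
    b = np.random.rand(N, N).astype("float32")
    t = time_numpy_dot(a, b)
    print("Numpy: %f s (%.1f GFLOPS)" % (t, gflops(N, t)))
```

Reporting GFLOPS alongside seconds makes results comparable across matrix sizes.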

wweic · 2017-12-09T06:32:44Z

My numpy opt config:

In [33]: np.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
blas_opt_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
lapack_opt_info:
    extra_compile_args = ['-msse3']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]

wweic · 2017-12-09T22:38:55Z

Where is numpy's dot implementation?

Find all symbols whose names match `dot`:

(lldb) image lookup -r -n dot
2 matches found in /usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/Python:
        Address: Python[0x0000000000100767] (Python.__TEXT.__text + 1044391)
        Summary: Python`dotjoinattr
        Address: Python[0x0000000000100884] (Python.__TEXT.__text + 1044676)
        Summary: Python`dotted_getattr
1 match found in /usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/lib-dynload/_pickle.cpython-36m-darwin.so:
        Address: _pickle.cpython-36m-darwin.so[0x0000000000005ed7] (_pickle.cpython-36m-darwin.so.__TEXT.__text + 19707)
        Summary: _pickle.cpython-36m-darwin.so`get_dotted_path
5 matches found in /usr/local/opt/sqlite/lib/libsqlite3.0.dylib:
        Address: libsqlite3.0.dylib[0x0000000000018032] (libsqlite3.0.dylib.__TEXT.__text + 93662)
        Summary: libsqlite3.0.dylib`dotlockClose
        Address: libsqlite3.0.dylib[0x000000000001805c] (libsqlite3.0.dylib.__TEXT.__text + 93704)
        Summary: libsqlite3.0.dylib`dotlockLock
        Address: libsqlite3.0.dylib[0x00000000000180eb] (libsqlite3.0.dylib.__TEXT.__text + 93847)
        Summary: libsqlite3.0.dylib`dotlockUnlock
        Address: libsqlite3.0.dylib[0x000000000001813c] (libsqlite3.0.dylib.__TEXT.__text + 93928)
        Summary: libsqlite3.0.dylib`dotlockCheckReservedLock
        Address: libsqlite3.0.dylib[0x0000000000018171] (libsqlite3.0.dylib.__TEXT.__text + 93981)
        Summary: libsqlite3.0.dylib`dotlockIoFinderImpl
28 matches found in /usr/local/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-darwin.so:
        Address: multiarray.cpython-36m-darwin.so[0x00000000000063b0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 20912)
        Summary: multiarray.cpython-36m-darwin.so`FLOAT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000006560] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 21344)
        Summary: multiarray.cpython-36m-darwin.so`DOUBLE_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000006710] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 21776)
        Summary: multiarray.cpython-36m-darwin.so`CFLOAT_dot
        Address: multiarray.cpython-36m-darwin.so[0x00000000000068e0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 22240)
        Summary: multiarray.cpython-36m-darwin.so`CDOUBLE_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000001a940] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 104256)
        Summary: multiarray.cpython-36m-darwin.so`TIMEDELTA_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000001d9a0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 116640)
        Summary: multiarray.cpython-36m-darwin.so`DATETIME_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000001f770] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 124272)
        Summary: multiarray.cpython-36m-darwin.so`OBJECT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000020f90] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 130448)
        Summary: multiarray.cpython-36m-darwin.so`CLONGDOUBLE_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000027ad0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 157904)
        Summary: multiarray.cpython-36m-darwin.so`LONGDOUBLE_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000002ef00] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 187648)
        Summary: multiarray.cpython-36m-darwin.so`HALF_dot
        Address: multiarray.cpython-36m-darwin.so[0x00000000000321e0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 200672)
        Summary: multiarray.cpython-36m-darwin.so`ULONGLONG_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000035180] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 212864)
        Summary: multiarray.cpython-36m-darwin.so`LONGLONG_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000038660] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 226400)
        Summary: multiarray.cpython-36m-darwin.so`ULONG_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000003b600] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 238592)
        Summary: multiarray.cpython-36m-darwin.so`LONG_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000003e4a0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 250528)
        Summary: multiarray.cpython-36m-darwin.so`UINT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000041250] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 262224)
        Summary: multiarray.cpython-36m-darwin.so`INT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000043fe0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 273888)
        Summary: multiarray.cpython-36m-darwin.so`USHORT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000046db0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 285616)
        Summary: multiarray.cpython-36m-darwin.so`SHORT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000049a30] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 297008)
        Summary: multiarray.cpython-36m-darwin.so`UBYTE_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000004c560] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 308064)
        Summary: multiarray.cpython-36m-darwin.so`BYTE_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000004f430] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 320048)
        Summary: multiarray.cpython-36m-darwin.so`BOOL_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000005c450] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 373328)
        Summary: multiarray.cpython-36m-darwin.so`dot_alignment_error
        Address: multiarray.cpython-36m-darwin.so[0x00000000000e4c80] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 932480)
        Summary: multiarray.cpython-36m-darwin.so`array_dot
        Address: multiarray.cpython-36m-darwin.so[0x00000000000ee250] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 970832)
        Summary: multiarray.cpython-36m-darwin.so`array_vdot
        Address: multiarray.cpython-36m-darwin.so[0x000000000011c620] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 1160224)
        Summary: multiarray.cpython-36m-darwin.so`CFLOAT_vdot
        Address: multiarray.cpython-36m-darwin.so[0x000000000011c7f0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 1160688)
        Summary: multiarray.cpython-36m-darwin.so`CDOUBLE_vdot
        Address: multiarray.cpython-36m-darwin.so[0x000000000011c970] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 1161072)
        Summary: multiarray.cpython-36m-darwin.so`CLONGDOUBLE_vdot
        Address: multiarray.cpython-36m-darwin.so[0x000000000011c9e0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 1161184)
        Summary: multiarray.cpython-36m-darwin.so`OBJECT_vdot
14 matches found in /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvDSP.dylib:
        Address: libvDSP.dylib[0x00000000000741b0] (libvDSP.dylib.__TEXT.__text + 472688)
        Summary: libvDSP.dylib`vDSP_dotpr
        Address: libvDSP.dylib[0x0000000000010110] (libvDSP.dylib.__TEXT.__text + 62928)
        Summary: libvDSP.dylib`vDSP_dotpr2
        Address: libvDSP.dylib[0x0000000000192400] (libvDSP.dylib.__TEXT.__text + 1644736)
        Summary: libvDSP.dylib`vDSP_dotpr2D
        Address: libvDSP.dylib[0x00000000000101b0] (libvDSP.dylib.__TEXT.__text + 63088)
        Summary: libvDSP.dylib`vDSP_dotpr2_s1_15
        Address: libvDSP.dylib[0x0000000000011840] (libvDSP.dylib.__TEXT.__text + 68864)
        Summary: libvDSP.dylib`vDSP_dotpr2_s8_24
        Address: libvDSP.dylib[0x0000000000075280] (libvDSP.dylib.__TEXT.__text + 476992)
        Summary: libvDSP.dylib`vDSP_dotprD
        Address: libvDSP.dylib[0x0000000000011b90] (libvDSP.dylib.__TEXT.__text + 69712)
        Summary: libvDSP.dylib`vDSP_dotpr_s1_15
        Address: libvDSP.dylib[0x0000000000011e10] (libvDSP.dylib.__TEXT.__text + 70352)
        Summary: libvDSP.dylib`vDSP_dotpr_s8_24
        Address: libvDSP.dylib[0x000000000013e310] (libvDSP.dylib.__TEXT.__text + 1300432)
        Summary: libvDSP.dylib`vDSP_zdotpr
        Address: libvDSP.dylib[0x00000000001a4ee0] (libvDSP.dylib.__TEXT.__text + 1721248)
        Summary: libvDSP.dylib`vDSP_zdotprD
        Address: libvDSP.dylib[0x00000000001a5060] (libvDSP.dylib.__TEXT.__text + 1721632)
        Summary: libvDSP.dylib`vDSP_zidotpr
        Address: libvDSP.dylib[0x00000000001a51e0] (libvDSP.dylib.__TEXT.__text + 1722016)
        Summary: libvDSP.dylib`vDSP_zidotprD
        Address: libvDSP.dylib[0x000000000013ef10] (libvDSP.dylib.__TEXT.__text + 1303504)
        Summary: libvDSP.dylib`vDSP_zrdotpr
        Address: libvDSP.dylib[0x00000000001a5bf0] (libvDSP.dylib.__TEXT.__text + 1724592)
        Summary: libvDSP.dylib`vDSP_zrdotprD
3 matches found in /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvMisc.dylib:
        Address: libvMisc.dylib[0x0000000000009db0] (libvMisc.dylib.__TEXT.__text + 36560)
        Summary: libvMisc.dylib`vSdot
        Address: libvMisc.dylib[0x0000000000009e30] (libvMisc.dylib.__TEXT.__text + 36688)
        Summary: libvMisc.dylib`vSndot
        Address: libvMisc.dylib[0x0000000000000ee0] (libvMisc.dylib.__TEXT.__text + 0)
        Summary: libvMisc.dylib`vec_sdot
36 matches found in /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib:
        Address: libBLAS.dylib[0x0000000000073ea8] (libBLAS.dylib.__TEXT.__text + 472232)
        Summary: libBLAS.dylib`APL_zdotu_stride11_AVX256
        Address: libBLAS.dylib[0x0000000000073fe0] (libBLAS.dylib.__TEXT.__text + 472544)
        Summary: libBLAS.dylib`APL_zdotu_AVX128
        Address: libBLAS.dylib[0x00000000000740c6] (libBLAS.dylib.__TEXT.__text + 472774)
        Summary: libBLAS.dylib`APL_zdotu_stride11_AVX128
        Address: libBLAS.dylib[0x0000000000076514] (libBLAS.dylib.__TEXT.__text + 482068)
        Summary: libBLAS.dylib`sdot_forwardKernel_AVX512
        Address: libBLAS.dylib[0x000000000007670d] (libBLAS.dylib.__TEXT.__text + 482573)
        Summary: libBLAS.dylib`sdot_backwardKernel_AVX512
        Address: libBLAS.dylib[0x000000000009be2c] (libBLAS.dylib.__TEXT.__text + 635948)
        Summary: libBLAS.dylib`APL_zdotc_stride11_AVX256
        Address: libBLAS.dylib[0x000000000009c138] (libBLAS.dylib.__TEXT.__text + 636728)
        Summary: libBLAS.dylib`APL_zdotc_AVX128
        Address: libBLAS.dylib[0x000000000009c21b] (libBLAS.dylib.__TEXT.__text + 636955)
        Summary: libBLAS.dylib`APL_zdotc_stride11_AVX128
        Address: libBLAS.dylib[0x00000000000a9164] (libBLAS.dylib.__TEXT.__text + 690020)
        Summary: libBLAS.dylib`dsdot_forwardKernel_AVX512
        Address: libBLAS.dylib[0x00000000000a936f] (libBLAS.dylib.__TEXT.__text + 690543)
        Summary: libBLAS.dylib`dsdot_backwardKernel_AVX512
        Address: libBLAS.dylib[0x00000000001111a6] (libBLAS.dylib.__TEXT.__text + 1116070)
        Summary: libBLAS.dylib`ddot_forwardKernel_AVX512
        Address: libBLAS.dylib[0x0000000000111381] (libBLAS.dylib.__TEXT.__text + 1116545)
        Summary: libBLAS.dylib`ddot_backwardKernel_AVX512
        Address: libBLAS.dylib[0x000000000006a77a] (libBLAS.dylib.__TEXT.__text + 433530)
        Summary: libBLAS.dylib`cblas_cdotc_sub
        Address: libBLAS.dylib[0x000000000007310b] (libBLAS.dylib.__TEXT.__text + 468747)
        Summary: libBLAS.dylib`cblas_cdotu_sub
        Address: libBLAS.dylib[0x000000000007693c] (libBLAS.dylib.__TEXT.__text + 483132)
        Summary: libBLAS.dylib`cblas_ddot
        Address: libBLAS.dylib[0x0000000000023a79] (libBLAS.dylib.__TEXT.__text + 143481)
        Summary: libBLAS.dylib`cblas_dsdot
        Address: libBLAS.dylib[0x0000000000000d94] (libBLAS.dylib.__TEXT.__text + 916)
        Summary: libBLAS.dylib`cblas_sdot
        Address: libBLAS.dylib[0x0000000000023a54] (libBLAS.dylib.__TEXT.__text + 143444)
        Summary: libBLAS.dylib`cblas_sdsdot
        Address: libBLAS.dylib[0x000000000003be9f] (libBLAS.dylib.__TEXT.__text + 242847)
        Summary: libBLAS.dylib`cblas_zdotc_sub
        Address: libBLAS.dylib[0x000000000011330d] (libBLAS.dylib.__TEXT.__text + 1124621)
        Summary: libBLAS.dylib`cblas_zdotu_sub
        Address: libBLAS.dylib[0x0000000000006c1a] (libBLAS.dylib.__TEXT.__text + 25114)
        Summary: libBLAS.dylib`CDOTC
        Address: libBLAS.dylib[0x0000000000006c1a] (libBLAS.dylib.__TEXT.__text + 25114)
        Summary: libBLAS.dylib`CDOTC
        Address: libBLAS.dylib[0x0000000000006c3c] (libBLAS.dylib.__TEXT.__text + 25148)
        Summary: libBLAS.dylib`CDOTU
        Address: libBLAS.dylib[0x0000000000006c3c] (libBLAS.dylib.__TEXT.__text + 25148)
        Summary: libBLAS.dylib`CDOTU
        Address: libBLAS.dylib[0x0000000000006d52] (libBLAS.dylib.__TEXT.__text + 25426)
        Summary: libBLAS.dylib`DDOT
        Address: libBLAS.dylib[0x0000000000006d52] (libBLAS.dylib.__TEXT.__text + 25426)
        Summary: libBLAS.dylib`DDOT
        Address: libBLAS.dylib[0x0000000000006e11] (libBLAS.dylib.__TEXT.__text + 25617)
        Summary: libBLAS.dylib`DSDOT
        Address: libBLAS.dylib[0x0000000000006e11] (libBLAS.dylib.__TEXT.__text + 25617)
        Summary: libBLAS.dylib`DSDOT
        Address: libBLAS.dylib[0x0000000000006e88] (libBLAS.dylib.__TEXT.__text + 25736)
        Summary: libBLAS.dylib`SDOT
        Address: libBLAS.dylib[0x0000000000006e88] (libBLAS.dylib.__TEXT.__text + 25736)
        Summary: libBLAS.dylib`SDOT
        Address: libBLAS.dylib[0x0000000000006e9e] (libBLAS.dylib.__TEXT.__text + 25758)
        Summary: libBLAS.dylib`SDSDOT
        Address: libBLAS.dylib[0x0000000000006e9e] (libBLAS.dylib.__TEXT.__text + 25758)
        Summary: libBLAS.dylib`SDSDOT
        Address: libBLAS.dylib[0x0000000000006fba] (libBLAS.dylib.__TEXT.__text + 26042)
        Summary: libBLAS.dylib`ZDOTC
        Address: libBLAS.dylib[0x0000000000006fba] (libBLAS.dylib.__TEXT.__text + 26042)
        Summary: libBLAS.dylib`ZDOTC
        Address: libBLAS.dylib[0x0000000000006fdc] (libBLAS.dylib.__TEXT.__text + 26076)
        Summary: libBLAS.dylib`ZDOTU
        Address: libBLAS.dylib[0x0000000000006fdc] (libBLAS.dylib.__TEXT.__text + 26076)
        Summary: libBLAS.dylib`ZDOTU
2 matches found in /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libSparse.dylib:
        Address: libSparse.dylib[0x000000000003c9d0] (libSparse.dylib.__TEXT.__text + 242000)
        Summary: libSparse.dylib`libmetis__idot
        Address: libSparse.dylib[0x000000000003cea7] (libSparse.dylib.__TEXT.__text + 243239)
        Summary: libSparse.dylib`libmetis__rdot
2 matches found in /Users/weichen/workspace/deep-learning/tvm/lib/libtvm.dylib:
        Address: libtvm.dylib[0x00000000015e025a] (libtvm.dylib.__TEXT.__text + 22926778)
        Summary: libtvm.dylib`llvm::sys::path::remove_dots(llvm::SmallVectorImpl<char>&, bool, llvm::sys::path::Style)
        Address: libtvm.dylib[0x00000000015e01fe] (libtvm.dylib.__TEXT.__text + 22926686)
        Summary: libtvm.dylib`llvm::sys::path::remove_leading_dotslash(llvm::StringRef, llvm::sys::path::Style)
(lldb)

wweic · 2017-12-09T22:46:28Z

Set breakpoints on all symbols matching `blas`, then run numpy.dot and wait for one to be hit:

(lldb) br s -r  blas
Breakpoint 2: 160 locations.

The backtrace at the first hit shows numpy.dot ending up in `cblas_sgemm` from Accelerate's libBLAS:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.97
  * frame #0: 0x00007fff53ed1cc0 libBLAS.dylib`cblas_sgemm
    frame #1: 0x000000010748f0f4 multiarray.cpython-36m-darwin.so`cblas_matrixproduct + 3700
    frame #2: 0x00000001074583c7 multiarray.cpython-36m-darwin.so`PyArray_MatrixProduct2 + 215
    frame #3: 0x000000010745d20f multiarray.cpython-36m-darwin.so`array_matrixproduct + 191
    frame #4: 0x0000000105071fba Python`_PyCFunction_FastCallDict + 468
    frame #5: 0x00000001050db792 Python`call_function + 584
    frame #6: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #7: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #8: 0x00000001050d2d88 Python`PyEval_EvalCode + 100
    frame #9: 0x00000001050d0743 Python`builtin_exec + 531
    frame #10: 0x0000000105071e92 Python`_PyCFunction_FastCallDict + 172
    frame #11: 0x00000001050db792 Python`call_function + 584
    frame #12: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #13: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #14: 0x00000001050dc6cf Python`fast_function + 241
    frame #15: 0x00000001050db765 Python`call_function + 539
    frame #16: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #17: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #18: 0x00000001050dc6cf Python`fast_function + 241
    frame #19: 0x00000001050db765 Python`call_function + 539
    frame #20: 0x00000001050d8c31 Python`_PyEval_EvalFrameDefault + 24064
    frame #21: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #22: 0x00000001050dc6cf Python`fast_function + 241
    frame #23: 0x00000001050db765 Python`call_function + 539
    frame #24: 0x00000001050d8c31 Python`_PyEval_EvalFrameDefault + 24064
    frame #25: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #26: 0x00000001050dc6cf Python`fast_function + 241
    frame #27: 0x00000001050db765 Python`call_function + 539
    frame #28: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #29: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #30: 0x00000001050dc6cf Python`fast_function + 241
    frame #31: 0x00000001050db765 Python`call_function + 539
    frame #32: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #33: 0x00000001050dc96b Python`_PyFunction_FastCall + 122
    frame #34: 0x00000001050db765 Python`call_function + 539
    frame #35: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #36: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #37: 0x00000001050dc8b9 Python`_PyFunction_FastCallDict + 477
    frame #38: 0x0000000105037a63 Python`_PyObject_FastCallDict + 231
    frame #39: 0x0000000105037b83 Python`_PyObject_Call_Prepend + 149
    frame #40: 0x00000001050378bb Python`PyObject_Call + 102
    frame #41: 0x00000001050d8e06 Python`_PyEval_EvalFrameDefault + 24533
    frame #42: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #43: 0x00000001050dc6cf Python`fast_function + 241
    frame #44: 0x00000001050db765 Python`call_function + 539
    frame #45: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #46: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #47: 0x00000001050d2d88 Python`PyEval_EvalCode + 100
    frame #48: 0x00000001050fc3a6 Python`run_mod + 58
    frame #49: 0x00000001050fc6bb Python`PyRun_FileExFlags + 178
    frame #50: 0x00000001050fbd54 Python`PyRun_SimpleFileExFlags + 676
    frame #51: 0x00000001051104cc Python`Py_Main + 3472
    frame #52: 0x0000000105027e17 Python`___lldb_unnamed_symbol1$$Python + 235
    frame #53: 0x00007fff7e9c5145 libdyld.dylib`start + 1

wweic · 2017-12-09T22:53:23Z

numpy dispatch code:
https://github.com/numpy/numpy/blob/5c16f535e7515c2394b19cc6778ad9b5ae24d729/numpy/core/src/multiarray/cblasfuncs.c#L50-L78
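As a rough illustration of what this kind of dispatch does, the sketch below maps the operand dtype to the CBLAS GEMM routine a 2-D product would use. It is not numpy's code (the real logic is the C source linked above), and `gemm_routine_for` is a hypothetical helper name. It assumes that only the four BLAS-supported floating-point types are routed to BLAS and that every other dtype falls back to numpy's own loops.

```python
import numpy as np

# Illustrative mapping only; numpy's real dispatch lives in cblasfuncs.c.
_GEMM_BY_DTYPE = {
    np.dtype("float32"): "cblas_sgemm",
    np.dtype("float64"): "cblas_dgemm",
    np.dtype("complex64"): "cblas_cgemm",
    np.dtype("complex128"): "cblas_zgemm",
}


def gemm_routine_for(a, b):
    """Name of the CBLAS GEMM routine for a 2-D product, or None."""
    if a.ndim != 2 or b.ndim != 2:
        return None
    # Both operands are first brought to a common dtype.
    common = np.result_type(a.dtype, b.dtype)
    return _GEMM_BY_DTYPE.get(common)
```

For the float32 arrays timed in this thread the helper returns `cblas_sgemm`, which matches frame #0 of the backtrace.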

