How to match the performance of numpy.dot #1

Open
wweic opened this issue Dec 9, 2017 · 4 comments

wweic commented Dec 9, 2017

http://docs.tvmlang.org/tutorials/optimize/opt_gemm.html#sphx-glr-tutorials-optimize-opt-gemm-py

In [27]: %paste
evaluator = func.time_evaluator(func.entry_name, tvm.cpu(0), number = 5)
c = tvm.nd.array(numpy.zeros((N, N), dtype = dtype), tvm.cpu(0))
print('Opt3: %f' % evaluator(a, b, c).mean)

## -- End pasted text --
Opt3: 0.118320

In [28]: %paste
evaluator = func.time_evaluator(func.entry_name, tvm.cpu(0), number = 5)
c = tvm.nd.array(numpy.zeros((N, N), dtype = dtype), tvm.cpu(0))
print('Opt3: %f' % evaluator(a, b, c).mean)

## -- End pasted text --
Opt3: 0.122024

In [29]: %paste
_a = a.asnumpy()
_b = b.asnumpy()
now = time.clock()
answer = numpy.dot(_a, _b)
print("Numpy: %f" % (time.clock() - now))

## -- End pasted text --
Numpy: 0.089359

In [30]: %paste
_a = a.asnumpy()
_b = b.asnumpy()
now = time.clock()
answer = numpy.dot(_a, _b)
print("Numpy: %f" % (time.clock() - now))

## -- End pasted text --
Numpy: 0.072706

There is roughly a 50% performance difference: the TVM kernel takes about 0.12 s per run, while numpy.dot takes about 0.07 to 0.09 s.
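For a like-for-like comparison, the numpy side could be timed the way time_evaluator is used above: several runs, averaged, on a wall clock. The following is a minimal sketch using only the standard library; `mean_time` is a hypothetical helper, not part of TVM or numpy. It uses time.perf_counter() because time.clock() was deprecated in Python 3.3 and removed in 3.8, and on Unix it reported CPU time rather than elapsed time.

```python
import time


def mean_time(fn, number=5, warmup=1):
    """Return the mean wall-clock seconds per call of fn().

    `number` mirrors the `number` argument passed to time_evaluator above;
    `warmup` calls are run first and are not timed.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(number):
        fn()
    return (time.perf_counter() - start) / number


# Stand-in workload so the sketch runs anywhere. In the session above this
# would be: mean_time(lambda: numpy.dot(_a, _b))
elapsed = mean_time(lambda: sum(i * i for i in range(10000)))
print("mean seconds per call: %f" % elapsed)
```

Averaging over several calls after a warm-up run should also reduce the spread seen between the two numpy measurements above.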

wweic commented Dec 9, 2017

My numpy build configuration:

In [33]: np.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
blas_opt_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
lapack_opt_info:
    extra_compile_args = ['-msse3']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
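In the output above, only blas_opt_info and lapack_opt_info are populated, and both link against Accelerate. A rough sketch of picking that out programmatically from the pasted text follows; `available_sections` is a hypothetical helper, and it assumes the plain-text layout shown above (other numpy versions may print the configuration differently).

```python
def available_sections(config_text):
    """Map section name -> option lines, dropping NOT AVAILABLE sections."""
    sections = {}
    current = None
    for line in config_text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            # Unindented line: a section header such as "blas_opt_info:"
            current = line.strip().rstrip(":")
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())
    return {name: opts for name, opts in sections.items()
            if opts and opts != ["NOT AVAILABLE"]}


# Abbreviated copy of the output pasted above.
SAMPLE = """\
blas_mkl_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
blas_opt_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
lapack_opt_info:
    extra_compile_args = ['-msse3']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
"""

found = available_sections(SAMPLE)
uses_accelerate = any("Accelerate" in opt
                      for opts in found.values() for opt in opts)
print(sorted(found), uses_accelerate)
```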

wweic commented Dec 9, 2017

Where is numpy's dot implementation?

Find all symbols matching dot:

(lldb) image lookup -r -n dot
2 matches found in /usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/Python:
        Address: Python[0x0000000000100767] (Python.__TEXT.__text + 1044391)
        Summary: Python`dotjoinattr
        Address: Python[0x0000000000100884] (Python.__TEXT.__text + 1044676)
        Summary: Python`dotted_getattr
1 match found in /usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/lib-dynload/_pickle.cpython-36m-darwin.so:
        Address: _pickle.cpython-36m-darwin.so[0x0000000000005ed7] (_pickle.cpython-36m-darwin.so.__TEXT.__text + 19707)
        Summary: _pickle.cpython-36m-darwin.so`get_dotted_path
5 matches found in /usr/local/opt/sqlite/lib/libsqlite3.0.dylib:
        Address: libsqlite3.0.dylib[0x0000000000018032] (libsqlite3.0.dylib.__TEXT.__text + 93662)
        Summary: libsqlite3.0.dylib`dotlockClose
        Address: libsqlite3.0.dylib[0x000000000001805c] (libsqlite3.0.dylib.__TEXT.__text + 93704)
        Summary: libsqlite3.0.dylib`dotlockLock
        Address: libsqlite3.0.dylib[0x00000000000180eb] (libsqlite3.0.dylib.__TEXT.__text + 93847)
        Summary: libsqlite3.0.dylib`dotlockUnlock
        Address: libsqlite3.0.dylib[0x000000000001813c] (libsqlite3.0.dylib.__TEXT.__text + 93928)
        Summary: libsqlite3.0.dylib`dotlockCheckReservedLock
        Address: libsqlite3.0.dylib[0x0000000000018171] (libsqlite3.0.dylib.__TEXT.__text + 93981)
        Summary: libsqlite3.0.dylib`dotlockIoFinderImpl
28 matches found in /usr/local/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-darwin.so:
        Address: multiarray.cpython-36m-darwin.so[0x00000000000063b0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 20912)
        Summary: multiarray.cpython-36m-darwin.so`FLOAT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000006560] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 21344)
        Summary: multiarray.cpython-36m-darwin.so`DOUBLE_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000006710] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 21776)
        Summary: multiarray.cpython-36m-darwin.so`CFLOAT_dot
        Address: multiarray.cpython-36m-darwin.so[0x00000000000068e0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 22240)
        Summary: multiarray.cpython-36m-darwin.so`CDOUBLE_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000001a940] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 104256)
        Summary: multiarray.cpython-36m-darwin.so`TIMEDELTA_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000001d9a0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 116640)
        Summary: multiarray.cpython-36m-darwin.so`DATETIME_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000001f770] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 124272)
        Summary: multiarray.cpython-36m-darwin.so`OBJECT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000020f90] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 130448)
        Summary: multiarray.cpython-36m-darwin.so`CLONGDOUBLE_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000027ad0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 157904)
        Summary: multiarray.cpython-36m-darwin.so`LONGDOUBLE_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000002ef00] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 187648)
        Summary: multiarray.cpython-36m-darwin.so`HALF_dot
        Address: multiarray.cpython-36m-darwin.so[0x00000000000321e0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 200672)
        Summary: multiarray.cpython-36m-darwin.so`ULONGLONG_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000035180] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 212864)
        Summary: multiarray.cpython-36m-darwin.so`LONGLONG_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000038660] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 226400)
        Summary: multiarray.cpython-36m-darwin.so`ULONG_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000003b600] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 238592)
        Summary: multiarray.cpython-36m-darwin.so`LONG_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000003e4a0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 250528)
        Summary: multiarray.cpython-36m-darwin.so`UINT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000041250] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 262224)
        Summary: multiarray.cpython-36m-darwin.so`INT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000043fe0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 273888)
        Summary: multiarray.cpython-36m-darwin.so`USHORT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000046db0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 285616)
        Summary: multiarray.cpython-36m-darwin.so`SHORT_dot
        Address: multiarray.cpython-36m-darwin.so[0x0000000000049a30] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 297008)
        Summary: multiarray.cpython-36m-darwin.so`UBYTE_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000004c560] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 308064)
        Summary: multiarray.cpython-36m-darwin.so`BYTE_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000004f430] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 320048)
        Summary: multiarray.cpython-36m-darwin.so`BOOL_dot
        Address: multiarray.cpython-36m-darwin.so[0x000000000005c450] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 373328)
        Summary: multiarray.cpython-36m-darwin.so`dot_alignment_error
        Address: multiarray.cpython-36m-darwin.so[0x00000000000e4c80] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 932480)
        Summary: multiarray.cpython-36m-darwin.so`array_dot
        Address: multiarray.cpython-36m-darwin.so[0x00000000000ee250] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 970832)
        Summary: multiarray.cpython-36m-darwin.so`array_vdot
        Address: multiarray.cpython-36m-darwin.so[0x000000000011c620] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 1160224)
        Summary: multiarray.cpython-36m-darwin.so`CFLOAT_vdot
        Address: multiarray.cpython-36m-darwin.so[0x000000000011c7f0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 1160688)
        Summary: multiarray.cpython-36m-darwin.so`CDOUBLE_vdot
        Address: multiarray.cpython-36m-darwin.so[0x000000000011c970] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 1161072)
        Summary: multiarray.cpython-36m-darwin.so`CLONGDOUBLE_vdot
        Address: multiarray.cpython-36m-darwin.so[0x000000000011c9e0] (multiarray.cpython-36m-darwin.so.__TEXT.__text + 1161184)
        Summary: multiarray.cpython-36m-darwin.so`OBJECT_vdot
14 matches found in /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvDSP.dylib:
        Address: libvDSP.dylib[0x00000000000741b0] (libvDSP.dylib.__TEXT.__text + 472688)
        Summary: libvDSP.dylib`vDSP_dotpr
        Address: libvDSP.dylib[0x0000000000010110] (libvDSP.dylib.__TEXT.__text + 62928)
        Summary: libvDSP.dylib`vDSP_dotpr2
        Address: libvDSP.dylib[0x0000000000192400] (libvDSP.dylib.__TEXT.__text + 1644736)
        Summary: libvDSP.dylib`vDSP_dotpr2D
        Address: libvDSP.dylib[0x00000000000101b0] (libvDSP.dylib.__TEXT.__text + 63088)
        Summary: libvDSP.dylib`vDSP_dotpr2_s1_15
        Address: libvDSP.dylib[0x0000000000011840] (libvDSP.dylib.__TEXT.__text + 68864)
        Summary: libvDSP.dylib`vDSP_dotpr2_s8_24
        Address: libvDSP.dylib[0x0000000000075280] (libvDSP.dylib.__TEXT.__text + 476992)
        Summary: libvDSP.dylib`vDSP_dotprD
        Address: libvDSP.dylib[0x0000000000011b90] (libvDSP.dylib.__TEXT.__text + 69712)
        Summary: libvDSP.dylib`vDSP_dotpr_s1_15
        Address: libvDSP.dylib[0x0000000000011e10] (libvDSP.dylib.__TEXT.__text + 70352)
        Summary: libvDSP.dylib`vDSP_dotpr_s8_24
        Address: libvDSP.dylib[0x000000000013e310] (libvDSP.dylib.__TEXT.__text + 1300432)
        Summary: libvDSP.dylib`vDSP_zdotpr
        Address: libvDSP.dylib[0x00000000001a4ee0] (libvDSP.dylib.__TEXT.__text + 1721248)
        Summary: libvDSP.dylib`vDSP_zdotprD
        Address: libvDSP.dylib[0x00000000001a5060] (libvDSP.dylib.__TEXT.__text + 1721632)
        Summary: libvDSP.dylib`vDSP_zidotpr
        Address: libvDSP.dylib[0x00000000001a51e0] (libvDSP.dylib.__TEXT.__text + 1722016)
        Summary: libvDSP.dylib`vDSP_zidotprD
        Address: libvDSP.dylib[0x000000000013ef10] (libvDSP.dylib.__TEXT.__text + 1303504)
        Summary: libvDSP.dylib`vDSP_zrdotpr
        Address: libvDSP.dylib[0x00000000001a5bf0] (libvDSP.dylib.__TEXT.__text + 1724592)
        Summary: libvDSP.dylib`vDSP_zrdotprD
3 matches found in /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvMisc.dylib:
        Address: libvMisc.dylib[0x0000000000009db0] (libvMisc.dylib.__TEXT.__text + 36560)
        Summary: libvMisc.dylib`vSdot
        Address: libvMisc.dylib[0x0000000000009e30] (libvMisc.dylib.__TEXT.__text + 36688)
        Summary: libvMisc.dylib`vSndot
        Address: libvMisc.dylib[0x0000000000000ee0] (libvMisc.dylib.__TEXT.__text + 0)
        Summary: libvMisc.dylib`vec_sdot
36 matches found in /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib:
        Address: libBLAS.dylib[0x0000000000073ea8] (libBLAS.dylib.__TEXT.__text + 472232)
        Summary: libBLAS.dylib`APL_zdotu_stride11_AVX256
        Address: libBLAS.dylib[0x0000000000073fe0] (libBLAS.dylib.__TEXT.__text + 472544)
        Summary: libBLAS.dylib`APL_zdotu_AVX128
        Address: libBLAS.dylib[0x00000000000740c6] (libBLAS.dylib.__TEXT.__text + 472774)
        Summary: libBLAS.dylib`APL_zdotu_stride11_AVX128
        Address: libBLAS.dylib[0x0000000000076514] (libBLAS.dylib.__TEXT.__text + 482068)
        Summary: libBLAS.dylib`sdot_forwardKernel_AVX512
        Address: libBLAS.dylib[0x000000000007670d] (libBLAS.dylib.__TEXT.__text + 482573)
        Summary: libBLAS.dylib`sdot_backwardKernel_AVX512
        Address: libBLAS.dylib[0x000000000009be2c] (libBLAS.dylib.__TEXT.__text + 635948)
        Summary: libBLAS.dylib`APL_zdotc_stride11_AVX256
        Address: libBLAS.dylib[0x000000000009c138] (libBLAS.dylib.__TEXT.__text + 636728)
        Summary: libBLAS.dylib`APL_zdotc_AVX128
        Address: libBLAS.dylib[0x000000000009c21b] (libBLAS.dylib.__TEXT.__text + 636955)
        Summary: libBLAS.dylib`APL_zdotc_stride11_AVX128
        Address: libBLAS.dylib[0x00000000000a9164] (libBLAS.dylib.__TEXT.__text + 690020)
        Summary: libBLAS.dylib`dsdot_forwardKernel_AVX512
        Address: libBLAS.dylib[0x00000000000a936f] (libBLAS.dylib.__TEXT.__text + 690543)
        Summary: libBLAS.dylib`dsdot_backwardKernel_AVX512
        Address: libBLAS.dylib[0x00000000001111a6] (libBLAS.dylib.__TEXT.__text + 1116070)
        Summary: libBLAS.dylib`ddot_forwardKernel_AVX512
        Address: libBLAS.dylib[0x0000000000111381] (libBLAS.dylib.__TEXT.__text + 1116545)
        Summary: libBLAS.dylib`ddot_backwardKernel_AVX512
        Address: libBLAS.dylib[0x000000000006a77a] (libBLAS.dylib.__TEXT.__text + 433530)
        Summary: libBLAS.dylib`cblas_cdotc_sub
        Address: libBLAS.dylib[0x000000000007310b] (libBLAS.dylib.__TEXT.__text + 468747)
        Summary: libBLAS.dylib`cblas_cdotu_sub
        Address: libBLAS.dylib[0x000000000007693c] (libBLAS.dylib.__TEXT.__text + 483132)
        Summary: libBLAS.dylib`cblas_ddot
        Address: libBLAS.dylib[0x0000000000023a79] (libBLAS.dylib.__TEXT.__text + 143481)
        Summary: libBLAS.dylib`cblas_dsdot
        Address: libBLAS.dylib[0x0000000000000d94] (libBLAS.dylib.__TEXT.__text + 916)
        Summary: libBLAS.dylib`cblas_sdot
        Address: libBLAS.dylib[0x0000000000023a54] (libBLAS.dylib.__TEXT.__text + 143444)
        Summary: libBLAS.dylib`cblas_sdsdot
        Address: libBLAS.dylib[0x000000000003be9f] (libBLAS.dylib.__TEXT.__text + 242847)
        Summary: libBLAS.dylib`cblas_zdotc_sub
        Address: libBLAS.dylib[0x000000000011330d] (libBLAS.dylib.__TEXT.__text + 1124621)
        Summary: libBLAS.dylib`cblas_zdotu_sub
        Address: libBLAS.dylib[0x0000000000006c1a] (libBLAS.dylib.__TEXT.__text + 25114)
        Summary: libBLAS.dylib`CDOTC
        Address: libBLAS.dylib[0x0000000000006c1a] (libBLAS.dylib.__TEXT.__text + 25114)
        Summary: libBLAS.dylib`CDOTC
        Address: libBLAS.dylib[0x0000000000006c3c] (libBLAS.dylib.__TEXT.__text + 25148)
        Summary: libBLAS.dylib`CDOTU
        Address: libBLAS.dylib[0x0000000000006c3c] (libBLAS.dylib.__TEXT.__text + 25148)
        Summary: libBLAS.dylib`CDOTU
        Address: libBLAS.dylib[0x0000000000006d52] (libBLAS.dylib.__TEXT.__text + 25426)
        Summary: libBLAS.dylib`DDOT
        Address: libBLAS.dylib[0x0000000000006d52] (libBLAS.dylib.__TEXT.__text + 25426)
        Summary: libBLAS.dylib`DDOT
        Address: libBLAS.dylib[0x0000000000006e11] (libBLAS.dylib.__TEXT.__text + 25617)
        Summary: libBLAS.dylib`DSDOT
        Address: libBLAS.dylib[0x0000000000006e11] (libBLAS.dylib.__TEXT.__text + 25617)
        Summary: libBLAS.dylib`DSDOT
        Address: libBLAS.dylib[0x0000000000006e88] (libBLAS.dylib.__TEXT.__text + 25736)
        Summary: libBLAS.dylib`SDOT
        Address: libBLAS.dylib[0x0000000000006e88] (libBLAS.dylib.__TEXT.__text + 25736)
        Summary: libBLAS.dylib`SDOT
        Address: libBLAS.dylib[0x0000000000006e9e] (libBLAS.dylib.__TEXT.__text + 25758)
        Summary: libBLAS.dylib`SDSDOT
        Address: libBLAS.dylib[0x0000000000006e9e] (libBLAS.dylib.__TEXT.__text + 25758)
        Summary: libBLAS.dylib`SDSDOT
        Address: libBLAS.dylib[0x0000000000006fba] (libBLAS.dylib.__TEXT.__text + 26042)
        Summary: libBLAS.dylib`ZDOTC
        Address: libBLAS.dylib[0x0000000000006fba] (libBLAS.dylib.__TEXT.__text + 26042)
        Summary: libBLAS.dylib`ZDOTC
        Address: libBLAS.dylib[0x0000000000006fdc] (libBLAS.dylib.__TEXT.__text + 26076)
        Summary: libBLAS.dylib`ZDOTU
        Address: libBLAS.dylib[0x0000000000006fdc] (libBLAS.dylib.__TEXT.__text + 26076)
        Summary: libBLAS.dylib`ZDOTU
2 matches found in /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libSparse.dylib:
        Address: libSparse.dylib[0x000000000003c9d0] (libSparse.dylib.__TEXT.__text + 242000)
        Summary: libSparse.dylib`libmetis__idot
        Address: libSparse.dylib[0x000000000003cea7] (libSparse.dylib.__TEXT.__text + 243239)
        Summary: libSparse.dylib`libmetis__rdot
2 matches found in /Users/weichen/workspace/deep-learning/tvm/lib/libtvm.dylib:
        Address: libtvm.dylib[0x00000000015e025a] (libtvm.dylib.__TEXT.__text + 22926778)
        Summary: libtvm.dylib`llvm::sys::path::remove_dots(llvm::SmallVectorImpl<char>&, bool, llvm::sys::path::Style)
        Address: libtvm.dylib[0x00000000015e01fe] (libtvm.dylib.__TEXT.__text + 22926686)
        Summary: libtvm.dylib`llvm::sys::path::remove_leading_dotslash(llvm::StringRef, llvm::sys::path::Style)
(lldb)

wweic commented Dec 9, 2017

Set breakpoints on all symbols matching blas, then wait for one to be hit:

(lldb) br s -r  blas
Breakpoint 2: 160 locations.

The backtrace shows numpy's matrix product calling cblas_sgemm in Accelerate's libBLAS.dylib:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.97
  * frame #0: 0x00007fff53ed1cc0 libBLAS.dylib`cblas_sgemm
    frame #1: 0x000000010748f0f4 multiarray.cpython-36m-darwin.so`cblas_matrixproduct + 3700
    frame #2: 0x00000001074583c7 multiarray.cpython-36m-darwin.so`PyArray_MatrixProduct2 + 215
    frame #3: 0x000000010745d20f multiarray.cpython-36m-darwin.so`array_matrixproduct + 191
    frame #4: 0x0000000105071fba Python`_PyCFunction_FastCallDict + 468
    frame #5: 0x00000001050db792 Python`call_function + 584
    frame #6: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #7: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #8: 0x00000001050d2d88 Python`PyEval_EvalCode + 100
    frame #9: 0x00000001050d0743 Python`builtin_exec + 531
    frame #10: 0x0000000105071e92 Python`_PyCFunction_FastCallDict + 172
    frame #11: 0x00000001050db792 Python`call_function + 584
    frame #12: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #13: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #14: 0x00000001050dc6cf Python`fast_function + 241
    frame #15: 0x00000001050db765 Python`call_function + 539
    frame #16: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #17: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #18: 0x00000001050dc6cf Python`fast_function + 241
    frame #19: 0x00000001050db765 Python`call_function + 539
    frame #20: 0x00000001050d8c31 Python`_PyEval_EvalFrameDefault + 24064
    frame #21: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #22: 0x00000001050dc6cf Python`fast_function + 241
    frame #23: 0x00000001050db765 Python`call_function + 539
    frame #24: 0x00000001050d8c31 Python`_PyEval_EvalFrameDefault + 24064
    frame #25: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #26: 0x00000001050dc6cf Python`fast_function + 241
    frame #27: 0x00000001050db765 Python`call_function + 539
    frame #28: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #29: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #30: 0x00000001050dc6cf Python`fast_function + 241
    frame #31: 0x00000001050db765 Python`call_function + 539
    frame #32: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #33: 0x00000001050dc96b Python`_PyFunction_FastCall + 122
    frame #34: 0x00000001050db765 Python`call_function + 539
    frame #35: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #36: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #37: 0x00000001050dc8b9 Python`_PyFunction_FastCallDict + 477
    frame #38: 0x0000000105037a63 Python`_PyObject_FastCallDict + 231
    frame #39: 0x0000000105037b83 Python`_PyObject_Call_Prepend + 149
    frame #40: 0x00000001050378bb Python`PyObject_Call + 102
    frame #41: 0x00000001050d8e06 Python`_PyEval_EvalFrameDefault + 24533
    frame #42: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #43: 0x00000001050dc6cf Python`fast_function + 241
    frame #44: 0x00000001050db765 Python`call_function + 539
    frame #45: 0x00000001050d8bab Python`_PyEval_EvalFrameDefault + 23930
    frame #46: 0x00000001050dbfb4 Python`_PyEval_EvalCodeWithName + 1978
    frame #47: 0x00000001050d2d88 Python`PyEval_EvalCode + 100
    frame #48: 0x00000001050fc3a6 Python`run_mod + 58
    frame #49: 0x00000001050fc6bb Python`PyRun_FileExFlags + 178
    frame #50: 0x00000001050fbd54 Python`PyRun_SimpleFileExFlags + 676
    frame #51: 0x00000001051104cc Python`Py_Main + 3472
    frame #52: 0x0000000105027e17 Python`___lldb_unnamed_symbol1$$Python + 235
    frame #53: 0x00007fff7e9c5145 libdyld.dylib`start + 1
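The top frame is cblas_sgemm, the single-precision general matrix multiply from the BLAS interface, which computes C = alpha * A * B + beta * C. As a reference for what that call computes (not how Accelerate implements it), here is a minimal pure-Python sketch for row-major, non-transposed inputs. `sgemm_ref` is a hypothetical name, and it works on Python floats rather than float32.

```python
def sgemm_ref(m, n, k, alpha, a, b, beta, c):
    """Update c in place so that c = alpha * (a @ b) + beta * c.

    a is m x k, b is k x n, c is m x n, each stored as a flat
    row-major list. Returns c for convenience.
    """
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = alpha * acc + beta * c[i * n + j]
    return c


a = [1.0, 2.0, 3.0, 4.0]  # [[1, 2], [3, 4]]
b = [5.0, 6.0, 7.0, 8.0]  # [[5, 6], [7, 8]]
c = [0.0] * 4
sgemm_ref(2, 2, 2, 1.0, a, b, 0.0, c)
print(c)
```

An optimized BLAS computes the same result but blocks the loops for cache reuse and uses vector instructions, which is the gap the TVM schedule has to close.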
