
TVM v0.5 Roadmap #1596

Closed
20 of 32 tasks
tqchen opened this issue Aug 13, 2018 · 27 comments

@tqchen
Member

tqchen commented Aug 13, 2018

This is the roadmap for TVM v0.5. TVM is a community-driven project and we love your feedback and proposals on where we should be heading. Please open discussions in the discussion forum and bring RFCs.

  • Feel free to volunteer yourself if you are interested in trying out some items (they do not have to be on the list).
  • Please also check out the help-wanted list in the GitHub issues for things that need help.

Features

  • Fully featured 8-bit network support (a background sketch of 8-bit quantization follows this list)
    • 8-bit quantizer
    • arbitrary-bit quantization algorithm
    • ARM support
    • Intel CPU support
  • NVIDIA GPU 8-bit kernel
    • int8 gemm recipe
    • int8 conv2d
    • autotvm integration
  • Automated tuning and scheduling
    • AutoTVM optimizations for mobile GPUs
    • AutoTVM optimizations for CUDA
    • AutoTVM for x86
    • graph level automated optimization
  • Ultra low-bit support
    • tutorials of low-bit ops
    • customized accelerator support
  • VTA enhancements
    • support generic high level models
    • Enhanced operator/model coverage
    • Ultra-96, ZCU102 support
    • Amazon F1 preliminary support
    • Low-bit support, bit serial support
    • Chisel version
  • High level IR improvements
    • A more tightly coupled design with the TVM runtime system
    • support control flow
    • Type system support
  • Runtime
    • Heterogeneous runtime
  • Micro-asm kernel exploration
    • Core micro-asm primitives for certain ops
  • Hybrid python programming model
    • transition of vision operators to hybrid mode.
  • RPC and Device API
    • Support a C++ version of cross-platform RPC
  • Security
    • tutorials on how to use SGX backend
  • Tutorials and docs
    • How to write a pass in python
    • General lowering flow of TVM
  • Language runtime
    • Golang runtime
    • Rust support
      • rust runtime
      • rust frontend
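
As background for the 8-bit quantizer items above, here is a minimal sketch of the affine (scale and zero-point) scheme that most 8-bit quantizers use. It is illustrative only, not the TVM implementation; the function names and the fixed [min, max] range are assumptions for the example:

```rust
/// Affine quantization: map floats in [min, max] onto u8 values in [0, 255].
/// Illustrative only; a real quantizer also calibrates ranges per tensor.
fn quantize(xs: &[f32], min: f32, max: f32) -> (Vec<u8>, f32, i32) {
    let scale = (max - min) / 255.0;
    let zero_point = (-min / scale).round() as i32;
    let q = xs
        .iter()
        .map(|&x| ((x / scale).round() as i32 + zero_point).max(0).min(255) as u8)
        .collect();
    (q, scale, zero_point)
}

/// Dequantization recovers an approximation of the original value.
fn dequantize(q: u8, scale: f32, zero_point: i32) -> f32 {
    (q as i32 - zero_point) as f32 * scale
}

fn main() {
    let xs = [-1.0f32, 0.0, 0.5, 1.0];
    let (q, scale, zp) = quantize(&xs, -1.0, 1.0);
    for (&x, &qx) in xs.iter().zip(q.iter()) {
        println!("{:6.3} -> {:3} -> {:6.3}", x, qx, dequantize(qx, scale, zp));
    }
}
```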
@yzhliu
Member

yzhliu commented Aug 13, 2018

Shall we add heterogeneous graph runtime? @zhiics is working on that.

@anijain2305
Contributor

I am interested in implementing the Intel CPU support for INT8 quantization

@siju-samuel
Member

I'm interested in implementing the Rust runtime.

@ehsanmok
Contributor

ehsanmok commented Aug 14, 2018

@tqchen @siju-samuel My Rust runtime (dylib) support, which follows the same generic API as Java for example (CPU, GPU, etc.), is 70%-ish done! I'll need to finish the callback support, add docs, and clean up. Any contributions are welcome!

@nhynes Rust static support is in a good shape as well but is specific to CPU with custom allocator etc.

@siju-samuel
Member

@ehsanmok OK.
Is anyone working on "Support a C++ version of cross-platform RPC"? If not, I'm interested in taking it up.

@PariksheetPinjari909
Contributor

@tqchen I have started working on the 8-bit quantizer and its operator support for conv2d, dense, and relu. To avoid duplicate work, please let me know if anyone else is doing this.

@nhynes
Member

nhynes commented Aug 14, 2018

PR for static Rust runtime in #1597.

@ehsanmok I'm not sure what you mean by "custom allocator etc." It uses whatever GlobalAlloc you care to use.
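
(For readers unfamiliar with the trait: a minimal sketch of selecting a custom GlobalAlloc on stable Rust, illustrative rather than either runtime's actual code.)

```rust
use std::alloc::{GlobalAlloc, Layout, System};

// A trivial allocator that forwards to the system allocator; a runtime
// could substitute arena or pool allocation here instead.
struct Passthrough;

unsafe impl GlobalAlloc for Passthrough {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

// Any binary linking the runtime can select its own allocator this way.
#[global_allocator]
static GLOBAL: Passthrough = Passthrough;

fn main() {
    let v = vec![1u8, 2, 3]; // allocated through Passthrough
    println!("{:?}", v);
}
```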

@ehsanmok
Contributor

ehsanmok commented Aug 14, 2018

@nhynes I meant that you've defined your own allocator, threading, and parallel backend support for CPU-only use, compiling a staticlib with xargo, while I've taken a different route relying on existing layouts, and it seems to work for GPU as well. Though I admit I've done the project for my own enrichment first.

@tqchen
Member Author

tqchen commented Aug 14, 2018

@PariksheetPinjari909 the UW SAML team is working on a generic n-bit quantizer, and hopefully things will get RFCed and upstreamed in this release cycle.

@tqchen
Member Author

tqchen commented Aug 14, 2018

Please feel free to open new issues to track the work items. @siju-samuel, standalone RPC is tracked by #1496.

@tqchen
Member Author

tqchen commented Aug 14, 2018

The first post contains an initial list of items based on community feedback. Please feel free to propose new things and we will add them to the roadmap.

@nhynes
Member

nhynes commented Aug 14, 2018

Will the new graph runtime make it into this release? I'd love to upstream some training code, but it all depends on the semi-kluge FExpandCompute.

@tqchen
Member Author

tqchen commented Aug 14, 2018

@nhynes it belongs to the "high-level IR improvements"

@tqchen tqchen added this to the v0.5 milestone Aug 14, 2018
@PariksheetPinjari909
Contributor

@tqchen OK. Let me know how I can help with 8-bit quantization. I am interested in contributing here.

@PariksheetPinjari909
Contributor

I would like to take up the control flow ops. Let me know if someone is working on that.

@tqchen
Member Author

tqchen commented Aug 14, 2018

@PariksheetPinjari909 We will make a major RFC to upgrade the IR system, including control flow ops and the type system. After the first-phase proposal is done, everyone is welcome to contribute.

@kazum
Contributor

kazum commented Aug 16, 2018

Sorry for being late. I'd like to add preliminary support for the HLS scheduler to allow compiling actual neural networks with the AOCL and SDAccel backends.

@tqchen
Member Author

tqchen commented Aug 21, 2018

int8 cuda gemm recipe #1614

@JammyZhou
Contributor

@tqchen from the TVM perspective, any comments on ONNXIFI? I'm thinking about how the TVM stack can fit into it.

@ajtulloch
Contributor

ajtulloch commented Aug 24, 2018

Re microkernels/tensorization, I've been looking at that stuff the last few months or so. There's some WIP stuff in https://github.com/ajtulloch/tvm/tree/tvm-using-val/tensorize, notably well-tuned assembly versions of:

  • FP32 GEMM kernels (ARMv7, AVX2)
  • Int8 x Int8 -> Int32 GEMM kernels (AVX2, adding ARMv7 shortly)

My hypothesis is that we can get a pretty decent part of the way with just GEMM microkernels for a lot of these dense workloads, but that remains to be tested.

Some examples of using them in GEMM-based convs and for the batch gemm of a minimal F(6x6, 3x3) Winograd (~2-3x faster than current trunk on most configurations for ARMv7) are in that dir as well. For folks interested in the "Micro-asm kernel exploration" and "8-bit network stuff" (esp on CPUs), it'd be good to collaborate :).
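
To make the microkernel idea concrete, here is the structure such kernels implement, written as a plain unvectorized Rust sketch; the function name and the 4x4 blocking are illustrative, not taken from that branch:

```rust
/// A 4x4 FP32 GEMM microkernel: C[4x4] += A_panel * B_panel, where the A
/// panel is packed so chunk p holds A[0..4][p] and the B panel so chunk p
/// holds B[p][0..4]. Tuned assembly versions keep `c` entirely in registers.
fn gemm_microkernel_4x4(k: usize, a_panel: &[f32], b_panel: &[f32], c: &mut [f32; 16]) {
    for p in 0..k {
        let a_col = &a_panel[4 * p..4 * p + 4];
        let b_row = &b_panel[4 * p..4 * p + 4];
        for i in 0..4 {
            for j in 0..4 {
                c[4 * i + j] += a_col[i] * b_row[j]; // rank-1 update per step
            }
        }
    }
}

fn main() {
    // Smoke test: multiply two packed 4x4 identity matrices (k = 4).
    let a: Vec<f32> = (0..16).map(|i| if i % 5 == 0 { 1.0 } else { 0.0 }).collect();
    let b = a.clone();
    let mut c = [0.0f32; 16];
    gemm_microkernel_4x4(4, &a, &b, &mut c);
    println!("{:?}", c); // identity again
}
```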

@anijain2305
Contributor

@ajtulloch I am working on Intel 8-bit Conv implementation using Intel Skylake AVX512 instructions (with the long-term goal of using VNNI instructions). I am not using GEMM-based convolution though. I am starting from the NCHWc-format direct convolution present in the current conv2d topi implementation. I should have some numbers for the conv operator by next weekend and can share them.
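
(For context, NCHWc packs the channel dimension into fixed-size blocks so the innermost axis maps cleanly onto vector lanes. A sketch of the index math, with illustrative names only, not the topi code:)

```rust
/// Index into an NCHW[x]c-packed tensor: channels are split into
/// `block`-sized groups, and the innermost axis iterates within a group.
fn nchwc_index(n: usize, c: usize, h: usize, w: usize,
               channels: usize, height: usize, width: usize, block: usize) -> usize {
    let (c_outer, c_inner) = (c / block, c % block);
    ((((n * (channels / block) + c_outer) * height + h) * width + w) * block) + c_inner
}

fn main() {
    // With 64 channels in blocks of 16, channel 37 lands in block 2, lane 5.
    println!("{}", nchwc_index(0, 37, 0, 0, 64, 1, 1, 16)); // prints 37 for a 1x1 map
}
```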

@merrymercy
Member

@ajtulloch It would be great if you could send a tutorial or topi recipe

@ajtulloch
Contributor

@anijain2305 you might find https://github.com/ajtulloch/tvm/blob/tvm-using-val/tensorize/gemm__avx2.c#L424-L531 or a similar microkernel for AVX512 useful on Skylake (same as MKL-DNN's vpmaddubsw/vpmaddwd/vpaddd sequence on AVX2/AVX512 pre VNNI).
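
For reference, the core of that pre-VNNI sequence expressed through Rust's std::arch AVX2 intrinsics; a sketch assuming u8 activations times i8 weights, not the exact kernel from that file:

```rust
use std::arch::x86_64::*; // x86_64 only

/// One step of the pre-VNNI int8 dot-product idiom:
/// vpmaddubsw (u8 * i8 -> paired i16), vpmaddwd (i16 * 1 -> paired i32), vpaddd.
#[target_feature(enable = "avx2")]
unsafe fn dot_step(acc: __m256i, a_u8: __m256i, b_i8: __m256i) -> __m256i {
    let pairs_i16 = _mm256_maddubs_epi16(a_u8, b_i8); // vpmaddubsw
    let quads_i32 = _mm256_madd_epi16(pairs_i16, _mm256_set1_epi16(1)); // vpmaddwd
    _mm256_add_epi32(acc, quads_i32) // vpaddd
}

fn main() {
    if is_x86_feature_detected!("avx2") {
        unsafe {
            let a = _mm256_set1_epi8(2); // 32 bytes of 2 (treated as unsigned)
            let b = _mm256_set1_epi8(3); // 32 bytes of 3 (signed)
            let acc = dot_step(_mm256_setzero_si256(), a, b);
            let mut out = [0i32; 8];
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, acc);
            println!("{:?}", out); // each i32 lane: 2*3 summed over 4 bytes = 24
        }
    }
}
```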

@merrymercy what would be useful to have documented/tutorialized or made into a recipe?

@merrymercy
Member

I think making a simple runnable conv2d example and showing its speedup will be very useful.

@FrozenGene
Member

+1 to a runnable conv2d example. Besides ARMv7 / AVX2, I think we should also add SSE, for embedded platforms that use Intel Atom processors. Intel Atom processors support at most SSE4.2, not AVX2.

@ZihengJiang
Contributor

The v0.5 release note candidate is now up at #2448

@ZihengJiang
Contributor

v0.5 is now tagged, next cycle roadmap issue is available at #2623

@apache apache locked as resolved and limited conversation to collaborators Feb 19, 2019