Quantization #2

electriclilies · 2021-02-02T23:27:11Z

My tests are still a little bit messy and not done, and I haven't done linting yet.

Also, I'm not sure why I can't automatically merge, since I merged main into the quantization branch.

Also I intend to squash my commits before creating the final PR.


mbrookhart

First pass. Feel free to ignore anything I said; I'm just trying to understand the whole picture. I'm not sure I 100% grok the requantizer, so I think we might need a further explanation of why it's needed.

I think the demos need to move to the tutorials, since the tutorials run as part of integration testing.

We should add these directories to the doc generation. For example, here's the API generation for the pattern language: https://github.com/apache/tvm/blob/main/docs/api/python/relay/dataflow_pattern.rst

Might also make sense to make a quantization doc here: https://github.com/apache/tvm/tree/main/docs

python/tvm/relay/qnn/op/qnn.py

include/tvm/relay/qnn/attrs.h

python/tvm/relay/transform/quantize/_average_max_channel_patterns.py

mbrookhart · 2021-02-04T21:03:31Z

python/tvm/relay/transform/quantize/_average_max_channel_patterns.py

+        data_min_sum = 0
+        data_max_sum = 0
+
+        weight_min_sums = np.zeros(shape=(self.attrs["units"],))
+        weight_max_sums = np.zeros(shape=(self.attrs["units"],))
+
+        while not calibration_info.dataset_manager.is_empty():
+            # Get the original input from dataset manger, run unquantized graph with those inputs
+            image_list, _ = calibration_info.dataset_manager.get_next_batch()
+            unquantized_inputs = calibration_info.get_unquantized_layer_inputs(image_list)
+
+            data = unquantized_inputs[0]
+            weight = unquantized_inputs[1]
+
+            data_min_sum += np.min(data)
+            data_max_sum += np.max(data)
+
+            weight_min_sums += np.min(weight, axis=1)
+            weight_max_sums += np.max(weight, axis=1)
+
+        calibration_info.dataset_manager.reset()
+
+        data_min_avg = data_min_sum / calibration_info.dataset_manager.num_batches()
+        data_max_avg = data_max_sum / calibration_info.dataset_manager.num_batches()
+
+        weight_min_avgs = weight_min_sums / calibration_info.dataset_manager.num_batches()
+        weight_max_avgs = weight_max_sums / calibration_info.dataset_manager.num_batches()
+
+        # Threshold for quantization of an input to a layer is mean(abs(avg_max), abs(avg_min))
+        data_threshold = (np.abs(data_min_avg) + np.abs(data_max_avg)) / 2
+        weight_thresholds = (np.abs(weight_min_avgs) + np.abs(weight_max_avgs)) / 2
+
+        # Since this is a symmetric distribution and we are quantizing to int8, there are 256 bins,
+        # and 128 are positive
+        data_scale = data_threshold / 128
+        weight_scales = weight_thresholds / 128


Similarly, we could probably refactor most of this into a utility.
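To make the refactoring suggestion concrete, here is a minimal sketch of what such a utility might look like. The function name `average_min_max_scale` and its parameters are hypothetical and not part of the PR; it only mirrors the arithmetic in the diff above (average the per-batch min and max, take the mean of their absolute values as the threshold, divide by 128) and takes a plain iterable of NumPy arrays instead of the dataset manager.

```python
import numpy as np


def average_min_max_scale(batches, axis=None, num_positive_bins=128):
    """Hypothetical helper: symmetric scale from averaged per-batch min/max.

    batches: iterable of NumPy arrays (one per calibration batch).
    axis: None for a single per-tensor scale, or an axis to reduce over
          (axis=1 reproduces the per-unit weight scales in the diff).
    """
    min_sum = 0.0
    max_sum = 0.0
    num_batches = 0
    for batch in batches:
        min_sum = min_sum + np.min(batch, axis=axis)
        max_sum = max_sum + np.max(batch, axis=axis)
        num_batches += 1
    if num_batches == 0:
        raise ValueError("need at least one calibration batch")

    min_avg = min_sum / num_batches
    max_avg = max_sum / num_batches

    # Threshold is mean(abs(avg_min), abs(avg_max)), as in the diff.
    threshold = (np.abs(min_avg) + np.abs(max_avg)) / 2
    return threshold / num_positive_bins
```

With a helper along these lines, the data and weight cases would differ only in the `axis` argument, e.g. `average_min_max_scale(data_batches)` and `average_min_max_scale(weight_batches, axis=1)`.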

python/tvm/relay/transform/quantize/_calibrater.py

mbrookhart · 2021-02-04T21:15:07Z

python/tvm/relay/transform/quantize/demos/average_mean_quantize_mnist_live.py

@@ -0,0 +1,85 @@
+import tvm


This should probably go in tvm/tutorials/quantization?

python/tvm/relay/transform/quantize/demos/average_mean_quantize_resnet.py

python/tvm/relay/transform/quantize/demos/per_channel_test.py

mbrookhart · 2021-02-04T21:16:43Z

src/relay/qnn/op/dequantize.cc

+  CHECK(out_dtype == DataType::Float(32) || out_dtype == DataType::Int(32))
+    << "out_dtype for dequantize must be float32 or int32, but got " << out_dtype;


I can imagine we might want to relax this in the future, but it's fine for now.

src/relay/transforms/quantize.cc

electriclilies · 2021-02-04T21:31:34Z

Totally agree about the demos, I just haven't removed them from my branch yet.
We need the requantizer for 2 reasons:

We can't use requantize during calibration: The requantize op requires that scales and zero points be constant values, not relay expressions, and it's not clear how to make the requantize op take relay expressions without making it slow. Since we frequently combine scales and zero points when dequantizing, this means that we must use quantize and dequantize directly instead of requantize.
We need requantize to go fast: When we "dequantize" we actually come out of int8, so if we have dequantize -> quantize, we actually end up going from int8 to int32 and back to int8, which makes things slow. Requantize never leaves int8; it just changes the scales and zero points.
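The relationship between the two paths can be illustrated with a small NumPy sketch. This is a conceptual model only, not TVM's `qnn.requantize` implementation: the three helper functions are hypothetical, the rescaling uses a floating-point ratio for simplicity, and the arithmetic is widened to int32 internally before casting back to int8.

```python
import numpy as np


def quantize(x, scale, zero_point):
    # float -> int8 with saturation
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype("int8")


def dequantize(q, scale, zero_point):
    # int8 -> float32
    return (q.astype("int32") - zero_point).astype("float32") * scale


def requantize(q, in_scale, in_zp, out_scale, out_zp):
    # int8 -> int8 in one step: only the scale and zero point change.
    shifted = q.astype("int32") - in_zp
    rescaled = np.round(shifted * (in_scale / out_scale)) + out_zp
    return np.clip(rescaled, -128, 127).astype("int8")


q = np.array([-128, -5, 0, 7, 127], dtype="int8")
two_step = quantize(dequantize(q, 0.5, 0), 0.25, 0)  # dequantize -> quantize
one_step = requantize(q, 0.5, 0, 0.25, 0)            # fused equivalent
```

For these inputs both paths produce the same int8 values, which is why the calibration graph can be built from quantize and dequantize and later collapsed into requantize once the scales and zero points are known constants.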

mbrookhart

I think this is mostly ready. How are you feeling about it?

mbrookhart · 2021-02-17T20:43:06Z

python/tvm/relay/transform/quantize/_calibration_callback.py

+# class AverageMaxCalibrationCallback(CalibrationCallback):
+#     """Calculates scales by calculating the average of the maxiumum absolute value of the node
+#     we are quantizing. Sets zero points to zero."""
+
+#     def calibrate_pattern(self, calibration_info):
+#         """Sets scales to the average of the maximum absolute value of the node we are quantizing.
+#         Sets zero points to zero.
+
+#         Parameters
+#         ----------
+#         calibration_info : CalibrationInfo
+#             Object containing information needed during calibration.
+
+#         Returns
+#         -------
+#         scale_zp_map : dict of str to value
+#             The map from names of scale and zero point variables to the average of the maximum
+#         """
+#         scale_zp_values = {}
+#         avg_maxs = np.zeros(shape=(len(calibration_info.partition_info.input_scale_zps)))
+#         #num_inputs = calibration_info.dataset_manager.num_batches() * \
+#         #             calibration_info.dataset_manager.batch_size()
+#         num_inputs = calibration_info.dataset_manager.num_batches()
+
+#         while not calibration_info.dataset_manager.is_empty():
+#             # Get the original input from dataset manger, run unquantized graph with those inputs
+#             image_list, _ = calibration_info.dataset_manager.get_next_batch()
+#             unquantized_inputs = calibration_info.get_unquantized_layer_inputs(image_list)
+
+#             # Iterate through scale and zp variables
+#             for i, unquantized_input in enumerate(unquantized_inputs):
+#                 # Calculate the average min, max across each batch
+#                 avg_maxs[i] += np.max(np.abs(unquantized_input)) #/ num_inputs
+
+#         # Since this is a symmetric distribution and we are quantizing to int8,
+#         # there are 256 bins, and 128 are positive
+#         calibration_info.dataset_manager.reset()
+#         scales = avg_maxs / (num_inputs * 128)
+
+#         for i, scale_value in enumerate(scales):
+#             scale_name = calibration_info.partition_info.input_scale_zps[i][0].name_hint
+#             scale_zp_values[scale_name] = np.array(scale_value).astype("float32")
+#             zp_name = calibration_info.partition_info.input_scale_zps[i][1].name_hint
+#             scale_zp_values[zp_name] = np.array(0).astype("int32")
+
+#         return scale_zp_values


Commented code?

mbrookhart · 2021-02-17T20:47:14Z

python/tvm/relay/transform/quantize/demos/average_mean_quantize_bert.py

+    #print("Int8 labels: ", q_predicted_labels)
+    #print("Float32 labels: ", predicted_labels)
+    #print("Actual labels: ", label)


Remove Commented Code

electriclilies · 2021-02-17T20:58:41Z

@mbrookhart I feel good about it. I have written a bunch more tests, which uncovered a lot of small bugs. I have one bug left to solve. Once the tests pass I will make a PR to main. I figure it will take a little while for people to review, so I will write the tutorial in the meantime.


stable comparison * casts to avoid ambiguity * casting more * correct arg passing * fix broken input * OneElementReduceAttrs-->ArgReduceAttrs" * reduce boilerplate * change names * remove log statement * jostle ci Co-authored-by: Andrew Zhao Luo <[email protected]> * refactor optimize GEMM on CPU tutorial (apache#8825) * refactor optimize GEMM on CPU tutorial * fix lint errors * fix more lint errors * fix typo * fix problem with redefinition of `k` add TODO and comments around loop unrolling clarify note on the array packing figure * reword general description of array packing * grap kaxis from compute definition * remove duplicate comments on unrolling * Change target string to Target object in the TE compiler and interpreter (apache#8835) * # This is a combination of 2 commits. # This is the 1st commit message: Initial changes # This is the commit message #2: Ftarget string -> Target object works! * Fix remaining target strings * fix bad rebase * Fix typo * 1 more bad rebase fix * Lint * typo * Forgot to commit this * Add TargetStrHash and Map<Target... to std::unordered_map<Target... 
conversion fn * Passing most tests, yay * remove some comments * lint * target-str-to-target-object * Respond to change requests Co-authored-by: Jared Roesch <[email protected]> * [TensorIR][M2a] CacheRead/Write (apache#8863) Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> * [CI] make pre-commit hooks to run on every push instead of every commit (apache#8888) * [TVMScript] Fix printing ForNode annotations (apache#8891) * [1/10] CMSIS-NN graph partitioner for softmax (apache#8653) * cmsis graph partitioner for softmax Change-Id: I80ecd7bc5351f241b4674ef53b36e4398c8adb83 * Updated docstring in the partioning function Change-Id: Ieb4b623e5929cfdb6aa0235db64c825fac8d7055 * [microTVM][RVM] Add Arduino RVM (apache#8748) * Functioning Arduino Vagrant VM Begin building Arduino Vagrant VM Mostly working Vagrant VM Changes for debugging Add ignored json file Fix venv path * Generalize parts of RVM for multiple platforms cwd hack Add unit tests from apps directory to task_python_microtvm.sh Generalize parts of RVM for multiple platforms * Add Vagrantfile lint exceptions * Address PR comments Address Mehrdad's PR comments More PR comments Documentation tweaks Add dialout group to user * Rerun tests * Spresense fix * Rerun CI tests * Rerun tests * sce loss example * add comments, remove other tests * lint * lint * jostle * lint up * jostle * uncomment some tests * proper return * clean up * lint * minor merge errors Co-authored-by: Andrew Zhao Luo <[email protected]> Co-authored-by: Mehrdad Hessar <[email protected]> Co-authored-by: Jiawei Liu <[email protected]> Co-authored-by: Tristan Konolige <[email protected]> Co-authored-by: Christopher Sidebottom <[email protected]> Co-authored-by: Anastasia Stulova <[email protected]> Co-authored-by: 
Ashutosh Parkhi <[email protected]> Co-authored-by: Krzysztof Parzyszek <[email protected]> Co-authored-by: Elen Kalda <[email protected]> Co-authored-by: Anton Sorokin <[email protected]> Co-authored-by: Chenfan <[email protected]> Co-authored-by: masahi <[email protected]> Co-authored-by: Tantalus13A98B5F <[email protected]> Co-authored-by: Valery Chernov <[email protected]> Co-authored-by: Valery Chernov <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: wjj19950828 <[email protected]> Co-authored-by: heliqi <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Swift.Sun <[email protected]> Co-authored-by: hwstaff <[email protected]> Co-authored-by: Cahoon, Brendon <[email protected]> Co-authored-by: Lunderberg <[email protected]> Co-authored-by: Yizhi Liu <[email protected]> Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Josh Fromm <[email protected]> Co-authored-by: Alexander Pivovarov <[email protected]> Co-authored-by: Thierry Moreau <[email protected]> Co-authored-by: Egor Churaev <[email protected]> Co-authored-by: Adam Straw <[email protected]> Co-authored-by: Lily Orth-Smith <[email protected]> Co-authored-by: Jared Roesch <[email protected]> Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Michalis Papadimitriou <[email protected]> Co-authored-by: Gavin Uberti <[email protected]> * [Hexagon] Don't use {} initialization with FastRPC structures (apache#9033) The data members in FastRPC structures aren't guaranteed to remain in the same order. Replace aggregate initialization with direct, member-by-member initialization. 
* Test * Minor checkstyle issue * Test * Test file * Revert changed in unit tests * Change script name * Test * Revert format on groovy file * Remove test file * Minor change in script * Minor formating changes * Revert logic in conditions for changed files Co-authored-by: Christopher Sidebottom <[email protected]> Co-authored-by: masahi <[email protected]> Co-authored-by: Anirudh Sundar <[email protected]> Co-authored-by: Leandro Nunes <[email protected]> Co-authored-by: AndrewZhaoLuo <[email protected]> Co-authored-by: Andrew Zhao Luo <[email protected]> Co-authored-by: Mehrdad Hessar <[email protected]> Co-authored-by: Jiawei Liu <[email protected]> Co-authored-by: Tristan Konolige <[email protected]> Co-authored-by: Christopher Sidebottom <[email protected]> Co-authored-by: Anastasia Stulova <[email protected]> Co-authored-by: Ashutosh Parkhi <[email protected]> Co-authored-by: Krzysztof Parzyszek <[email protected]> Co-authored-by: Elen Kalda <[email protected]> Co-authored-by: Anton Sorokin <[email protected]> Co-authored-by: Chenfan <[email protected]> Co-authored-by: Tantalus13A98B5F <[email protected]> Co-authored-by: Valery Chernov <[email protected]> Co-authored-by: Valery Chernov <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: wjj19950828 <[email protected]> Co-authored-by: heliqi <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Swift.Sun <[email protected]> Co-authored-by: hwstaff <[email protected]> Co-authored-by: Cahoon, Brendon <[email protected]> Co-authored-by: Lunderberg <[email protected]> Co-authored-by: Yizhi Liu <[email protected]> Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Josh Fromm <[email protected]> Co-authored-by: Alexander Pivovarov <[email protected]> Co-authored-by: Thierry Moreau <[email protected]> Co-authored-by: Egor Churaev <[email protected]> 
Co-authored-by: Adam Straw <[email protected]> Co-authored-by: Lily Orth-Smith <[email protected]> Co-authored-by: Jared Roesch <[email protected]> Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Gavin Uberti <[email protected]>

Lily Orth-Smith and others added 30 commits September 18, 2020 12:56

hacky version of quantize pass implemented

58813dc

quantize / dequantize instead of casting

40f988d

more updates

24b7fa1

Add preprocessing.

6f1df09

start of requantize pass

cd7121f

requantize pass runs

8d116cd

requantize after conv2d in most cases

24dea2e

some notes to self

2acc2b4

testing maxpool no option

f4615dc

added skipping layers and quantization of dense operator

94ea3aa

added variables for scale, zp, and skip_layers

01269b2

some stuff

dfdea03

first attempt at calibration_map for conv2d

16a1499

remove node_map

bc885b8

attempt at global calibration pass with Let

84dbc3e

global calibration runs!

514fec5

remove prints

c726aed

test script compares global calibration to unquantized

d0e7f23

calibration pass in progress, pad bug

2aba530

subgraphs run!

f172f01

separate quantization for weights and activation

2a5dcdd

refactor calibration_map

1f6dac3

merge with main

2487817

change calibration_callback

fe23ae8

cleaning up code

0841b14

add is_weight util

cb2b412

unit testing

7c067ec

Merge remote-tracking branch 'upstream/main' into quantization

230275c

add more testing for quantization

460f5b8

debugging bind_params_by_name

afd2690

electriclilies and others added 8 commits February 2, 2021 17:46

merged with main, allow overlap in pattern groups

cd848bc

fix some lint

236a2ba

Update test_op_qnn_dequantize.py

4c48417

lint and docstrings

bfcab0a

Merge branch 'quantization' of github.com:electriclilies/incubator-tvm into quantization

7f9b9e9

adding docs and lint

e90d777

run black and clang, add docs

102bf73

lint + more doc strings

96e6879

mbrookhart reviewed Feb 4, 2021

electriclilies added 6 commits February 4, 2021 16:49

fix spelling of calibrator

4f73067

cleaning up code

14d9618

black

a738cf5

finish quantize and calibrate tests, more docs

fa72484

problem with average max quantize??

8411824

move dataset manager and add DenseBiasAddPattern

3f05c65

electriclilies marked this pull request as draft February 11, 2021 22:47

electriclilies added 4 commits February 11, 2021 14:54

fix imports

d7e9fc2

fix bias add

08e1c67

tutorials

2496eb1

requantizer tests

f52a81f

mbrookhart reviewed Feb 17, 2021

electriclilies added 2 commits February 17, 2021 14:26

more tests, fixing bugs in tests

62a43b5

test relay pass:

e7a0c8d

electriclilies pushed a commit that referenced this pull request Aug 24, 2021

# This is a combination of 2 commits.

c6447b6

# This is the 1st commit message: Initial changes # This is the commit message #2: Ftarget string -> Target object works!

electriclilies commented Feb 2, 2021 (edited)

mbrookhart left a comment

mbrookhart Feb 4, 2021

mbrookhart Feb 4, 2021

mbrookhart Feb 4, 2021

electriclilies commented Feb 4, 2021

mbrookhart left a comment

mbrookhart Feb 17, 2021

mbrookhart Feb 17, 2021

electriclilies commented Feb 17, 2021 (edited)

		CHECK(out_dtype == DataType::Float(32) || out_dtype == DataType::Int(32))
		<< "out_dtype for dequantize must be float32 or int32, but got " << out_dtype;
