

Push dev branch build
Naeemkh committed Aug 22, 2024
1 parent 066bd7f commit ea27b72
Showing 5 changed files with 9 additions and 8 deletions.
2 changes: 1 addition & 1 deletion docs/contents/contributors.html
Original file line number Diff line number Diff line change
@@ -848,7 +848,7 @@ <h1 class="title">Contributors</h1>
<a href="https://github.com/Allen-Kuang"><img src="https://avatars.githubusercontent.com/Allen-Kuang?s=100" width="100px;" alt="Allen-Kuang"><br><sub><b>Allen-Kuang</b></sub></a><br>
</td>
<td align="center" valign="top" width="20%">
<a href="https://github.com/alex-oesterling"><img src="https://avatars.githubusercontent.com/alex-oesterling?s=100" width="100px;" alt="alex-oesterling"><br><sub><b>alex-oesterling</b></sub></a><br>
<a href="https://github.com/alex-oesterling"><img src="https://avatars.githubusercontent.com/alex-oesterling?s=100" width="100px;" alt="Alex Oesterling"><br><sub><b>Alex Oesterling</b></sub></a><br>
</td>
<td align="center" valign="top" width="20%">
<a href="https://github.com/Gjain234"><img src="https://avatars.githubusercontent.com/Gjain234?s=100" width="100px;" alt="Gauri Jain"><br><sub><b>Gauri Jain</b></sub></a><br>
3 changes: 2 additions & 1 deletion docs/contents/optimizations/optimizations.html
@@ -2129,7 +2129,8 @@ <h3 data-number="9.5.3" class="anchored" data-anchor-id="hardware-optimization-l
<p>Hardware libraries like TensorRT and TensorFlow XLA allow models to be highly optimized for target hardware through techniques that we discussed earlier.</p>
<p>Quantization: For example, TensorRT and TensorFlow Lite both support quantization of models during conversion to their format. This provides speedups on mobile SoCs with INT8/INT4 support.</p>
<p>Kernel Optimization: For instance, TensorRT does auto-tuning to optimize CUDA kernels based on the GPU architecture for each layer in the model graph. This extracts maximum throughput.</p>
<p>Operator Fusion: TensorFlow XLA does aggressive fusion to create optimized binary for TPUs. On mobile, frameworks like NCNN also support fused operators. ` Hardware-Specific Code: Libraries are used to generate optimized binary code specialized for the target hardware. For example, <a href="https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html">TensorRT</a> uses Nvidia CUDA/cuDNN libraries which are hand-tuned for each GPU architecture. This hardware-specific coding is key for performance. On TinyML devices, this can mean assembly code optimized for a Cortex M4 CPU for example. Vendors provide CMSIS-NN and other libraries.</p>
<p>Operator Fusion: TensorFlow XLA does aggressive fusion to create optimized binary for TPUs. On mobile, frameworks like NCNN also support fused operators.</p>
<p>Hardware-Specific Code: Libraries are used to generate optimized binary code specialized for the target hardware. For example, <a href="https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html">TensorRT</a> uses Nvidia CUDA/cuDNN libraries which are hand-tuned for each GPU architecture. This hardware-specific coding is key for performance. On TinyML devices, this can mean assembly code optimized for a Cortex M4 CPU for example. Vendors provide CMSIS-NN and other libraries.</p>
<p>Data Layout Optimizations - We can efficiently leverage memory hierarchy of hardware like cache and registers through techniques like tensor/weight rearrangement, tiling, and reuse. For example, TensorFlow XLA optimizes buffer layouts to maximize TPU utilization. This helps any memory constrained systems.</p>
<p>Profiling-based Tuning - We can use profiling tools to identify bottlenecks. For example, adjust kernel fusion levels based on latency profiling. On mobile SoCs, vendors like Qualcomm provide profilers in SNPE to find optimization opportunities in CNNs. This data-driven approach is important for performance.</p>
<p>By integrating framework models with these hardware libraries through conversion and execution pipelines, ML developers can achieve significant speedups and efficiency gains from low-level optimizations tailored to the target hardware. The tight integration between software and hardware is key to enabling performant deployment of ML applications, especially on mobile and TinyML devices.</p>
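As an aside on the quantization paragraph in the hunk above: the INT8 conversion that TensorRT and TensorFlow Lite perform rests on a simple affine mapping from a float range onto signed 8-bit integers. The pure-Python sketch below illustrates only the core math under stated assumptions (a known [-1, 1] activation range, per-tensor scaling); real converters calibrate ranges from data and often quantize per channel, and all function names here are illustrative.

```python
# Toy affine (asymmetric) INT8 quantization, the scheme applied during
# model conversion by tools like TensorFlow Lite. Illustrative only.

def quantize_params(xmin, xmax, qmin=-128, qmax=127):
    """Derive scale and zero-point mapping [xmin, xmax] onto INT8."""
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # Round to the nearest integer grid point, then clamp to INT8 range.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original float value.
    return (q - zero_point) * scale

scale, zp = quantize_params(-1.0, 1.0)
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)  # recovers 0.5 to within one quantization step
```

The round trip loses at most half a quantization step, which is why INT8 inference can stay close to float accuracy when the value range is calibrated well.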
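The operator-fusion paragraph in the hunk above can be made concrete with a toy example: three elementwise kernels (multiply, bias add, ReLU) collapsed into one pass over the data, eliminating two intermediate buffers. This is only a schematic of the transformation compilers like XLA apply, not any framework's actual code; the function names are made up for illustration.

```python
# Operator fusion, schematically: the unfused pipeline materializes an
# intermediate list after every op; the fused version does one pass.

def unfused(xs, w, b):
    t1 = [x * w for x in xs]           # multiply kernel -> buffer 1
    t2 = [t + b for t in t1]           # bias-add kernel -> buffer 2
    return [max(0.0, t) for t in t2]   # ReLU kernel     -> output

def fused(xs, w, b):
    # One loop, one output buffer, better locality: the shape of the
    # code a fusing compiler emits for this op chain.
    return [max(0.0, x * w + b) for x in xs]

xs = [-2.0, -0.5, 1.0, 3.0]
assert fused(xs, 2.0, 1.0) == unfused(xs, 2.0, 1.0)
```

The saving on real hardware is not the arithmetic (identical in both versions) but the avoided memory traffic for the intermediate tensors.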
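The data-layout paragraph in the hunk above mentions tiling; a minimal sketch of loop tiling in matrix multiply shows how blocking keeps a small working set hot in cache while it is reused. This is a pure-Python toy under an assumed tile size of 2; real libraries pick tile sizes per target hardware, and the function name is illustrative.

```python
# Loop tiling for cache reuse in matrix multiply. Each (i0, k0, j0)
# block revisits a tile x tile patch of A and B while it is still hot,
# instead of streaming whole rows/columns through cache.

def matmul_tiled(A, B, tile=2):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                # Inner loops stay inside one block of each operand.
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            C[i][j] += a * B[k][j]
    return C
```

The result is identical to a plain triple loop; only the traversal order changes, which is exactly why layout and tiling optimizations are safe for compilers like XLA to apply automatically.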
4 changes: 2 additions & 2 deletions docs/index.html
@@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">

<meta name="author" content="Vijay Janapa Reddi">
<meta name="dcterms.date" content="2024-08-21">
<meta name="dcterms.date" content="2024-08-22">

<title>Machine Learning Systems</title>
<style>
@@ -648,7 +648,7 @@ <h1 class="title">Machine Learning Systems</h1>
<div>
<div class="quarto-title-meta-heading">Last Updated</div>
<div class="quarto-title-meta-contents">
<p class="date">August 21, 2024</p>
<p class="date">August 22, 2024</p>
</div>
</div>

2 changes: 1 addition & 1 deletion docs/references.html
@@ -1350,7 +1350,7 @@ <h1 class="title">References</h1>
in Theoretical Computer Science</em> 9 (3-4): 211–407. <a href="https://doi.org/10.1561/0400000042">https://doi.org/10.1561/0400000042</a>.
</div>
<div id="ref-ebrahimi2014review" class="csl-entry" role="listitem">
Ebrahimi, Khosrow, Gerard F. Jones, and Amy S. Fleischer. 2014. <span>A
Ebrahimi, Khosrow, Gerard F. Jones, and Amy S. Fleischer. 2014. <span>“A
Review of Data Center Cooling Technology, Operating Conditions and the
Corresponding Low-Grade Waste Heat Recovery Opportunities.”</span>
<em>Renewable Sustainable Energy Rev.</em> 31 (March): 622–38. <a href="https://doi.org/10.1016/j.rser.2013.12.007">https://doi.org/10.1016/j.rser.2013.12.007</a>.
