

Push dev branch build
Naeemkh committed Aug 22, 2024
1 parent 066bd7f commit ea27b72
Showing 5 changed files with 9 additions and 8 deletions.
2 changes: 1 addition & 1 deletion docs/contents/contributors.html
Original file line number Diff line number Diff line change
@@ -848,7 +848,7 @@ <h1 class="title">Contributors</h1>
<a href="https://github.com/Allen-Kuang"><img src="https://avatars.githubusercontent.com/Allen-Kuang?s=100" width="100px;" alt="Allen-Kuang"><br><sub><b>Allen-Kuang</b></sub></a><br>
</td>
<td align="center" valign="top" width="20%">
<a href="https://github.com/alex-oesterling"><img src="https://avatars.githubusercontent.com/alex-oesterling?s=100" width="100px;" alt="alex-oesterling"><br><sub><b>alex-oesterling</b></sub></a><br>
<a href="https://github.com/alex-oesterling"><img src="https://avatars.githubusercontent.com/alex-oesterling?s=100" width="100px;" alt="Alex Oesterling"><br><sub><b>Alex Oesterling</b></sub></a><br>
</td>
<td align="center" valign="top" width="20%">
<a href="https://github.com/Gjain234"><img src="https://avatars.githubusercontent.com/Gjain234?s=100" width="100px;" alt="Gauri Jain"><br><sub><b>Gauri Jain</b></sub></a><br>
3 changes: 2 additions & 1 deletion docs/contents/optimizations/optimizations.html
@@ -2129,7 +2129,8 @@ <h3 data-number="9.5.3" class="anchored" data-anchor-id="hardware-optimization-l
<p>Hardware libraries like TensorRT and TensorFlow XLA allow models to be highly optimized for target hardware through techniques that we discussed earlier.</p>
<p>Quantization: For example, TensorRT and TensorFlow Lite both support quantization of models during conversion to their format. This provides speedups on mobile SoCs with INT8/INT4 support.</p>
<p>Kernel Optimization: For instance, TensorRT does auto-tuning to optimize CUDA kernels based on the GPU architecture for each layer in the model graph. This extracts maximum throughput.</p>
<p>Operator Fusion: TensorFlow XLA does aggressive fusion to create optimized binary for TPUs. On mobile, frameworks like NCNN also support fused operators. ` Hardware-Specific Code: Libraries are used to generate optimized binary code specialized for the target hardware. For example, <a href="https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html">TensorRT</a> uses Nvidia CUDA/cuDNN libraries which are hand-tuned for each GPU architecture. This hardware-specific coding is key for performance. On TinyML devices, this can mean assembly code optimized for a Cortex M4 CPU for example. Vendors provide CMSIS-NN and other libraries.</p>
<p>Operator Fusion: TensorFlow XLA does aggressive fusion to create optimized binary for TPUs. On mobile, frameworks like NCNN also support fused operators.</p>
<p>Hardware-Specific Code: Libraries are used to generate optimized binary code specialized for the target hardware. For example, <a href="https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html">TensorRT</a> uses Nvidia CUDA/cuDNN libraries which are hand-tuned for each GPU architecture. This hardware-specific coding is key for performance. On TinyML devices, this can mean assembly code optimized for a Cortex M4 CPU for example. Vendors provide CMSIS-NN and other libraries.</p>
<p>Data Layout Optimizations - We can efficiently leverage memory hierarchy of hardware like cache and registers through techniques like tensor/weight rearrangement, tiling, and reuse. For example, TensorFlow XLA optimizes buffer layouts to maximize TPU utilization. This helps any memory constrained systems.</p>
<p>Profiling-based Tuning - We can use profiling tools to identify bottlenecks. For example, adjust kernel fusion levels based on latency profiling. On mobile SoCs, vendors like Qualcomm provide profilers in SNPE to find optimization opportunities in CNNs. This data-driven approach is important for performance.</p>
<p>By integrating framework models with these hardware libraries through conversion and execution pipelines, ML developers can achieve significant speedups and efficiency gains from low-level optimizations tailored to the target hardware. The tight integration between software and hardware is key to enabling performant deployment of ML applications, especially on mobile and TinyML devices.</p>
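As an aside on the quantization paragraph in the hunk above: the INT8 conversion that TensorRT and TensorFlow Lite perform rests on a simple affine mapping from a float range onto signed 8-bit integers. The pure-Python sketch below illustrates only the core math under stated assumptions (a known [-1, 1] activation range, per-tensor scaling); real converters calibrate ranges from data and often quantize per channel, and all function names here are illustrative.

```python
# Toy affine (asymmetric) INT8 quantization, the scheme applied during
# model conversion by tools like TensorFlow Lite. Illustrative only.

def quantize_params(xmin, xmax, qmin=-128, qmax=127):
    """Derive scale and zero-point mapping [xmin, xmax] onto INT8."""
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # Round to the nearest integer grid point, then clamp to INT8 range.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original float value.
    return (q - zero_point) * scale

scale, zp = quantize_params(-1.0, 1.0)
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)  # recovers 0.5 to within one quantization step
```

The round trip loses at most half a quantization step, which is why INT8 inference can stay close to float accuracy when the value range is calibrated well.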
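The operator-fusion paragraph in the hunk above can be made concrete with a toy example: three elementwise kernels (multiply, bias add, ReLU) collapsed into one pass over the data, eliminating two intermediate buffers. This is only a schematic of the transformation compilers like XLA apply, not any framework's actual code; the function names are made up for illustration.

```python
# Operator fusion, schematically: the unfused pipeline materializes an
# intermediate list after every op; the fused version does one pass.

def unfused(xs, w, b):
    t1 = [x * w for x in xs]           # multiply kernel -> buffer 1
    t2 = [t + b for t in t1]           # bias-add kernel -> buffer 2
    return [max(0.0, t) for t in t2]   # ReLU kernel     -> output

def fused(xs, w, b):
    # One loop, one output buffer, better locality: the shape of the
    # code a fusing compiler emits for this op chain.
    return [max(0.0, x * w + b) for x in xs]

xs = [-2.0, -0.5, 1.0, 3.0]
assert fused(xs, 2.0, 1.0) == unfused(xs, 2.0, 1.0)
```

The saving on real hardware is not the arithmetic (identical in both versions) but the avoided memory traffic for the intermediate tensors.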
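The data-layout paragraph in the hunk above mentions tiling; a minimal sketch of loop tiling in matrix multiply shows how blocking keeps a small working set hot in cache while it is reused. This is a pure-Python toy under an assumed tile size of 2; real libraries pick tile sizes per target hardware, and the function name is illustrative.

```python
# Loop tiling for cache reuse in matrix multiply. Each (i0, k0, j0)
# block revisits a tile x tile patch of A and B while it is still hot,
# instead of streaming whole rows/columns through cache.

def matmul_tiled(A, B, tile=2):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                # Inner loops stay inside one block of each operand.
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            C[i][j] += a * B[k][j]
    return C
```

The result is identical to a plain triple loop; only the traversal order changes, which is exactly why layout and tiling optimizations are safe for compilers like XLA to apply automatically.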
4 changes: 2 additions & 2 deletions docs/index.html
@@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">

<meta name="author" content="Vijay Janapa Reddi">
<meta name="dcterms.date" content="2024-08-21">
<meta name="dcterms.date" content="2024-08-22">

<title>Machine Learning Systems</title>
<style>
@@ -648,7 +648,7 @@ <h1 class="title">Machine Learning Systems</h1>
<div>
<div class="quarto-title-meta-heading">Last Updated</div>
<div class="quarto-title-meta-contents">
<p class="date">August 21, 2024</p>
<p class="date">August 22, 2024</p>
</div>
</div>

2 changes: 1 addition & 1 deletion docs/references.html
@@ -1350,7 +1350,7 @@ <h1 class="title">References</h1>
in Theoretical Computer Science</em> 9 (3-4): 211–407. <a href="https://doi.org/10.1561/0400000042">https://doi.org/10.1561/0400000042</a>.
</div>
<div id="ref-ebrahimi2014review" class="csl-entry" role="listitem">
Ebrahimi, Khosrow, Gerard F. Jones, and Amy S. Fleischer. 2014. <span>A
Ebrahimi, Khosrow, Gerard F. Jones, and Amy S. Fleischer. 2014. <span>“A
Review of Data Center Cooling Technology, Operating Conditions and the
Corresponding Low-Grade Waste Heat Recovery Opportunities.”</span>
<em>Renewable Sustainable Energy Rev.</em> 31 (March): 622–38. <a href="https://doi.org/10.1016/j.rser.2013.12.007">https://doi.org/10.1016/j.rser.2013.12.007</a>.
