deploy: 9cd5084

baniasbaabe · Apr 28, 2024 · 427a6d6
1 parent 9c0bdf0
commit 427a6d6

Showing 7 changed files with 151 additions and 27 deletions.
diff --git a/_sources/book/llm/Chapter.ipynb b/_sources/book/llm/Chapter.ipynb
@@ -244,6 +244,48 @@
     "\n",
     "print(json.dumps(results, indent=3))"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Embed Any Type of File"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "These days, everything is about Embeddings and LLMs.\n",
+    "\n",
+    "The Python library `embed-anything` makes it easy to generate embeddings from multiple sources, such as images, video, or audio.\n",
+    "\n",
+    "It's built in Rust, so it runs fast."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install embed-anything"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import embed_anything\n",
+    "import numpy as np\n",
+    "\n",
+    "data = embed_anything.embed_file(\"filename.pdf\", embeder=\"Bert\")  # embed a single file\n",
+    "embeddings = np.array([item.embedding for item in data])\n",
+    "data = embed_anything.embed_directory(\"test_files\", embeder=\"Clip\")  # embed every file in a directory\n",
+    "embeddings = np.array([item.embedding for item in data])"
+   ]
   }
  ],
  "metadata": {

diff --git a/_sources/book/machinelearning/featureselection.ipynb b/_sources/book/machinelearning/featureselection.ipynb
@@ -247,20 +247,18 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "metadata": {},
-   "outputs": [],
    "source": [
     "Do you want to do Feature Selection automatically?\n",
     "\n",
-    "Try mrmr.\n",
+    "Try `mrmr`.\n",
     "\n",
-    "mrmr (minimum-Redundancy-Maximum-Relevance) is a minimal-optimal feature selection algorithm at scale.\n",
+    "`mrmr` (minimum-Redundancy-Maximum-Relevance) is a minimal-optimal feature selection algorithm designed to work at scale.\n",
     "\n",
-    "It means mrmr will find the smallest relevant subset of features your ML Model needs.\n",
+    "It means `mrmr` will find the smallest relevant subset of features your ML Model needs.\n",
     "\n",
-    "mrmr supports common tools like Pandas, Polars and Spark.\n",
+    "`mrmr` supports common tools like Pandas, Polars and Spark.\n",
     "\n",
     "See below how we want to select the best K features.\n",
     "\n",

diff --git a/_sources/book/polars/Chapter.ipynb b/_sources/book/polars/Chapter.ipynb
@@ -64,6 +64,47 @@
     "    pl.col(\"actual\").num_ext.binary_metrics_combo(pl.col(\"predicted\")).alias(\"combo\")\n",
     ").unnest(\"combo\")"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Plugin for Fitting Linear Models"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In Polars, you can fit linear models with the `polars-ols` extension.\n",
+    "\n",
+    "You can use ordinary, weighted, or regularized least squares (such as Lasso or Elastic Net).\n",
+    "\n",
+    "It can be 2x to 88x faster than popular libraries like sklearn or statsmodels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install polars-ols"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import polars as pl\n",
+    "import polars_ols as pls  # importing registers the `least_squares` namespace\n",
+    "\n",
+    "lasso_expr = pl.col(\"y\").least_squares.lasso(\"x1\", \"x2\", alpha=0.0001, add_intercept=True).over(\"group\")\n",
+    "# `df` is assumed to be an existing DataFrame with columns y, x1, x2, and group\n",
+    "predictions = df.with_columns(lasso_expr.round(2).alias(\"predictions_lasso\"))"
+   ]
   }
  ],
  "metadata": {

diff --git a/book/llm/Chapter.html b/book/llm/Chapter.html
@@ -425,6 +425,7 @@ <h2> Contents </h2>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#one-function-call-to-any-llm-with-litellm">6.1.2. One-Function Call to Any LLM with <code class="docutils literal notranslate"><span class="pre">litellm</span></code></a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#safeguard-your-llms-with-llmguard">6.1.3. Safeguard Your LLMs with <code class="docutils literal notranslate"><span class="pre">LLMGuard</span></code></a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#evaluate-llms-with-uptrain">6.1.4. Evaluate LLMs with <code class="docutils literal notranslate"><span class="pre">uptrain</span></code></a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#embed-any-type-of-file">6.1.5. Embed Any Type of File</a></li>
 </ul>
             </nav>
         </div>
@@ -605,6 +606,32 @@ <h2><span class="section-number">6.1.4. </span>Evaluate LLMs with <code class="d
 </div>
 </div>
 </section>
+<section id="embed-any-type-of-file">
+<h2><span class="section-number">6.1.5. </span>Embed Any Type of File<a class="headerlink" href="#embed-any-type-of-file" title="Permalink to this heading">#</a></h2>
+<p>These days, everything is about Embeddings and LLMs.</p>
+<p>The Python library <code class="docutils literal notranslate"><span class="pre">embed-anything</span></code> makes it easy to generate embeddings from multiple sources, such as images, video, or audio.</p>
+<p>It’s built in Rust, so it runs fast.</p>
+<div class="cell docutils container">
+<div class="cell_input docutils container">
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>!pip install embed-anything
+</pre></div>
+</div>
+</div>
+</div>
+<div class="cell docutils container">
+<div class="cell_input docutils container">
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">embed_anything</span>
+<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
+
+<span class="n">data</span> <span class="o">=</span> <span class="n">embed_anything</span><span class="o">.</span><span class="n">embed_file</span><span class="p">(</span><span class="s2">&quot;filename.pdf&quot;</span><span class="p">,</span> <span class="n">embeder</span><span class="o">=</span><span class="s2">&quot;Bert&quot;</span><span class="p">)</span>  <span class="c1"># embed a single file</span>
+<span class="n">embeddings</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="n">item</span><span class="o">.</span><span class="n">embedding</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">embed_anything</span><span class="o">.</span><span class="n">embed_directory</span><span class="p">(</span><span class="s2">&quot;test_files&quot;</span><span class="p">,</span> <span class="n">embeder</span><span class="o">=</span><span class="s2">&quot;Clip&quot;</span><span class="p">)</span>  <span class="c1"># embed every file in a directory</span>
+<span class="n">embeddings</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="n">item</span><span class="o">.</span><span class="n">embedding</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span>
+</pre></div>
+</div>
+</div>
+</div>
+</section>
 </section>
 
     <script type="text/x-thebe-config">
@@ -678,6 +705,7 @@ <h2><span class="section-number">6.1.4. </span>Evaluate LLMs with <code class="d
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#one-function-call-to-any-llm-with-litellm">6.1.2. One-Function Call to Any LLM with <code class="docutils literal notranslate"><span class="pre">litellm</span></code></a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#safeguard-your-llms-with-llmguard">6.1.3. Safeguard Your LLMs with <code class="docutils literal notranslate"><span class="pre">LLMGuard</span></code></a></li>
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#evaluate-llms-with-uptrain">6.1.4. Evaluate LLMs with <code class="docutils literal notranslate"><span class="pre">uptrain</span></code></a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#embed-any-type-of-file">6.1.5. Embed Any Type of File</a></li>
 </ul>
   </nav></div>
 

diff --git a/book/machinelearning/featureselection.html b/book/machinelearning/featureselection.html
@@ -586,25 +586,13 @@ <h2><span class="section-number">5.3.4. </span>Find the Most Predictive Variable
 </section>
 <section id="feature-selection-at-scale-with-mrmr">
 <h2><span class="section-number">5.3.5. </span>Feature Selection at Scale with <code class="docutils literal notranslate"><span class="pre">mrmr</span></code><a class="headerlink" href="#feature-selection-at-scale-with-mrmr" title="Permalink to this heading">#</a></h2>
-<div class="cell docutils container">
-<div class="cell_input docutils container">
-<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>Do you want to do Feature Selection automatically?
-
-Try mrmr.
-
-mrmr (minimum-Redundancy-Maximum-Relevance) is a minimal-optimal feature selection algorithm at scale.
-
-It means mrmr will find the smallest relevant subset of features your ML Model needs.
-
-mrmr supports common tools like Pandas, Polars and Spark.
-
-See below how we want to select the best K features.
-
-The output is a ranked list of the relevant features.
-</pre></div>
-</div>
-</div>
-</div>
+<p>Do you want to do Feature Selection automatically?</p>
+<p>Try <code class="docutils literal notranslate"><span class="pre">mrmr</span></code>.</p>
+<p><code class="docutils literal notranslate"><span class="pre">mrmr</span></code> (minimum-Redundancy-Maximum-Relevance) is a minimal-optimal feature selection algorithm designed to work at scale.</p>
+<p>It means <code class="docutils literal notranslate"><span class="pre">mrmr</span></code> will find the smallest relevant subset of features your ML Model needs.</p>
+<p><code class="docutils literal notranslate"><span class="pre">mrmr</span></code> supports common tools like Pandas, Polars and Spark.</p>
+<p>See below how we want to select the best K features.</p>
+<p>The output is a ranked list of the relevant features.</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span>!pip install mrmr_selection

diff --git a/book/polars/Chapter.html b/book/polars/Chapter.html
@@ -422,6 +422,7 @@ <h2> Contents </h2>
             <nav aria-label="Page">
                 <ul class="visible nav section-nav flex-column">
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#plugin-for-data-science-functions">9.1.1. Plugin for Data Science Functions</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#plugin-for-fitting-linear-models">9.1.2. Plugin for Fitting Linear Models</a></li>
 </ul>
             </nav>
         </div>
@@ -470,6 +471,31 @@ <h2><span class="section-number">9.1.1. </span>Plugin for Data Science Functions
 </div>
 </div>
 </section>
+<section id="plugin-for-fitting-linear-models">
+<h2><span class="section-number">9.1.2. </span>Plugin for Fitting Linear Models<a class="headerlink" href="#plugin-for-fitting-linear-models" title="Permalink to this heading">#</a></h2>
+<p>In Polars, you can fit linear models with the <code class="docutils literal notranslate"><span class="pre">polars-ols</span></code> extension.</p>
+<p>You can use ordinary, weighted, or regularized least squares (such as Lasso or Elastic Net).</p>
+<p>It can be 2x to 88x faster than popular libraries like sklearn or statsmodels.</p>
+<div class="cell docutils container">
+<div class="cell_input docutils container">
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>!pip install polars-ols
+</pre></div>
+</div>
+</div>
+</div>
+<div class="cell docutils container">
+<div class="cell_input docutils container">
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">polars</span> <span class="k">as</span> <span class="nn">pl</span>
+<span class="kn">import</span> <span class="nn">polars_ols</span> <span class="k">as</span> <span class="nn">pls</span>  <span class="c1"># importing registers the `least_squares` namespace</span>
+
+<span class="n">lasso_expr</span> <span class="o">=</span> <span class="n">pl</span><span class="o">.</span><span class="n">col</span><span class="p">(</span><span class="s2">&quot;y&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">least_squares</span><span class="o">.</span><span class="n">lasso</span><span class="p">(</span><span class="s2">&quot;x1&quot;</span><span class="p">,</span> <span class="s2">&quot;x2&quot;</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.0001</span><span class="p">,</span> <span class="n">add_intercept</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">over</span><span class="p">(</span><span class="s2">&quot;group&quot;</span><span class="p">)</span>
+<span class="c1"># `df` is assumed to be an existing DataFrame with columns y, x1, x2, and group</span>
+<span class="n">predictions</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">with_columns</span><span class="p">(</span><span class="n">lasso_expr</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">alias</span><span class="p">(</span><span class="s2">&quot;predictions_lasso&quot;</span><span class="p">))</span>
+</pre></div>
+</div>
+</div>
+</div>
+</section>
 </section>
 
     <script type="text/x-thebe-config">
@@ -540,6 +566,7 @@ <h2><span class="section-number">9.1.1. </span>Plugin for Data Science Functions
   <nav class="bd-toc-nav page-toc">
     <ul class="visible nav section-nav flex-column">
 <li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#plugin-for-data-science-functions">9.1.1. Plugin for Data Science Functions</a></li>
+<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#plugin-for-fitting-linear-models">9.1.2. Plugin for Fitting Linear Models</a></li>
 </ul>
   </nav></div>
 

diff --git a/searchindex.js b/searchindex.js