Commit

Generated by gradle-git-publish.
runner committed Jan 9, 2025
1 parent 63dfcaf commit a290a24
Showing 4 changed files with 63 additions and 70 deletions.
51 changes: 22 additions & 29 deletions blog/2024/12/20/linear-regression-notes/index.html
@@ -1135,31 +1135,26 @@ <h2 id="polynomial-transformation">Polynomial transformation</h2>
good relationship between them, especially with a simple line. The <strong>polynomial transformation</strong> helps to find those
relationships. Applying a polynomial transformation to our problem can help the linear regression adapt better to the
shape of the data. This is the same linear regression example, but this time applying the <strong>polynomialFeatures</strong> function prior
to the linear regression fit. </p>
<p>Notice that before applying all new features to the algorithm we are normalizing all of them with
the <strong>minMaxScaler</strong> transformation.</p>
to the linear regression fit.</p>
<div class="language-groovy highlight"><span class="filename">applying polynomial transformation</span><pre><span></span><code><span id="__span-10-1"><a id="__codelineno-10-1" name="__codelineno-10-1" href="#__codelineno-10-1"></a><span class="c1">// transforming X adding new generated features</span>
</span><span id="__span-10-2"><a id="__codelineno-10-2" name="__codelineno-10-2" href="#__codelineno-10-2"></a><span class="kt">def</span><span class="w"> </span><span class="n">xPoly</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ml</span><span class="o">.</span><span class="na">features</span><span class="o">.</span><span class="na">polynomialFeatures</span><span class="o">(</span><span class="n">X</span><span class="o">)</span>
</span><span id="__span-10-3"><a id="__codelineno-10-3" name="__codelineno-10-3" href="#__codelineno-10-3"></a>
</span><span id="__span-10-4"><a id="__codelineno-10-4" name="__codelineno-10-4" href="#__codelineno-10-4"></a><span class="c1">// normalizing all features</span>
</span><span id="__span-10-5"><a id="__codelineno-10-5" name="__codelineno-10-5" href="#__codelineno-10-5"></a><span class="kt">def</span><span class="w"> </span><span class="n">xNormalized</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ml</span><span class="o">.</span><span class="na">features</span><span class="o">.</span><span class="na">minMaxScaler</span><span class="o">(</span><span class="n">xPoly</span><span class="o">).</span><span class="na">apply</span><span class="o">(</span><span class="n">xPoly</span><span class="o">)</span>
</span><span id="__span-10-6"><a id="__codelineno-10-6" name="__codelineno-10-6" href="#__codelineno-10-6"></a>
</span><span id="__span-10-7"><a id="__codelineno-10-7" name="__codelineno-10-7" href="#__codelineno-10-7"></a><span class="c1">// train test split (more data for training)</span>
</span><span id="__span-10-8"><a id="__codelineno-10-8" name="__codelineno-10-8" href="#__codelineno-10-8"></a><span class="kt">def</span><span class="w"> </span><span class="o">(</span>
</span><span id="__span-10-9"><a id="__codelineno-10-9" name="__codelineno-10-9" href="#__codelineno-10-9"></a><span class="w"> </span><span class="n">xTrain</span><span class="o">,</span>
</span><span id="__span-10-10"><a id="__codelineno-10-10" name="__codelineno-10-10" href="#__codelineno-10-10"></a><span class="w"> </span><span class="n">xTest</span><span class="o">,</span>
</span><span id="__span-10-11"><a id="__codelineno-10-11" name="__codelineno-10-11" href="#__codelineno-10-11"></a><span class="w"> </span><span class="n">yTrain</span><span class="o">,</span>
</span><span id="__span-10-12"><a id="__codelineno-10-12" name="__codelineno-10-12" href="#__codelineno-10-12"></a><span class="w"> </span><span class="n">yTest</span>
</span><span id="__span-10-13"><a id="__codelineno-10-13" name="__codelineno-10-13" href="#__codelineno-10-13"></a><span class="o">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ml</span><span class="o">.</span><span class="na">utils</span><span class="o">.</span><span class="na">trainTestSplit</span><span class="o">(</span><span class="n">xNormalized</span><span class="o">,</span><span class="w"> </span><span class="n">y</span><span class="o">)</span>
</span><span id="__span-10-4"><a id="__codelineno-10-4" name="__codelineno-10-4" href="#__codelineno-10-4"></a><span class="c1">// train test split (more data for training)</span>
</span><span id="__span-10-5"><a id="__codelineno-10-5" name="__codelineno-10-5" href="#__codelineno-10-5"></a><span class="kt">def</span><span class="w"> </span><span class="o">(</span>
</span><span id="__span-10-6"><a id="__codelineno-10-6" name="__codelineno-10-6" href="#__codelineno-10-6"></a><span class="w"> </span><span class="n">xTrain</span><span class="o">,</span>
</span><span id="__span-10-7"><a id="__codelineno-10-7" name="__codelineno-10-7" href="#__codelineno-10-7"></a><span class="w"> </span><span class="n">xTest</span><span class="o">,</span>
</span><span id="__span-10-8"><a id="__codelineno-10-8" name="__codelineno-10-8" href="#__codelineno-10-8"></a><span class="w"> </span><span class="n">yTrain</span><span class="o">,</span>
</span><span id="__span-10-9"><a id="__codelineno-10-9" name="__codelineno-10-9" href="#__codelineno-10-9"></a><span class="w"> </span><span class="n">yTest</span>
</span><span id="__span-10-10"><a id="__codelineno-10-10" name="__codelineno-10-10" href="#__codelineno-10-10"></a><span class="o">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ml</span><span class="o">.</span><span class="na">utils</span><span class="o">.</span><span class="na">trainTestSplit</span><span class="o">(</span><span class="n">xPoly</span><span class="o">,</span><span class="w"> </span><span class="n">y</span><span class="o">)</span>
</span><span id="__span-10-11"><a id="__codelineno-10-11" name="__codelineno-10-11" href="#__codelineno-10-11"></a>
</span><span id="__span-10-12"><a id="__codelineno-10-12" name="__codelineno-10-12" href="#__codelineno-10-12"></a><span class="c1">// creating and training model</span>
</span><span id="__span-10-13"><a id="__codelineno-10-13" name="__codelineno-10-13" href="#__codelineno-10-13"></a><span class="kt">def</span><span class="w"> </span><span class="n">model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ml</span><span class="o">.</span><span class="na">regression</span><span class="o">.</span><span class="na">ols</span><span class="o">(</span><span class="n">xTrain</span><span class="o">,</span><span class="w"> </span><span class="n">yTrain</span><span class="o">)</span>
</span><span id="__span-10-14"><a id="__codelineno-10-14" name="__codelineno-10-14" href="#__codelineno-10-14"></a>
</span><span id="__span-10-15"><a id="__codelineno-10-15" name="__codelineno-10-15" href="#__codelineno-10-15"></a><span class="c1">// creating and training model</span>
</span><span id="__span-10-16"><a id="__codelineno-10-16" name="__codelineno-10-16" href="#__codelineno-10-16"></a><span class="kt">def</span><span class="w"> </span><span class="n">model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ml</span><span class="o">.</span><span class="na">regression</span><span class="o">.</span><span class="na">ols</span><span class="o">(</span><span class="n">xTrain</span><span class="o">,</span><span class="w"> </span><span class="n">yTrain</span><span class="o">)</span>
</span><span id="__span-10-17"><a id="__codelineno-10-17" name="__codelineno-10-17" href="#__codelineno-10-17"></a>
</span><span id="__span-10-18"><a id="__codelineno-10-18" name="__codelineno-10-18" href="#__codelineno-10-18"></a><span class="c1">// predicting and getting r2_score for training and test sets</span>
</span><span id="__span-10-19"><a id="__codelineno-10-19" name="__codelineno-10-19" href="#__codelineno-10-19"></a><span class="kt">def</span><span class="w"> </span><span class="n">scoreTrain</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">model</span><span class="o">.</span><span class="na">score</span><span class="o">(</span><span class="n">xTrain</span><span class="o">,</span><span class="w"> </span><span class="n">yTrain</span><span class="o">).</span><span class="na">round</span><span class="o">(</span><span class="mi">6</span><span class="o">)</span>
</span><span id="__span-10-20"><a id="__codelineno-10-20" name="__codelineno-10-20" href="#__codelineno-10-20"></a><span class="kt">def</span><span class="w"> </span><span class="n">scoreTest</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">model</span><span class="o">.</span><span class="na">score</span><span class="o">(</span><span class="n">xTest</span><span class="o">,</span><span class="w"> </span><span class="n">yTest</span><span class="o">).</span><span class="na">round</span><span class="o">(</span><span class="mi">6</span><span class="o">)</span>
</span><span id="__span-10-21"><a id="__codelineno-10-21" name="__codelineno-10-21" href="#__codelineno-10-21"></a>
</span><span id="__span-10-22"><a id="__codelineno-10-22" name="__codelineno-10-22" href="#__codelineno-10-22"></a><span class="n">print</span><span class="o">(</span><span class="s2">&quot;train: ${scoreTrain}, test: ${scoreTest}&quot;</span><span class="o">)</span>
</span><span id="__span-10-15"><a id="__codelineno-10-15" name="__codelineno-10-15" href="#__codelineno-10-15"></a><span class="c1">// predicting and getting r2_score for training and test sets</span>
</span><span id="__span-10-16"><a id="__codelineno-10-16" name="__codelineno-10-16" href="#__codelineno-10-16"></a><span class="kt">def</span><span class="w"> </span><span class="n">scoreTrain</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">model</span><span class="o">.</span><span class="na">score</span><span class="o">(</span><span class="n">xTrain</span><span class="o">,</span><span class="w"> </span><span class="n">yTrain</span><span class="o">).</span><span class="na">round</span><span class="o">(</span><span class="mi">6</span><span class="o">)</span>
</span><span id="__span-10-17"><a id="__codelineno-10-17" name="__codelineno-10-17" href="#__codelineno-10-17"></a><span class="kt">def</span><span class="w"> </span><span class="n">scoreTest</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">model</span><span class="o">.</span><span class="na">score</span><span class="o">(</span><span class="n">xTest</span><span class="o">,</span><span class="w"> </span><span class="n">yTest</span><span class="o">).</span><span class="na">round</span><span class="o">(</span><span class="mi">6</span><span class="o">)</span>
</span><span id="__span-10-18"><a id="__codelineno-10-18" name="__codelineno-10-18" href="#__codelineno-10-18"></a>
</span><span id="__span-10-19"><a id="__codelineno-10-19" name="__codelineno-10-19" href="#__codelineno-10-19"></a><span class="n">print</span><span class="o">(</span><span class="s2">&quot;train: ${scoreTrain}, test: ${scoreTest}&quot;</span><span class="o">)</span>
</span></code></pre></div>
<div class="language-shell highlight"><span class="filename">output</span><pre><span></span><code><span id="__span-11-1"><a id="__codelineno-11-1" name="__codelineno-11-1" href="#__codelineno-11-1"></a>train:<span class="w"> </span><span class="m">0</span>.355879,<span class="w"> </span>test:<span class="w"> </span><span class="m">0</span>.296751
</span></code></pre></div>
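<p>As a rough illustration of what a polynomial transformation does to a single feature, the sketch below expands each value into its powers up to a chosen degree. The helper name <code>polyExpand</code> and its <code>degree</code> parameter are assumptions made for this sketch. It is not how <code>polynomialFeatures</code> is implemented, and the library may generate a different set of columns.</p>

```groovy
// Hypothetical helper for illustration only; not part of the library used in the article.
// Expands each value x into the row [x, x^2, ..., x^degree].
double[][] polyExpand(double[] xs, int degree) {
    double[][] out = new double[xs.length][degree]
    for (int i = 0; i < xs.length; i++) {
        for (int p = 1; p <= degree; p++) {
            out[i][p - 1] = Math.pow(xs[i], p)
        }
    }
    return out
}

// each original value becomes a row with one column per power
def expanded = polyExpand([1.0d, 2.0d, 3.0d] as double[], 2)
for (double[] row : expanded) {
    println row.toList()
}
```

<p>A linear model fitted on these extra columns is still linear in its coefficients, but it can now follow a curved relationship in the original feature.</p>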
@@ -1282,13 +1277,12 @@ <h2 id="feature-selection">Feature selection</h2>
<div class="language-shell highlight"><span class="filename">output</span><pre><span></span><code><span id="__span-16-1"><a id="__codelineno-16-1" name="__codelineno-16-1" href="#__codelineno-16-1"></a><span class="o">[</span><span class="s1">&#39;atemp&#39;</span>,<span class="w"> </span><span class="s1">&#39;temp&#39;</span>,<span class="w"> </span><span class="s1">&#39;workingday&#39;</span>,<span class="w"> </span><span class="s1">&#39;season&#39;</span>,<span class="w"> </span><span class="s1">&#39;weekday&#39;</span><span class="o">]</span>
</span></code></pre></div>
<h2 id="regularization-and-normalization">Regularization and normalization</h2>
<p><strong>Regularization</strong></p>
<p><strong>Regularization</strong> is a technique used <strong>to reduce the model complexity</strong> and thus it helps dealing with over-fitting:</p>
<ul>
<li>It <strong>reduces the model size</strong> by shrinking the number of parameters the model has to learn </li>
<li>It <strong>adds a penalty on the weights</strong> so that the model favors smaller coefficient values</li>
</ul>
<p><strong>Regularization</strong> penalizes certain values by using a loss function with a cost. This cost could be of type:</p>
<p>Regularization penalizes certain values by using a loss function with a cost. This cost could be of type:</p>
<ul>
<li><strong>L1</strong>: The cost is <strong>proportional to the absolute value</strong> of the weight coefficients (Lasso)</li>
<li><strong>L2</strong>: The cost is <strong>proportional to the square of the value</strong> of the weight coefficients (Ridge)</li>
@@ -1298,8 +1292,7 @@ <h2 id="regularization-and-normalization">Regularization and normalization</h2>
<p>Regularization really shines when there is high dimensionality, meaning there are many features. So in these
examples it won’t make a huge impact on the scores.</p>
</div>
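<p>To make the difference between the two costs concrete, here is a minimal sketch that computes an L1 and an L2 penalty for a given set of coefficients. The function names and the <code>alpha</code> strength parameter are assumptions for illustration; they are not taken from the library used in this article.</p>

```groovy
// Hypothetical helpers for illustration only.
// L1 (Lasso-style) cost: alpha times the sum of absolute coefficient values.
double l1Penalty(double[] weights, double alpha) {
    double sum = 0.0d
    for (double w : weights) {
        sum += Math.abs(w)
    }
    return alpha * sum
}

// L2 (Ridge-style) cost: alpha times the sum of squared coefficient values.
double l2Penalty(double[] weights, double alpha) {
    double sum = 0.0d
    for (double w : weights) {
        sum += w * w
    }
    return alpha * sum
}

def weights = [0.5d, -2.0d, 0.0d] as double[]
println "L1 cost: ${l1Penalty(weights, 0.1d)}"
println "L2 cost: ${l2Penalty(weights, 0.1d)}"
```

<p>In both cases the penalty is added to the usual fitting error, so larger coefficients make the total loss worse. The squared form punishes large coefficients much more than small ones, while the absolute form grows at the same rate everywhere.</p>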
<p><strong>Normalization</strong></p>
<p>Data normalization is the <strong>process of rescaling one or more features to a common scale</strong>. It’s normally used when features used to create the model have different scales. There are a few advantages of using normalization in such a scenario:</p>
<p><strong>Data normalization</strong> is the <strong>process of rescaling one or more features to a common scale</strong>. It’s normally used when features used to create the model have different scales. There are a few advantages of using normalization in such a scenario:</p>
<ul>
<li>It could improve the numerical stability of your model </li>
<li>It could speed up the training process</li>
@@ -1308,7 +1301,7 @@ <h2 id="regularization-and-normalization">Regularization and normalization</h2>
model feature adjustments.</p>
<div class="admonition tip">
<p class="admonition-title">Tip</p>
<p>Because in this article I’m only using ONE feature, normalization is not going to make much difference but, when
<p>When using only ONE feature, normalization doesn't make much difference, but when
using multiple features, each on a different scale, we should use normalization.</p>
</div>
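<p>One common way to rescale a feature to a common scale is min-max scaling, which maps the values of a column into the range 0 to 1. The sketch below is a plain Groovy illustration under that assumption; <code>minMaxScale</code> is a hypothetical helper and not the library's <code>minMaxScaler</code>.</p>

```groovy
// Hypothetical helper for illustration only.
// Rescales values to [0, 1] using (value - min) / (max - min).
double[] minMaxScale(double[] values) {
    double min = Double.POSITIVE_INFINITY
    double max = Double.NEGATIVE_INFINITY
    for (double v : values) {
        min = Math.min(min, v)
        max = Math.max(max, v)
    }
    double range = max - min
    double[] scaled = new double[values.length]
    for (int i = 0; i < values.length; i++) {
        // a constant feature has no spread, so it is mapped to 0.0
        scaled[i] = range == 0.0d ? 0.0d : (values[i] - min) / range
    }
    return scaled
}

def temps = [10.0d, 20.0d, 30.0d] as double[]
println minMaxScale(temps).toList()
```

<p>Applied column by column, this puts features measured in very different units on the same footing before the model is fitted.</p>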
<h2 id="regularization-baseline">Regularization Baseline</h2>
@@ -1318,13 +1311,13 @@ <h2 id="regularization-baseline">Regularization Baseline</h2>
</span><span id="__span-17-2"><a id="__codelineno-17-2" name="__codelineno-17-2" href="#__codelineno-17-2"></a><span class="kt">def</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">df</span><span class="o">[</span><span class="s1">&#39;registered&#39;</span><span class="o">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">double</span><span class="o">[]</span>
</span><span id="__span-17-3"><a id="__codelineno-17-3" name="__codelineno-17-3" href="#__codelineno-17-3"></a><span class="kt">def</span><span class="w"> </span><span class="n">ml</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Underdog</span><span class="o">.</span><span class="na">ml</span><span class="o">()</span>
</span><span id="__span-17-4"><a id="__codelineno-17-4" name="__codelineno-17-4" href="#__codelineno-17-4"></a>
</span><span id="__span-17-5"><a id="__codelineno-17-5" name="__codelineno-17-5" href="#__codelineno-17-5"></a><span class="c1">// train test split (more data for training)</span>
</span><span id="__span-17-5"><a id="__codelineno-17-5" name="__codelineno-17-5" href="#__codelineno-17-5"></a><span class="c1">// train test split</span>
</span><span id="__span-17-6"><a id="__codelineno-17-6" name="__codelineno-17-6" href="#__codelineno-17-6"></a><span class="kt">def</span><span class="w"> </span><span class="o">(</span><span class="n">xTrain</span><span class="o">,</span><span class="w"> </span><span class="n">xTest</span><span class="o">,</span><span class="w"> </span><span class="n">yTrain</span><span class="o">,</span><span class="w"> </span><span class="n">yTest</span><span class="o">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ml</span><span class="o">.</span><span class="na">utils</span><span class="o">.</span><span class="na">trainTestSplit</span><span class="o">(</span><span class="n">X</span><span class="o">,</span><span class="w"> </span><span class="n">y</span><span class="o">)</span>
</span><span id="__span-17-7"><a id="__codelineno-17-7" name="__codelineno-17-7" href="#__codelineno-17-7"></a>
</span><span id="__span-17-8"><a id="__codelineno-17-8" name="__codelineno-17-8" href="#__codelineno-17-8"></a><span class="c1">// creating and training model (RIDGE)</span>
</span><span id="__span-17-8"><a id="__codelineno-17-8" name="__codelineno-17-8" href="#__codelineno-17-8"></a><span class="c1">// creating and training model</span>
</span><span id="__span-17-9"><a id="__codelineno-17-9" name="__codelineno-17-9" href="#__codelineno-17-9"></a><span class="kt">def</span><span class="w"> </span><span class="n">model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ml</span><span class="o">.</span><span class="na">regression</span><span class="o">.</span><span class="na">ols</span><span class="o">(</span><span class="n">xTrain</span><span class="o">,</span><span class="w"> </span><span class="n">yTrain</span><span class="o">)</span>
</span><span id="__span-17-10"><a id="__codelineno-17-10" name="__codelineno-17-10" href="#__codelineno-17-10"></a>
</span><span id="__span-17-11"><a id="__codelineno-17-11" name="__codelineno-17-11" href="#__codelineno-17-11"></a><span class="c1">// predicting and getting r2_score for training and test sets</span>
</span><span id="__span-17-11"><a id="__codelineno-17-11" name="__codelineno-17-11" href="#__codelineno-17-11"></a><span class="c1">// getting scores</span>
</span><span id="__span-17-12"><a id="__codelineno-17-12" name="__codelineno-17-12" href="#__codelineno-17-12"></a><span class="kt">def</span><span class="w"> </span><span class="n">scoreTrain</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">model</span><span class="o">.</span><span class="na">score</span><span class="o">(</span><span class="n">xTrain</span><span class="o">,</span><span class="w"> </span><span class="n">yTrain</span><span class="o">).</span><span class="na">round</span><span class="o">(</span><span class="mi">6</span><span class="o">)</span>
</span><span id="__span-17-13"><a id="__codelineno-17-13" name="__codelineno-17-13" href="#__codelineno-17-13"></a><span class="kt">def</span><span class="w"> </span><span class="n">scoreTest</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">model</span><span class="o">.</span><span class="na">score</span><span class="o">(</span><span class="n">xTest</span><span class="o">,</span><span class="w"> </span><span class="n">yTest</span><span class="o">).</span><span class="na">round</span><span class="o">(</span><span class="mi">6</span><span class="o">)</span>
</span><span id="__span-17-14"><a id="__codelineno-17-14" name="__codelineno-17-14" href="#__codelineno-17-14"></a>
2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.
