deploy: dccdefc
baniasbaabe committed Feb 4, 2024
1 parent 7d2451f commit e10541d
Showing 9 changed files with 261 additions and 2 deletions.
39 changes: 39 additions & 0 deletions _sources/book/cooltools/Chapter.ipynb
@@ -1665,6 +1665,45 @@
" }\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SQL Query Builder in Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can build SQL queries in Python with PyPika.\n",
"\n",
"PyPika provides a simple, chainable interface that turns Python expressions into SQL strings.\n",
"\n",
"It supports most common SQL statements and clauses, such as joins, filters, and aggregations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pypika import Tables, Query\n",
"\n",
"history, customers = Tables('history', 'customers')\n",
"q = Query \\\n",
" .from_(history) \\\n",
" .join(customers) \\\n",
" .on(history.customer_id == customers.id) \\\n",
" .select(history.star) \\\n",
" .where(customers.id == 5)\n",
" \n",
"q.get_sql()\n",
"# SELECT \"history\".* FROM \"history\" JOIN \"customers\" \n",
"# ON \"history\".\"customer_id\"=\"customers\".\"id\" WHERE \"customers\".\"id\"=5"
]
}
],
"metadata": {
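PyPika only builds the SQL string; running it is up to your database driver. As a quick check of what the generated query does, the sketch below runs the SQL string from the comment above against an in-memory SQLite database using Python's built-in `sqlite3` module. The `customers`/`history` schema and rows are hypothetical, made up for illustration.

```python
import sqlite3

# Hypothetical schema and rows, only to exercise the generated query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE history (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (5, 'Ada'), (6, 'Bob');
    INSERT INTO history VALUES (1, 5, 'book'), (2, 6, 'pen'), (3, 5, 'lamp');
""")

# The SQL string that q.get_sql() returns in the PyPika example.
sql = (
    'SELECT "history".* FROM "history" JOIN "customers" '
    'ON "history"."customer_id"="customers"."id" WHERE "customers"."id"=5'
)

# Sort the rows because SQL gives no ordering guarantee without ORDER BY.
rows = sorted(conn.execute(sql).fetchall())
print(rows)  # [(1, 5, 'book'), (3, 5, 'lamp')]
```

Only the columns of `history` come back, because `history.star` compiles to `"history".*`.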
59 changes: 59 additions & 0 deletions _sources/book/machinelearning/outlierdetection.ipynb
@@ -83,6 +83,65 @@
" \n",
"majority_vote(labels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Robust Outlier Detection with `puncc`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
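"Outlier detection is notoriously hard.\n",
"\n",
"But it doesn't have to be.\n",
"\n",
"`puncc` offers outlier detection powered by conformal prediction: the detection threshold is calibrated on held-out data.\n",
"\n",
"As a result, the false alarm rate on normal data is kept at or below a level `alpha` that you choose, on average (assuming the calibration data and new normal data are exchangeable)."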
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install puncc"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.ensemble import IsolationForest\n",
"from deel.puncc.anomaly_detection import SplitCAD\n",
"from deel.puncc.api.prediction import BasePredictor\n",
"\n",
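"# Override predict so it returns nonconformity scores:\n",
"# score_samples is lower for more abnormal points, so negate it.\n",
"class ADPredictor(BasePredictor):\n",
"    def predict(self, X):\n",
"        return -self.model.score_samples(X)\n",
"\n",
"# Wrap Isolation Forest in a predictor\n",
"if_predictor = ADPredictor(IsolationForest())\n",
"\n",
"# Instantiate conformal anomaly detection (CAD) on top of the IF predictor\n",
"if_cad = SplitCAD(if_predictor, train=True)\n",
"\n",
"# `dataset` is a placeholder for your (mostly normal) training data:\n",
"# 70% is used to fit the Isolation Forest, the rest to calibrate the threshold.\n",
"if_cad.fit(z=dataset, fit_ratio=0.7)\n",
"\n",
"# Maximum false detection rate\n",
"alpha = 0.01\n",
"\n",
"# Detect anomalies in `new_data` (placeholder for the samples to check)\n",
"results = if_cad.predict(new_data, alpha=alpha)"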
]
}
],
"metadata": {
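To make the calibration idea concrete without depending on `puncc`'s internals, here is a stdlib-only sketch of split conformal anomaly detection. It is an illustration under stated assumptions, not how `puncc` is implemented: `split_conformal_detector` is a hypothetical helper, and it swaps Isolation Forest for a simple absolute z-score on 1-D data. Part of the data fits the scorer, the rest provides calibration scores, and a new point is flagged when its conformal p-value is at most `alpha`.

```python
import random
import statistics


def split_conformal_detector(data, fit_ratio=0.7, seed=0):
    """Hypothetical helper: split conformal anomaly detection for 1-D data."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_fit = int(len(shuffled) * fit_ratio)
    fit, cal = shuffled[:n_fit], shuffled[n_fit:]

    # Stand-in model (instead of Isolation Forest): mean and std of the fit split.
    # Nonconformity score = absolute z-score, so higher means more anomalous.
    mu = statistics.fmean(fit)
    sigma = statistics.stdev(fit)

    def score(x):
        return abs(x - mu) / sigma

    # Scores of the held-out calibration points, assumed to be normal data.
    cal_scores = [score(x) for x in cal]

    def is_anomaly(x, alpha):
        # Conformal p-value: share of calibration scores at least as extreme as x's.
        s = score(x)
        n_ge = sum(1 for c in cal_scores if c >= s)
        p_value = (1 + n_ge) / (len(cal_scores) + 1)
        return p_value <= alpha

    return is_anomaly


rng = random.Random(42)
normal_data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
detect = split_conformal_detector(normal_data, fit_ratio=0.7)

print(detect(0.1, alpha=0.01))  # False: a typical value
print(detect(8.0, alpha=0.01))  # True: far outside the training data

# Empirical false alarm rate on fresh normal samples; typically close to alpha.
fresh = [rng.gauss(0.0, 1.0) for _ in range(5000)]
false_alarm_rate = sum(detect(x, alpha=0.05) for x in fresh) / len(fresh)
print(false_alarm_rate)
```

One design consequence worth knowing: the smallest possible p-value is 1 / (number of calibration points + 1), so with 300 calibration points `alpha` must be at least about 0.0033 for anything to be flagged.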
48 changes: 48 additions & 0 deletions _sources/book/pandas/Chapter.ipynb
@@ -189,6 +189,54 @@
"data = {'Value': [1.2343129, 5.8956701, 6.224289]}\n",
"df = pd.DataFrame(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Faster I/O with Parquet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you work with larger datasets, avoid CSV (and similar text formats) where you can.\n",
"\n",
"CSV files are plain text: they are human-readable, which makes them a popular way to store data.\n",
"\n",
"For small datasets, this is not a big issue.\n",
"\n",
"But what if your data has millions of rows?\n",
"\n",
"Reading and writing such files can get really slow, largely because every value has to be converted to and from text.\n",
"\n",
"Binary file formats are the alternative.\n",
"\n",
"They store values in an encoded form that is not meant to be human-readable, but to be read by programs that know the format.\n",
"\n",
"Because numbers are stored in their native binary representation (and often compressed), binary files are usually more compact and faster to parse.\n",
"\n",
"Parquet is one popular binary, column-oriented file format; Parquet files are typically much smaller than equivalent CSVs and faster to read and write."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
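"# Shape: (100000000, 5)\n",
"df = pd.DataFrame(...)  # placeholder for your data\n",
"\n",
"# Timings are illustrative; they depend on hardware and data.\n",
"# Time: 1m 58s\n",
"df.to_csv(\"data.csv\")\n",
"\n",
"# Time: 8s (requires the pyarrow or fastparquet package)\n",
"df.to_parquet(\"data.parquet\")"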
]
}
],
"metadata": {
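The size argument for binary formats can be sketched with the standard library alone. The snippet writes the same column of floats once as CSV text and once as raw 8-byte IEEE 754 doubles via `array`. This is not Parquet, which adds a columnar layout, compression, and schema metadata on top; it only illustrates why a binary encoding of numbers is more compact than text. The data is random and made up.

```python
import csv
import io
import random
from array import array

rng = random.Random(0)
values = [rng.random() for _ in range(100_000)]

# Text encoding: one CSV row per value, each float written as its decimal digits.
text_buf = io.StringIO()
writer = csv.writer(text_buf)
for v in values:
    writer.writerow([v])
csv_bytes = text_buf.getvalue().encode("utf-8")

# Binary encoding: each float stored as a fixed-size 8-byte double.
bin_bytes = array("d", values).tobytes()

print(f"CSV:    {len(csv_bytes):>9,} bytes")
print(f"binary: {len(bin_bytes):>9,} bytes")

# Reading the binary data back needs no digit parsing and round-trips exactly.
restored = array("d")
restored.frombytes(bin_bytes)
```

Each float takes 8 bytes in binary but typically around 20 bytes as text (about 18 characters of digits plus the `\r\n` line ending that `csv.writer` adds by default).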
10 changes: 9 additions & 1 deletion _sources/book/pythontricks/Chapter.ipynb
@@ -961,7 +961,15 @@
{
"cell_type": "markdown",
"metadata": {},
"source": []
"source": [
"One cool feature in Python 3.12:\n",
"\n",
"A new, compact syntax for type parameters (PEP 695).\n",
"\n",
"You can use it to parametrize generic classes and functions without declaring a `TypeVar` first.\n",
"\n",
"See below for a small example: the generic class is parametrized by the type variable `T`, which we declare with `[T]` after the class name."
]
},
{
"cell_type": "code",
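The `Stack[T]` cell is cut off in this diff, so for comparison here is a hypothetical `Stack` written the pre-3.12 way with `typing.TypeVar` and `typing.Generic`. It shows what the new `[T]` syntax replaces: the separate `T = TypeVar("T")` declaration and the `Generic[T]` base class. The method names are illustrative assumptions, not the book's code; the sketch runs on Python 3.9+.

```python
from typing import Generic, TypeVar

# Before Python 3.12, the type variable must be declared explicitly.
T = TypeVar("T")


class Stack(Generic[T]):
    """Hypothetical LIFO stack, written the pre-3.12 way."""

    def __init__(self) -> None:
        self._items: list[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items


# In Python 3.12+, the header `class Stack[T]:` replaces both the TypeVar
# declaration and the Generic[T] base (PEP 695).
ints: Stack[int] = Stack()
ints.push(1)
ints.push(2)
top = ints.pop()
print(top)  # 2
```

Type checkers treat both spellings the same way, for example by rejecting `ints.push("a")` on a `Stack[int]`.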
27 changes: 27 additions & 0 deletions book/cooltools/Chapter.html
@@ -449,6 +449,7 @@ <h2> Contents </h2>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#better-alternative-to-requests">2.1.32. Better Alternative to <code class="docutils literal notranslate"><span class="pre">requests</span></code></a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#managing-configurations-with-python-dotenv">2.1.33. Managing Configurations with <code class="docutils literal notranslate"><span class="pre">python-dotenv</span></code></a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#work-with-notion-via-python-with">2.1.34. Work with Notion via Python with</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#sql-query-builder-in-python">2.1.35. SQL Query Builder in Python</a></li>
</ul>
</nav>
</div>
@@ -1462,6 +1463,31 @@ <h2><span class="section-number">2.1.34. </span>Work with Notion via Python with
</div>
</div>
</section>
<section id="sql-query-builder-in-python">
<h2><span class="section-number">2.1.35. </span>SQL Query Builder in Python<a class="headerlink" href="#sql-query-builder-in-python" title="Permalink to this heading">#</a></h2>
<p>You can build SQL queries in Python with PyPika.</p>
<p>PyPika provides a simple, chainable interface that turns Python expressions into SQL strings.</p>
<p>It supports most common SQL statements and clauses, such as joins, filters, and aggregations.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pypika</span> <span class="kn">import</span> <span class="n">Tables</span><span class="p">,</span> <span class="n">Query</span>

<span class="n">history</span><span class="p">,</span> <span class="n">customers</span> <span class="o">=</span> <span class="n">Tables</span><span class="p">(</span><span class="s1">&#39;history&#39;</span><span class="p">,</span> <span class="s1">&#39;customers&#39;</span><span class="p">)</span>
<span class="n">q</span> <span class="o">=</span> <span class="n">Query</span> \
<span class="o">.</span><span class="n">from_</span><span class="p">(</span><span class="n">history</span><span class="p">)</span> \
<span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">customers</span><span class="p">)</span> \
<span class="o">.</span><span class="n">on</span><span class="p">(</span><span class="n">history</span><span class="o">.</span><span class="n">customer_id</span> <span class="o">==</span> <span class="n">customers</span><span class="o">.</span><span class="n">id</span><span class="p">)</span> \
<span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="n">history</span><span class="o">.</span><span class="n">star</span><span class="p">)</span> \
<span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">customers</span><span class="o">.</span><span class="n">id</span> <span class="o">==</span> <span class="mi">5</span><span class="p">)</span>

<span class="n">q</span><span class="o">.</span><span class="n">get_sql</span><span class="p">()</span>
<span class="c1"># SELECT &quot;history&quot;.* FROM &quot;history&quot; JOIN &quot;customers&quot; </span>
<span class="c1"># ON &quot;history&quot;.&quot;customer_id&quot;=&quot;customers&quot;.&quot;id&quot; WHERE &quot;customers&quot;.&quot;id&quot;=5</span>
</pre></div>
</div>
</div>
</div>
</section>
</section>

<script type="text/x-thebe-config">
@@ -1565,6 +1591,7 @@ <h2><span class="section-number">2.1.34. </span>Work with Notion via Python with
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#better-alternative-to-requests">2.1.32. Better Alternative to <code class="docutils literal notranslate"><span class="pre">requests</span></code></a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#managing-configurations-with-python-dotenv">2.1.33. Managing Configurations with <code class="docutils literal notranslate"><span class="pre">python-dotenv</span></code></a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#work-with-notion-via-python-with">2.1.34. Work with Notion via Python with</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#sql-query-builder-in-python">2.1.35. SQL Query Builder in Python</a></li>
</ul>
</nav></div>

44 changes: 44 additions & 0 deletions book/machinelearning/outlierdetection.html
@@ -416,6 +416,7 @@ <h2> Contents </h2>
<nav aria-label="Page">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#ensembling-for-outlier-detection">5.6.1. Ensembling for Outlier Detection</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#robust-outlier-detection-with-puncc">5.6.2. Robust Outlier Detection with <code class="docutils literal notranslate"><span class="pre">puncc</span></code></a></li>
</ul>
</nav>
</div>
@@ -486,6 +487,48 @@ <h2><span class="section-number">5.6.1. </span>Ensembling for Outlier Detection<
</div>
</div>
</section>
<section id="robust-outlier-detection-with-puncc">
<h2><span class="section-number">5.6.2. </span>Robust Outlier Detection with <code class="docutils literal notranslate"><span class="pre">puncc</span></code><a class="headerlink" href="#robust-outlier-detection-with-puncc" title="Permalink to this heading">#</a></h2>
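<p>Outlier detection is notoriously hard.</p>
<p>But it doesn’t have to be.</p>
<p><code class="docutils literal notranslate"><span class="pre">puncc</span></code> offers outlier detection powered by conformal prediction: the detection threshold is calibrated on held-out data.</p>
<p>As a result, the false alarm rate on normal data is kept at or below a level <code class="docutils literal notranslate"><span class="pre">alpha</span></code> that you choose, on average (assuming the calibration data and new normal data are exchangeable).</p>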
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>!pip install puncc
</pre></div>
</div>
</div>
</div>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">sklearn.ensemble</span> <span class="kn">import</span> <span class="n">IsolationForest</span>
<span class="kn">from</span> <span class="nn">deel.puncc.anomaly_detection</span> <span class="kn">import</span> <span class="n">SplitCAD</span>
<span class="kn">from</span> <span class="nn">deel.puncc.api.prediction</span> <span class="kn">import</span> <span class="n">BasePredictor</span>

<span class="c1"># Override predict so it returns nonconformity scores:</span>
<span class="c1"># score_samples is lower for more abnormal points, so negate it.</span>
<span class="k">class</span> <span class="nc">ADPredictor</span><span class="p">(</span><span class="n">BasePredictor</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
        <span class="k">return</span> <span class="o">-</span><span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">score_samples</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>

<span class="c1"># Wrap Isolation Forest in a predictor</span>
<span class="n">if_predictor</span> <span class="o">=</span> <span class="n">ADPredictor</span><span class="p">(</span><span class="n">IsolationForest</span><span class="p">())</span>

<span class="c1"># Instantiate conformal anomaly detection (CAD) on top of the IF predictor</span>
<span class="n">if_cad</span> <span class="o">=</span> <span class="n">SplitCAD</span><span class="p">(</span><span class="n">if_predictor</span><span class="p">,</span> <span class="n">train</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>

<span class="c1"># `dataset` is a placeholder for your (mostly normal) training data:</span>
<span class="c1"># 70% is used to fit the Isolation Forest, the rest to calibrate the threshold.</span>
<span class="n">if_cad</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">z</span><span class="o">=</span><span class="n">dataset</span><span class="p">,</span> <span class="n">fit_ratio</span><span class="o">=</span><span class="mf">0.7</span><span class="p">)</span>

<span class="c1"># Maximum false detection rate</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.01</span>

<span class="c1"># Detect anomalies in `new_data` (placeholder for the samples to check)</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">if_cad</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">new_data</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="n">alpha</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</section>
</section>

<script type="text/x-thebe-config">
@@ -556,6 +599,7 @@ <h2><span class="section-number">5.6.1. </span>Ensembling for Outlier Detection<
<nav class="bd-toc-nav page-toc">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#ensembling-for-outlier-detection">5.6.1. Ensembling for Outlier Detection</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#robust-outlier-detection-with-puncc">5.6.2. Robust Outlier Detection with <code class="docutils literal notranslate"><span class="pre">puncc</span></code></a></li>
</ul>
</nav></div>

30 changes: 30 additions & 0 deletions book/pandas/Chapter.html
@@ -420,6 +420,7 @@ <h2> Contents </h2>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#change-the-plotting-backend">8.1.3. Change the Plotting Backend</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#style-your-dataframes">8.1.4. Style your DataFrames</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#set-precision-of-displayed-floats">8.1.5. Set Precision of Displayed Floats</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#faster-i-o-with-parquet">8.1.6. Faster I/O with Parquet</a></li>
</ul>
</nav>
</div>
@@ -534,6 +535,34 @@ <h2><span class="section-number">8.1.5. </span>Set Precision of Displayed Floats
</div>
</div>
</section>
<section id="faster-i-o-with-parquet">
<h2><span class="section-number">8.1.6. </span>Faster I/O with Parquet<a class="headerlink" href="#faster-i-o-with-parquet" title="Permalink to this heading">#</a></h2>
<p>When you work with larger datasets, avoid CSV (and similar text formats) where you can.</p>
<p>CSV files are plain text: they are human-readable, which makes them a popular way to store data.</p>
<p>For small datasets, this is not a big issue.</p>
<p>But what if your data has millions of rows?</p>
<p>Reading and writing such files can get really slow, largely because every value has to be converted to and from text.</p>
<p>Binary file formats are the alternative.</p>
<p>They store values in an encoded form that is not meant to be human-readable, but to be read by programs that know the format.</p>
<p>Because numbers are stored in their native binary representation (and often compressed), binary files are usually more compact and faster to parse.</p>
<p>Parquet is one popular binary, column-oriented file format; Parquet files are typically much smaller than equivalent CSVs and faster to read and write.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>

<span class="c1"># Shape: (100000000, 5)</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>  <span class="c1"># placeholder for your data</span>

<span class="c1"># Timings are illustrative; they depend on hardware and data.</span>
<span class="c1"># Time: 1m 58s</span>
<span class="n">df</span><span class="o">.</span><span class="n">to_csv</span><span class="p">(</span><span class="s2">&quot;data.csv&quot;</span><span class="p">)</span>

<span class="c1"># Time: 8s (requires the pyarrow or fastparquet package)</span>
<span class="n">df</span><span class="o">.</span><span class="n">to_parquet</span><span class="p">(</span><span class="s2">&quot;data.parquet&quot;</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</section>
</section>

<script type="text/x-thebe-config">
@@ -608,6 +637,7 @@ <h2><span class="section-number">8.1.5. </span>Set Precision of Displayed Floats
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#change-the-plotting-backend">8.1.3. Change the Plotting Backend</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#style-your-dataframes">8.1.4. Style your DataFrames</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#set-precision-of-displayed-floats">8.1.5. Set Precision of Displayed Floats</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#faster-i-o-with-parquet">8.1.6. Faster I/O with Parquet</a></li>
</ul>
</nav></div>

4 changes: 4 additions & 0 deletions book/pythontricks/Chapter.html
@@ -1014,6 +1014,10 @@ <h2><span class="section-number">10.1.23. </span>Modify Print Statements<a class
</section>
<section id="type-variables-in-python-3-12">
<h2><span class="section-number">10.1.24. </span>Type Variables in Python 3.12<a class="headerlink" href="#type-variables-in-python-3-12" title="Permalink to this heading">#</a></h2>
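<p>One cool feature in Python 3.12:</p>
<p>A new, compact syntax for type parameters (PEP 695).</p>
<p>You can use it to parametrize generic classes and functions without declaring a <code class="docutils literal notranslate"><span class="pre">TypeVar</span></code> first.</p>
<p>See below for a small example: the generic class is parametrized by the type variable <code class="docutils literal notranslate"><span class="pre">T</span></code>, which we declare with <code class="docutils literal notranslate"><span class="pre">[T]</span></code> after the class name.</p>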
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Stack</span><span class="p">[</span><span class="n">T</span><span class="p">]:</span>
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
