From 427a6d660f025770457f9673d528cec2af7b342d Mon Sep 17 00:00:00 2001 From: baniasbaabe Date: Sun, 28 Apr 2024 17:11:16 +0000 Subject: [PATCH] deploy: 9cd50844e8ac1e67cb57a8864103253f7b56a238 --- _sources/book/llm/Chapter.ipynb | 42 +++++++++++++++++++ .../machinelearning/featureselection.ipynb | 12 +++--- _sources/book/polars/Chapter.ipynb | 41 ++++++++++++++++++ book/llm/Chapter.html | 28 +++++++++++++ book/machinelearning/featureselection.html | 26 ++++-------- book/polars/Chapter.html | 27 ++++++++++++ searchindex.js | 2 +- 7 files changed, 151 insertions(+), 27 deletions(-) diff --git a/_sources/book/llm/Chapter.ipynb b/_sources/book/llm/Chapter.ipynb index 69cfe81..37dc1f2 100644 --- a/_sources/book/llm/Chapter.ipynb +++ b/_sources/book/llm/Chapter.ipynb @@ -244,6 +244,48 @@ "\n", "print(json.dumps(results, indent=3))" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Embed Any Type of File" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These days, everything is about Embeddings and LLMs.\n", + "\n", + "The Python library `embed-anything` makes it easy to generate embeddings from multiple sources such as images, video, or audio.\n", + "\n", + "It's built in Rust, so it runs fast."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install embed-anything" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import embed_anything\n", + "import numpy as np\n", + "data = embed_anything.embed_file(\"filename.pdf\", embeder=\"Bert\")  # embed one PDF with a BERT model\n", + "embeddings = np.array([item.embedding for item in data])  # stack the vectors into a 2-D array\n", + "\n", + "data = embed_anything.embed_directory(\"test_files\", embeder=\"Clip\")  # embed every file in a folder with CLIP\n", + "embeddings = np.array([item.embedding for item in data])" + ] } ], "metadata": { diff --git a/_sources/book/machinelearning/featureselection.ipynb b/_sources/book/machinelearning/featureselection.ipynb index fc75600..dfc70a1 100644 --- a/_sources/book/machinelearning/featureselection.ipynb +++ b/_sources/book/machinelearning/featureselection.ipynb @@ -247,20 +247,18 @@ ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ "Do you want to do Feature Selection automatically?\n", "\n", - "Try mrmr.\n", + "Try `mrmr`.\n", "\n", - "mrmr (minimum-Redundancy-Maximum-Relevance) is a minimal-optimal feature selection algorithm at scale.\n", + "`mrmr` (minimum-Redundancy-Maximum-Relevance) is a minimal-optimal feature selection algorithm designed to work at scale.\n", "\n", - "It means mrmr will find the smallest relevant subset of features your ML Model needs.\n", + "This means `mrmr` finds the smallest relevant subset of features your ML model needs.\n", "\n", - "mrmr supports common tools like Pandas, Polars and Spark.\n", + "`mrmr` supports common tools like Pandas, Polars, and Spark.\n", "\n", "See below how we want to select the best K features.\n", "\n", diff --git a/_sources/book/polars/Chapter.ipynb b/_sources/book/polars/Chapter.ipynb index 970a6d8..edb02bf 100644 --- a/_sources/book/polars/Chapter.ipynb +++ b/_sources/book/polars/Chapter.ipynb @@ -64,6 +64,47 @@ "
pl.col(\"actual\").num_ext.binary_metrics_combo(pl.col(\"predicted\")).alias(\"combo\")\n", ").unnest(\"combo\")" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Plugin for Fitting Linear Models" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In Polars, you can fit linear models with the `polars-ols` extension.\n", + "\n", + "It supports ordinary, weighted, and regularized least squares, such as Lasso or Elastic Net.\n", + "\n", + "It can be 2x-88x faster than popular libraries like scikit-learn or statsmodels." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install polars-ols" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import polars as pl\n", + "import polars_ols as pls  # registers the .least_squares expression namespace\n", + "\n", + "lasso_expr = pl.col(\"y\").least_squares.lasso(\"x1\", \"x2\", alpha=0.0001, add_intercept=True).over(\"group\")  # one Lasso fit per group\n", + "# assumes `df` is an existing DataFrame with columns y, x1, x2 and group\n", + "predictions = df.with_columns(lasso_expr.round(2).alias(\"predictions_lasso\"))" + ] } ], "metadata": { diff --git a/book/llm/Chapter.html b/book/llm/Chapter.html index 5d467d3..d99b455 100644 --- a/book/llm/Chapter.html +++ b/book/llm/Chapter.html @@ -425,6 +425,7 @@

Contents

  • 6.1.2. One-Function Call to Any LLM with litellm
  • 6.1.3. Safeguard Your LLMs with LLMGuard
  • 6.1.4. Evaluate LLMs with uptrain
  • +
  • 6.1.5. Embed Any Type of File
  • @@ -605,6 +606,32 @@

    6.1.4. Evaluate LLMs with +

    6.1.5. Embed Any Type of File

    +

    These days, everything is about Embeddings and LLMs.

    +

    The Python library embed-anything makes it easy to generate embeddings from multiple sources such as images, video, or audio.

    +

    It’s built in Rust, so it runs fast.

    +
    +
    +
    !pip install embed-anything
    +
    +
    +
    +
    +
    +
    +
    import embed_anything
    +import numpy as np
    +data = embed_anything.embed_file("filename.pdf", embeder="Bert")  # embed one PDF with a BERT model
    +embeddings = np.array([item.embedding for item in data])  # stack the vectors into a 2-D array
    +
    +data = embed_anything.embed_directory("test_files", embeder="Clip")  # embed every file in a folder with CLIP
    +embeddings = np.array([item.embedding for item in data])
    +
    +
    +
    +
    +