docs: linked mkdocs & api docs #3703

Open: wants to merge 8 commits into main
5 changes: 4 additions & 1 deletion .gitignore
@@ -32,7 +32,10 @@ log/
.benchmarks

# docs autogen
/docs/source/api_docs/doc_gen/
docs/source/doc_gen/
docs/sphinx/_build/
docs/sphinx/source/doc_gen/
docs/site/

# Added by pyenv
.python-version
Binary file removed docs-v2/img/daft_diagram.png
3 changes: 3 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,3 @@
sphinx/source/_build
site
mkdocs/api_docs
4 changes: 2 additions & 2 deletions docs/Makefile
@@ -5,8 +5,8 @@
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
SOURCEDIR = sphinx/source
BUILDDIR = sphinx/_build

# Put it first so that "make" without argument is like "make help".
help:
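With `SOURCEDIR` and `BUILDDIR` moved under `sphinx/`, `make html` should now write its output to `docs/sphinx/_build/html`, the directory the new `docs/hooks.py` copies from. The sketch below assumes the rest of the Makefile is the standard sphinx-quickstart one, whose catch-all rule runs `$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS)`; the helpers `sphinx_html_command` and `html_output_dir` are hypothetical and only spell out the resulting command and output location.

```python
from pathlib import Path
from typing import List

# Values mirroring the updated docs/Makefile; paths are relative to docs/
SPHINXBUILD = "sphinx-build"
SOURCEDIR = "sphinx/source"
BUILDDIR = "sphinx/_build"


def sphinx_html_command(sphinxopts: str = "") -> List[str]:
    """Command that `make html` is assumed to run through the quickstart catch-all rule."""
    # -M selects Sphinx's "make mode", which writes the html builder's output to BUILDDIR/html
    return [SPHINXBUILD, "-M", "html", SOURCEDIR, BUILDDIR, *sphinxopts.split()]


def html_output_dir(docs_dir: Path) -> Path:
    """Directory where make mode leaves the generated HTML."""
    return docs_dir / BUILDDIR / "html"


print(" ".join(sphinx_html_command()))
# → sphinx-build -M html sphinx/source sphinx/_build
print(html_output_dir(Path("docs")).as_posix())
# → docs/sphinx/_build/html
```

Note that this output directory is covered by the root `.gitignore` entry `docs/sphinx/_build/`, not by the `sphinx/source/_build` entry in `docs/.gitignore`.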
7 changes: 7 additions & 0 deletions docs/hooks.py
@@ -0,0 +1,7 @@
import shutil
import subprocess

def make_api_docs(*args, **kwargs):
    subprocess.run(["make", "html"])
    shutil.copytree("sphinx/_build/html", "mkdocs/api_docs", dirs_exist_ok=True)

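The hook above resolves `sphinx/_build/html` and `mkdocs/api_docs` against the current working directory and ignores the exit status of `make html`, so a failed Sphinx build may go unnoticed if output from an earlier build is still on disk. A slightly more defensive sketch, assuming `hooks.py` stays in `docs/` next to the Makefile, resolves paths from the file's own location and passes `check=True`; the split into a separate `copy_api_docs` helper is hypothetical and only there to make the copy step testable.

```python
import shutil
import subprocess
from pathlib import Path


def copy_api_docs(docs_dir: Path) -> Path:
    """Copy Sphinx's HTML output into the MkDocs source tree and return the destination."""
    src = docs_dir / "sphinx" / "_build" / "html"
    dst = docs_dir / "mkdocs" / "api_docs"
    # dirs_exist_ok=True (Python 3.8+) merges into an existing destination,
    # overwriting same-named files; files no longer present in src are left in place
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


def make_api_docs(*args, **kwargs):
    """on_pre_build hook: build the Sphinx API docs, then copy them where MkDocs picks them up."""
    # Assumption: this file lives in docs/, next to the Makefile
    docs_dir = Path(__file__).resolve().parent
    # check=True raises CalledProcessError if `make html` fails,
    # instead of continuing with whatever an earlier build left behind
    subprocess.run(["make", "html"], cwd=docs_dir, check=True)
    copy_api_docs(docs_dir)
```

The hook keeps the `*args, **kwargs` signature from the original, so it should still accept whatever arguments mkdocs-simple-hooks passes to `on_pre_build`.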
13 changes: 11 additions & 2 deletions mkdocs.yml → docs/mkdocs.yml
@@ -4,7 +4,7 @@
# Project Information
site_name: Daft Documentation

docs_dir: docs-v2
docs_dir: mkdocs

# Scarf pixel for tracking analytics
image:
@@ -46,7 +46,7 @@ nav:
    - Telemetry: resources/telemetry.md
  - Migration Guide:
    - Coming from Dask: migration/dask_migration.md
  - API Docs
  - API Docs: api_docs/index.html

# Configuration
theme:
@@ -109,6 +109,11 @@ extra:
    - icon: fontawesome/brands/x-twitter
      link: https://x.com/daft_dataframe

  # mkdocs-macros variable for linking into the generated API docs
  # To refer to a method, write {{ api_path }}/path/to/method
  api_path: /api_docs/doc_gen


# Extensions
markdown_extensions:
- admonition
@@ -138,3 +143,7 @@ plugins:
include_source: true
  - search:
      separator: '[\s\u200b\-_,:!=\[\]()"`/]+|\.(?!\d)|&[lg]t;|(?!\b)(?=[A-Z][a-z])'
  - macros
  - mkdocs-simple-hooks:
      hooks:
        on_pre_build: "docs.hooks:make_api_docs"
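The `macros` plugin (mkdocs-macros) renders each Markdown page as a Jinja2 template with the variables from `extra` in scope, so, assuming `api_path` sits under `extra:`, a page can write `{{ api_path }}` instead of a hard-coded URL. The stand-in below is not Jinja2: it is a hypothetical `render_vars` helper that only handles plain `{{ name }}` placeholders, meant to show how a link to the `daft.DataFrame.show` page would expand with the `api_path` value from this config.

```python
import re
from typing import Mapping

# Matches simple placeholders such as "{{ api_path }}" (no filters, no expressions)
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_vars(text: str, variables: Mapping[str, str]) -> str:
    """Replace {{ name }} placeholders with their values; unknown names raise KeyError."""
    return _PLACEHOLDER.sub(lambda m: variables[m.group(1)], text)


extra = {"api_path": "/api_docs/doc_gen"}
page = "See [`show`]({{ api_path }}/dataframe_methods/daft.DataFrame.show.html)."
print(render_vars(page, extra))
# → See [`show`](/api_docs/doc_gen/dataframe_methods/daft.DataFrame.show.html).
```

Raising on an unknown name is a choice made for this sketch; it surfaces typos in variable names immediately rather than emitting a broken link.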
File renamed without changes.
@@ -59,7 +59,7 @@ There are some options available to you.

3. Aggressively filter your data so that Daft can avoid reading data that it does not have to (e.g. `df.where(...)`)

4. Request more memory for your UDFs (see [Resource Requests](../core_concepts/udf.md#resource-requests)) if your UDFs are memory intensive (e.g. decompressing data or running large matrix computations)
4. Request more memory for your UDFs (see [Resource Requests](../core_concepts.md#resource-requests)) if your UDFs are memory intensive (e.g. decompressing data or running large matrix computations)

5. Increase the number of partitions in your dataframe (hence making each partition smaller) using something like: `df.into_partitions(df.num_partitions() * 2)`

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes
File renamed without changes
Binary file added docs/mkdocs/img/daft_diagram.png
File renamed without changes
File renamed without changes
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -40,12 +40,6 @@ First we run TPC-H 100 Scale Factor (around 100GB) benchmark on 4 i3.2xlarge wo

<!-- todo(doc): Find better way to embed html file content, rather than pasting the whole file, how to use snippet? -->

<!--
```python
--8<-- "docs-v2/resources/benchmarks/tpch-100sf.html"
```
snippet works, but does not execute block
-->
<div> <script type="text/javascript">window.PlotlyConfig = {MathJaxConfig: 'local'};</script>
<script charset="utf-8" src="https://cdn.plot.ly/plotly-2.20.0.min.js"></script> <div id="78330a19-a541-460b-bd9f-217b9d4cd137" class="plotly-graph-div" style="height:100%; width:100%;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("78330a19-a541-460b-bd9f-217b9d4cd137")) { Plotly.newPlot( "78330a19-a541-460b-bd9f-217b9d4cd137", [{"marker":{"color":"rgba(108, 11, 169, 1)"},"name":"Daft","x":["Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8","Q9","Q10"],"y":[1.0666666666666667,0.7666666666666667,0.9833333333333333,1.05,1.9666666666666666,0.6333333333333333,1.1666666666666667,2.25,2.183333333333333,1.0166666666666666],"type":"bar","textposition":"inside"},{"hovertext":["5.6x Slower","1.1x Slower","5.1x Slower","2.8x Slower","2.0x Slower","9.7x Slower","4.3x Slower","2.0x Slower","2.3x Slower","4.8x Slower"],"marker":{"color":"rgba(226,90,28, 0.75)"},"name":"Spark","x":["Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8","Q9","Q10"],"y":[5.991666666666666,0.8716666666666666,4.996666666666667,2.955,3.8583333333333334,6.135000000000001,4.985,4.428333333333333,5.051666666666667,4.863333333333333],"type":"bar","textposition":"inside"},{"hovertext":["4.2x Slower","1.4x Slower","6.9x Slower","13.0x Slower","8.2x Slower","6.1x Slower","6.8x Slower","3.6x Slower","11.8x Slower","12.1x Slower"],"marker":{"color":"rgba(255,193,30, 0.75)"},"name":"Dask","x":["Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8","Q9","Q10"],"y":[4.456666666666666,1.0983333333333334,6.748333333333333,13.615,16.215,3.8366666666666664,7.96,8.148333333333333,25.790000000000003,12.306666666666667],"type":"bar","textposition":"inside"},{"hovertext":["29.1x Slower","12.5x Slower","nanx Slower","48.6x Slower","nanx Slower","87.7x Slower","nanx Slower","nanx Slower","nanx Slower","52.7x Slower"],"marker":{"color":"rgba(0,173,233, 
0.6)"},"name":"Modin","x":["Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8","Q9","Q10"],"y":[31.066666666666666,9.616666666666667,null,51.05,null,55.53333333333333,null,null,null,53.6],"type":"bar","textposition":"inside"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"]
,[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolo
r":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.
5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"title":{"text":"TPCH 100 Scale Factor - 4 Nodes (lower is better)"},"yaxis":{"title":{"text":"Time (minutes)"}},"xaxis":{"title":{"text":"TPCH Question"}},"uniformtext":{"minsize":8,"mode":"hide"}}, {"displayModeBar": false, "responsive": true} ) }; </script> </div>

File renamed without changes.
File renamed without changes.
10 changes: 9 additions & 1 deletion docs-v2/terms.md → docs/mkdocs/terms.md
@@ -1,6 +1,6 @@
# Terminology

!!! failure "todo(docs): Should the terms below include a link to its respective section under "Core Concepts"? (Except Query Plan doesn't have a section)"
!!! failure "todo(docs): For each term, included a link to its respective section under "Core Concepts" (except Query Plan doesn't have a section)"

Daft is a distributed data engine. The main abstraction in Daft is the [`DataFrame`](https://www.getdaft.io/projects/docs/en/stable/api_docs/doc_gen/dataframe_methods/daft.DataFrame.html#daft.DataFrame), which conceptually can be thought of as a "table" of data with rows and columns.

@@ -14,10 +14,14 @@ The [`DataFrame`](https://www.getdaft.io/projects/docs/en/stable/api_docs/doc_ge

Daft DataFrames are lazy. This means that calling most methods on a DataFrame will not execute that operation immediately; instead, DataFrames expose explicit methods such as [`daft.DataFrame.show`](https://www.getdaft.io/projects/docs/en/stable/api_docs/doc_gen/dataframe_methods/daft.DataFrame.show.html#daft.DataFrame.show) and [`daft.DataFrame.write_parquet`](https://www.getdaft.io/projects/docs/en/stable/api_docs/doc_gen/dataframe_methods/daft.DataFrame.write_parquet.html#daft.DataFrame.write_parquet), which actually trigger computation of the DataFrame.

> Learn more at [DataFrame](core_concepts.md#dataframe)

## Expressions

An [`Expression`](https://www.getdaft.io/projects/docs/en/stable/api_docs/expressions.html) is a fundamental concept in Daft that allows you to define computations on DataFrame columns. Expressions are the building blocks for transforming and manipulating data within your DataFrame and will be your best friend if you are working with Daft primarily through the Python API.

> Learn more at [Expressions](core_concepts.md#expressions)

## Query Plan

As mentioned earlier, Daft DataFrames are lazy. Under the hood, each DataFrame in Daft is represented by a `LogicalPlan`, a plan of operations that describes how to compute that DataFrame. This plan is called the "query plan", and calling methods on the DataFrame actually adds steps to the query plan! When your DataFrame is executed, Daft will read this plan, optimize it to make it run faster, and then execute it to compute the requested results.
@@ -83,9 +87,13 @@ You can examine a logical plan using [`df.explain()`](https://www.getdaft.io/pro
| Clustering spec = { Num partitions = 1 }
```

> Learn more at [Planning](resources/architecture.md#2-planning)

## Structured Query Language (SQL)

SQL is a common query language for expressing queries over tables of data. Daft exposes a SQL API as an alternative (and often a complement) to the Python [`DataFrame`](https://www.getdaft.io/projects/docs/en/stable/api_docs/doc_gen/dataframe_methods/daft.DataFrame.html#daft.DataFrame) and
[`Expression`](https://www.getdaft.io/projects/docs/en/stable/api_docs/expressions.html) APIs for building queries.

You can use SQL in Daft via the [`daft.sql()`](https://www.getdaft.io/projects/docs/en/stable/api_docs/sql.html#daft.sql) function, and Daft will also convert many SQL-compatible strings into Expressions via [`daft.sql_expr()`](https://www.getdaft.io/projects/docs/en/stable/api_docs/sql.html#daft.sql_expr) for easy interoperability with DataFrames.

> Learn more at [SQL](core_concepts.md#sql)