diff --git a/docs/src/getting_started.ipynb b/docs/src/getting_started.ipynb index b01eebab9..b48013491 100644 --- a/docs/src/getting_started.ipynb +++ b/docs/src/getting_started.ipynb @@ -6,19 +6,23 @@ "id": "b8dd3ccc", "metadata": {}, "source": [ - "# Getting Started with Temporian\n", + "# Getting Started\n", "\n", "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/temporian/blob/last-release/docs/src/tutorials/getting_started.ipynb)\n", "\n", - "This guide will introduce you to the basics of Temporian, including:\n", - "- What is an **EventSet** and how to create one from scratch.\n", - "- Visualizing input/output data using **EventSet.plot()** and interactive plots.\n", - "- Converting back and forth between EventSets and pandas **DataFrames**.\n", - "- Transforming the EventSets by using **operators**.\n", - "- How operators work when using **indexes**.\n", - "- Commonly used operations like **glue**, **resample**, **lag**, moving windows and arithmetics.\n", + "Temporian is an open-source Python library for preprocessing and feature engineering temporal data, to get it ready for machine learning applications 🤖.\n", "\n", - "If you're interested in a topic that is not included here, we provide links to other parts of the documentation on the final section, to continue learning." 
+ "This guide will introduce you to the basics of the library, including how to:\n", + "- Create an `EventSet` and use it.\n", + "- Visualize input/output data using `EventSet.plot()` and interactive plots.\n", + "- Convert back and forth between `EventSet` and pandas `DataFrame`.\n", + "- Transform an `EventSet` by using **operators**.\n", + "- Work with `indexes`.\n", + "- Use common operators like `glue`, `resample`, `lag`, moving windows, and arithmetic.\n", + "\n", + "If you're interested in a topic that is not included here, the final section links to other parts of the documentation so you can continue learning.\n", + "\n", + "By the end of this guide, you will know how to implement a processing pipeline with Temporian that gets your data ready to train machine learning models, using straightforward operations and avoiding common mistakes." ] }, { @@ -68,17 +72,47 @@ "source": [ "## Part 1: Events and EventSets\n", "\n", - "The most basic unit of data in Temporian is an **event**. An event consists of a timestamp and a set of feature values.\n", + "Events are the basic unit of data in Temporian. They consist of a timestamp and a set of feature values. Events are not handled individually, but are instead grouped together into **[`EventSets`](https://temporian.readthedocs.io/en/stable/user_guide/#events-and-eventsets)**.\n", + "\n", + "The main data structure in Temporian is the **[`EventSet`](https://temporian.readthedocs.io/en/stable/user_guide/#events-and-eventsets)**, and it represents **[multivariate and multi-index time sequences](https://temporian.readthedocs.io/en/stable/user_guide/#what-is-temporal-data)**. 
Let's break that down:\n", + "\n", + "- **multivariate:** indicates that each event in the time sequence holds several feature values.\n", + "- **multi-index:** indicates that the events can represent hierarchical data, and can therefore be grouped by one or more of their features' values.\n", + "- **time sequence:** indicates that the events are not necessarily sampled at a uniform rate (if they were, we would call it a *time series*).\n", "\n", - "Events are not handled individually. Instead, events are grouped together into an **`EventSet`**.\n", + "You can create an `EventSet` from a pandas DataFrame, NumPy arrays, CSV files, and more. Here is an example containing only 3 events and 2 features:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9cfb0cb1", + "metadata": {}, + "outputs": [], + "source": [ + "evset = tp.event_set(\n", + " timestamps=[1, 2, 3],\n", + " features={\n", + " \"feature_1\": [10, 20, 30],\n", + " \"feature_2\": [False, False, True],\n", + " },\n", + ")\n", + "evset" + ] + }, + { + "cell_type": "markdown", + "id": "c8267798", + "metadata": {}, + "source": [ + "An `EventSet` can hold one or several time sequences, depending on its index.\n", "\n", - "`EventSets` are the main data structures in Temporian, and represent **[multivariate and multi-index time sequences](../user_guide/#what-is-temporal-data)**. Let's break that down:\n", + "- If it has no index (e.g., the case above), an `EventSet` holds a single multivariate time sequence.\n", + "- If it has one (or more) indexes, the events are grouped by their index values. 
This means that the `EventSet` will hold one multivariate time sequence for each unique value (or unique combination of values) of its indexes.\n", "\n", - "- \"multivariate\" indicates that each event in the time sequence holds several feature values.\n", - "- \"multi-index\" indicates that the events can represent hierarchical data, and be therefore grouped by one or more of their features' values.\n", - "- \"sequence\" indicates that the events are not necessarily sampled at a uniform rate (in which case we would call it a time \"series\").\n", + "Operators are applied on each time sequence of an `EventSet` independently. Indexing is the primary way to handle rich and complex databases. For instance, in a retail database, you can index on customers, stores, products, etc.\n", "\n", - "You can create an `EventSet` from a pandas DataFrame, NumPy arrays, CSV files, and more. Here is an example containing four events and three features (one of which is used as an `index`):" + "The following example will create one sequence for `blue` events, and another one for `red` ones, by specifying that one of the features is an `index`:" ] }, { @@ -88,6 +122,7 @@ "metadata": {}, "outputs": [], "source": [ + "# EventSet with indexes\n", "evset = tp.event_set(\n", " timestamps=[\"2023-02-04\", \"2023-02-06\", \"2023-02-07\", \"2023-02-07\"],\n", " features={\n", @@ -105,12 +140,6 @@ "id": "effc4483-9a1a-4e21-b376-3ed188ced821", "metadata": {}, "source": [ - "An `EventSet` can hold one or several time sequences, depending on what its `index` is.\n", - "\n", - "If it has no index, it will hold a single multivariate time sequence, which means that all events will be considered part of the same group and will interact with each other when operators are applied.\n", - "\n", - "If it has one (or many) indexes, its events will be grouped by their `indexes` values, so it will hold one multivariate time sequence for each unique value (or unique combination of values) of its indexes, and 
operators will be applied to each time sequence independently.\n", - "\n", "See the last part of this tutorial to see some examples using `indexes` and operators." ] }, @@ -122,9 +151,9 @@ "source": [ "### Example Data\n", "\n", - "This minimal data consists of just one `signal` with a `timestamp` for each sample.\n", + "For the following examples, we will generate some synthetic data that consists of a `signal` with a `timestamp` for each sample.\n", "\n", - "The signal is a periodic sinusoidal `season` with a slight positive slope in the long run, which we call `trend`. Plus the ubiquitous `noise`." + "The signal is composed of a periodic `season` (a sine wave), a slight positive slope that we call `trend`, and the ubiquitous `noise`. We will include all these components as separate features, together with the resulting `signal`." ] }, { @@ -388,7 +417,7 @@ "### Exporting outputs from Temporian\n", "You may need to use this data in different ways for downstream tasks, like training a model using whatever library you need. \n", "\n", - "If you can't use the data directly from Temporian, you can always go back to a pandas DataFrame:" + "If you can't use the data directly from Temporian, you can always go back to a pandas `DataFrame`:" ] }, { @@ -514,11 +543,11 @@ "## Summary\n", "\n", "Congratulations! 
You now have the basic concepts needed to create a data preprocessing pipeline with Temporian:\n", - "- Defining an **EventSet** and using **operators** on it.\n", - "- Combine **features** using **select** and **glue**.\n", - "- Coverting data back and forth between Temporian's **EventSet** and pandas **DataFrames**.\n", - "- Visualizing input/output data using **EventSet.plot()**.\n", - "- Operating and plotting with an **index**.\n", + "- Defining an `EventSet` and using **operators** on it.\n", + "- Combining features using `select` and `glue`.\n", + "- Converting data back and forth between Temporian's `EventSet` and pandas `DataFrames`.\n", + "- Visualizing input/output data using `EventSet.plot()`.\n", + "- Operating and plotting with `indexes`.\n", "\n", "### Other important details\n", "\n", @@ -536,6 +565,12 @@ "- We could only cover a small fraction of **[all available operators](https://temporian.readthedocs.io/en/stable/reference/temporian/operators/add_index/)**.\n", "- We put a lot of ❤️ in the **[User Guide](https://temporian.readthedocs.io/en/stable/user_guide/)**, so make sure to check it out 🙂." ] + }, + { + "cell_type": "markdown", + "id": "cebffed7", + "metadata": {}, + "source": [] } ], "metadata": { diff --git a/docs/src/user_guide.ipynb b/docs/src/user_guide.ipynb index ca7602e5a..6a018928d 100644 --- a/docs/src/user_guide.ipynb +++ b/docs/src/user_guide.ipynb @@ -8,7 +8,7 @@ "source": [ "# User Guide\n", "\n", - "This is a complete tour of Temporian's capabilities. For a quick hands-on overview, make sure to check the [Getting started guide](./getting_started)." + "This is a complete tour of Temporian's capabilities. For a quick hands-on overview, make sure to check the [Getting started guide](https://temporian.readthedocs.io/en/stable/getting_started)." 
] }, { @@ -187,7 +187,7 @@ "result = evset.simple_moving_average(window_length=1)\n", "\n", "# Plot the results\n", - "tp.plot([evset, result]) " + "tp.plot([evset, result])" ] }, { @@ -1533,7 +1533,7 @@ " return sma_2 - sma_4\n", "\n", "result = my_function(evset)\n", - " \n", + "\n", "result.plot()" ] }, @@ -1557,7 +1557,7 @@ "sma_4_node = f1_node.simple_moving_average(window_length=4)\n", "result_node = sma_2_node - sma_4_node\n", "result = tp.run(result_node, {input_node: evset}, verbose=1)\n", - " \n", + "\n", "result.plot()" ] }, @@ -1801,7 +1801,7 @@ " features=a_evset.schema.features,\n", " indexes=a_evset.schema.indexes\n", ")\n", - " \n", + "\n", "a_node = a_evset.node()" ] },