Address PR comments on getting_started
DonBraulio committed Oct 31, 2023
1 parent 3275d15 commit 3d235c8
Showing 2 changed files with 70 additions and 35 deletions.
95 changes: 65 additions & 30 deletions docs/src/getting_started.ipynb
@@ -6,19 +6,23 @@
"id": "b8dd3ccc",
"metadata": {},
"source": [
"# Getting Started with Temporian\n",
"# Getting Started\n",
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/temporian/blob/last-release/docs/src/tutorials/getting_started.ipynb)\n",
"\n",
"This guide will introduce you to the basics of Temporian, including:\n",
"- What is an **EventSet** and how to create one from scratch.\n",
"- Visualizing input/output data using **EventSet.plot()** and interactive plots.\n",
"- Converting back and forth between EventSets and pandas **DataFrames**.\n",
"- Transforming the EventSets by using **operators**.\n",
"- How operators work when using **indexes**.\n",
"- Commonly used operations like **glue**, **resample**, **lag**, moving windows and arithmetics.\n",
"Temporian is an open-source Python library for preprocessing and feature engineering temporal data, to get it ready for machine learning applications 🤖.\n",
"\n",
"If you're interested in a topic that is not included here, we provide links to other parts of the documentation on the final section, to continue learning."
"This guide will introduce you to the basics of the library, including how to:\n",
"- Create an `EventSet` and use it.\n",
"- Visualize input/output data using `EventSet.plot()` and interactive plots.\n",
"- Convert back and forth between `EventSet` and pandas `DataFrame`.\n",
"- Transform an `EventSet` by using **operators**.\n",
"- Work with `indexes`.\n",
"- Use common operators like `glue`, `resample`, `lag`, moving windows and arithmetics.\n",
"\n",
"If you're interested in a topic that is not included here, we provide links to other parts of the documentation on the final section, to continue learning.\n",
"\n",
"By reading this guide, you will learn how to implement a processing pipeline with Temporian, to get your data ready to train machine learning models by using straightforward operations and avoiding common mistakes."
]
},
{
@@ -68,17 +72,47 @@
"source": [
"## Part 1: Events and EventSets\n",
"\n",
"The most basic unit of data in Temporian is an **event**. An event consists of a timestamp and a set of feature values.\n",
"Events are the basic unit of data in Temporian. They consist of a timestamp and a set of feature values. Events are not handled individually, but are instead grouped together into **[`EventSets`](https://temporian.readthedocs.io/en/stable/user_guide/#events-and-eventsets)**.\n",
"\n",
"The main data structure in Temporian is the **[`EventSet`](https://temporian.readthedocs.io/en/stable/user_guide/#events-and-eventsets)**, and it represents **[multivariate and multi-index time sequences](https://temporian.readthedocs.io/en/stable/user_guide/#what-is-temporal-data)**. Let's break that down:\n",
"\n",
"- **multivariate:** indicates that each event in the time sequence holds several feature values.\n",
"- **multi-index:** indicates that the events can represent hierarchical data, and be therefore grouped by one or more of their features' values.\n",
"- **time sequence:** indicates that the events are not necessarily sampled at a uniform rate (in which case we would call it a *time series*).\n",
"\n",
"Events are not handled individually. Instead, events are grouped together into an **`EventSet`**.\n",
"You can create an `EventSet` from a pandas DataFrame, NumPy arrays, CSV files, and more. Here is an example containing only 3 events and 2 features:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cfb0cb1",
"metadata": {},
"outputs": [],
"source": [
"evset = tp.event_set(\n",
" timestamps=[1, 2, 3],\n",
" features={\n",
" \"feature_1\": [10, 20, 30],\n",
" \"feature_2\": [False, False, True],\n",
" },\n",
")\n",
"evset"
]
},
{
"cell_type": "markdown",
"id": "c8267798",
"metadata": {},
"source": [
"An `EventSet` can hold one or several time sequences, depending on its index.\n",
"\n",
"`EventSets` are the main data structures in Temporian, and represent **[multivariate and multi-index time sequences](../user_guide/#what-is-temporal-data)**. Let's break that down:\n",
"- If it has no index (e.g: above case), an `EventSet` holds a single multivariate time sequence.\n",
"- If it has one (or more) indexes, the events are grouped by their index values. This means that the `EventSet` will hold one multivariate time sequence for each unique value (or unique combination of values) of its indexes.\n",
"\n",
"- \"multivariate\" indicates that each event in the time sequence holds several feature values.\n",
"- \"multi-index\" indicates that the events can represent hierarchical data, and be therefore grouped by one or more of their features' values.\n",
"- \"sequence\" indicates that the events are not necessarily sampled at a uniform rate (in which case we would call it a time \"series\").\n",
"Operators are applied on each time sequence of an `EventSet` independently. Indexing is the primary way to handle rich and complex databases. For instance, in a retail database, you can index on customers, stores, products, etc.\n",
"\n",
"You can create an `EventSet` from a pandas DataFrame, NumPy arrays, CSV files, and more. Here is an example containing four events and three features (one of which is used as an `index`):"
"The following example will create one sequence for `blue` events, and another one for `red` ones, by specifying that one of the features is an `index`:"
]
},
{
@@ -88,6 +122,7 @@
"metadata": {},
"outputs": [],
"source": [
"# EventSet with indexes\n",
"evset = tp.event_set(\n",
" timestamps=[\"2023-02-04\", \"2023-02-06\", \"2023-02-07\", \"2023-02-07\"],\n",
" features={\n",
@@ -105,12 +140,6 @@
"id": "effc4483-9a1a-4e21-b376-3ed188ced821",
"metadata": {},
"source": [
"An `EventSet` can hold one or several time sequences, depending on what its `index` is.\n",
"\n",
"If it has no index, it will hold a single multivariate time sequence, which means that all events will be considered part of the same group and will interact with each other when operators are applied.\n",
"\n",
"If it has one (or many) indexes, its events will be grouped by their `indexes` values, so it will hold one multivariate time sequence for each unique value (or unique combination of values) of its indexes, and operators will be applied to each time sequence independently.\n",
"\n",
"See the last part of this tutorial to see some examples using `indexes` and operators."
]
},
@@ -122,9 +151,9 @@
"source": [
"### Example Data\n",
"\n",
"This minimal data consists of just one `signal` with a `timestamp` for each sample.\n",
"For the following examples, we will generate some fake data which consists of a `signal` with a `timestamp` for each sample.\n",
"\n",
"The signal is a periodic sinusoidal `season` with a slight positive slope in the long run, which we call `trend`. Plus the ubiquitous `noise`."
"The signal is composed of a periodic `season` (sine wave), with a slight positive slope which we call `trend`. Plus the ubiquitous `noise`. We will include all these components as separate features, together with the resulting `signal`."
]
},
{
@@ -388,7 +417,7 @@
"### Exporting outputs from Temporian\n",
"You may need to use this data in different ways for downstream tasks, like training a model using whatever library you need. \n",
"\n",
"If you can't use the data directly from Temporian, you can always go back to a pandas DataFrame:"
"If you can't use the data directly from Temporian, you can always go back to a pandas `DataFrame`:"
]
},
{
@@ -514,11 +543,11 @@
"## Summary\n",
"\n",
"Congratulations! You now have the basic concepts needed to create a data preprocessing pipeline with Temporian:\n",
"- Defining an **EventSet** and using **operators** on it.\n",
"- Combine **features** using **select** and **glue**.\n",
"- Coverting data back and forth between Temporian's **EventSet** and pandas **DataFrames**.\n",
"- Visualizing input/output data using **EventSet.plot()**.\n",
"- Operating and plotting with an **index**.\n",
"- Defining an `EventSet` and using **operators** on it.\n",
"- Combine features using `select` and `glue`.\n",
"- Converting data back and forth between Temporian's `EventSet` and pandas `DataFrames`.\n",
"- Visualizing input/output data using `EventSet.plot()`.\n",
"- Operating and plotting with `indexes`.\n",
"\n",
"### Other important details\n",
"\n",
@@ -536,6 +565,12 @@
"- We could only cover a small fraction of **[all available operators](https://temporian.readthedocs.io/en/stable/reference/temporian/operators/add_index/)**.\n",
"- We put a lot of ❤️ in the **[User Guide](https://temporian.readthedocs.io/en/stable/user_guide/)**, so make sure to check it out 🙂."
]
},
{
"cell_type": "markdown",
"id": "cebffed7",
"metadata": {},
"source": []
}
],
"metadata": {
10 changes: 5 additions & 5 deletions docs/src/user_guide.ipynb
@@ -8,7 +8,7 @@
"source": [
"# User Guide\n",
"\n",
"This is a complete tour of Temporian's capabilities. For a quick hands-on overview, make sure to check the [Getting started guide](./getting_started)."
"This is a complete tour of Temporian's capabilities. For a quick hands-on overview, make sure to check the [Getting started guide](https://temporian.readthedocs.io/en/stable/getting_started)."
]
},
{
@@ -187,7 +187,7 @@
"result = evset.simple_moving_average(window_length=1)\n",
"\n",
"# Plot the results\n",
"tp.plot([evset, result]) "
"tp.plot([evset, result])"
]
},
{
@@ -1533,7 +1533,7 @@
" return sma_2 - sma_4\n",
"\n",
"result = my_function(evset)\n",
" \n",
"\n",
"result.plot()"
]
},
@@ -1557,7 +1557,7 @@
"sma_4_node = f1_node.simple_moving_average(window_length=4)\n",
"result_node = sma_2_node - sma_4_node\n",
"result = tp.run(result_node, {input_node: evset}, verbose=1)\n",
" \n",
"\n",
"result.plot()"
]
},
@@ -1801,7 +1801,7 @@
" features=a_evset.schema.features,\n",
" indexes=a_evset.schema.indexes\n",
")\n",
" \n",
"\n",
"a_node = a_evset.node()"
]
},
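
Note on the index semantics described in the new getting_started text ("one multivariate time sequence for each unique value" of the index, with operators applied to each sequence independently): the idea can be sketched in plain Python. This is only an illustration and does not use the Temporian API; the helper names `group_by_index` and `moving_average`, the sample data, and the window convention `(t - window_length, t]` are all assumptions made for the example.

```python
from collections import defaultdict


def group_by_index(timestamps, features, index_name):
    """Split events into one time sequence per unique index value (hypothetical helper)."""
    sequences = defaultdict(list)
    for i, ts in enumerate(timestamps):
        key = features[index_name][i]
        # Keep every feature except the one used as index.
        row = {name: values[i] for name, values in features.items() if name != index_name}
        sequences[key].append((ts, row))
    for events in sequences.values():
        events.sort(key=lambda event: event[0])  # order each sequence by timestamp
    return dict(sequences)


def moving_average(events, feature, window_length):
    """Average `feature` over events whose timestamp lies in (t - window_length, t].

    The window convention is an assumption for this sketch.
    """
    result = []
    for t, _ in events:
        window = [row[feature] for ts, row in events if t - window_length < ts <= t]
        result.append((t, sum(window) / len(window)))
    return result


# Made-up data: four events, one value feature, one feature used as index.
timestamps = [1, 2, 3, 3]
features = {
    "value": [10.0, 20.0, 30.0, 40.0],
    "color": ["red", "blue", "red", "blue"],
}

sequences = group_by_index(timestamps, features, index_name="color")

# The "operator" runs on each sequence separately: red events never see blue ones.
averages = {
    key: moving_average(events, "value", window_length=3)
    for key, events in sequences.items()
}
print(averages)
```

Without the index, all four events would fall into a single sequence and the averages at timestamp 3 would mix `red` and `blue` values, which is the behaviour the removed paragraph described as events interacting "with each other when operators are applied".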
