[FEAT] - Tutorials update (Marco) (#311)

Nixtla · Apr 29, 2024 · 8fee338
1 parent 24ccc22
commit 8fee338

Showing 4 changed files with 277 additions and 38 deletions.
diff --git a/nbs/docs/tutorials/0_anomaly_detection.ipynb b/nbs/docs/tutorials/0_anomaly_detection.ipynb
diff --git a/nbs/docs/tutorials/12_longhorizon.ipynb b/nbs/docs/tutorials/12_longhorizon.ipynb
@@ -115,6 +115,13 @@
     "nixtla_client = NixtlaClient()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load the data"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -231,6 +238,13 @@
     "input_seq = Y_df[-1104:-96]   # Gets a sequence of 1008 observations (1008 = 42 days * 24h/day)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Forecasting"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -288,6 +302,13 @@
     "nixtla_client.plot(Y_df[-168:], fcst_df, models=['TimeGPT'], level=[90], time_col='ds', target_col='y')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Evaluation"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

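The long-horizon slice `Y_df[-1104:-96]` keeps 1008 observations (42 days of hourly data) and leaves the final 96 observations out of the input. The pandas sketch below checks that arithmetic on a hypothetical stand-in for `Y_df`. It is not the tutorial's dataset, and it uses `.iloc` to make the positional slice explicit.

```python
import pandas as pd

# Hypothetical hourly series standing in for the tutorial's Y_df (56 days of data)
n_obs = 56 * 24
Y_df = pd.DataFrame({"unique_id": "series_0", "y": list(range(n_obs))})

horizon = 96  # observations left out at the end (4 days * 24h/day)
input_seq = Y_df.iloc[-1104:-horizon]  # positional slice: 1104 - 96 = 1008 rows
holdout = Y_df.iloc[-horizon:]  # the 96 observations that follow input_seq

print(len(input_seq), len(input_seq) // 24)  # rows and days in the input window
print(len(holdout))
```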
diff --git a/nbs/docs/tutorials/6_multiple_series.ipynb b/nbs/docs/tutorials/6_multiple_series.ipynb
@@ -13,7 +13,11 @@
    "id": "752a293c-d477-45e7-93d9-23fc15a23c8f",
    "metadata": {},
    "source": [
-    "TimeGPT provides a robust solution for multi-series forecasting, which involves analyzing multiple data series concurrently, rather than a single one. The tool can be fine-tuned using a broad collection of series, enabling you to tailor the model to suit your specific needs or tasks."
+    "TimeGPT provides a robust solution for multi-series forecasting, which involves analyzing multiple data series concurrently, rather than a single one. The tool can be fine-tuned using a broad collection of series, enabling you to tailor the model to suit your specific needs or tasks.\n",
+    "\n",
+    "Note that the forecasts are still univariate: although TimeGPT is a global model, it does not model the relationships between the different target series. However, TimeGPT does support exogenous variables, such as categorical variables (e.g., category, brand), numerical variables (e.g., temperature, prices), or special holidays.\n",
+    "\n",
+    "Let's see this in action."
    ]
   },
   {
@@ -84,6 +88,14 @@
     "load_dotenv()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "61e6a645",
+   "metadata": {},
+   "source": [
+    "As always, we start off by initializing an instance of `NixtlaClient`."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -119,12 +131,24 @@
     "nixtla_client = NixtlaClient()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4c1519c9",
+   "metadata": {},
+   "source": [
+    "## Load the data"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "2bd0934b-8b12-4c33-be3c-6b8d2bf86f54",
    "metadata": {},
    "source": [
-    "The following dataset contains prices of different electricity markets. Let see how can we forecast them. The main argument of the forecast method is the input data frame with the historical values of the time series you want to forecast. This data frame can contain information from many time series. Use the `unique_id` column to identify the different time series of your dataset."
+    "The following dataset contains prices of different electricity markets.\n",
+    "\n",
+    "Multiple series are automatically detected in TimeGPT using the `unique_id` column, which contains a label for each series. If this column contains more than one unique value, TimeGPT treats the data as a multi-series scenario.\n",
+    "\n",
+    "In this particular case, the `unique_id` column contains the values BE, DE, FR, NP, and PJM."
    ]
   },
   {
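Multi-series input is a single long-format dataframe with one row per series and timestamp. The `unique_id` column labels which series each row belongs to. The pandas sketch below builds such a frame from made-up prices for three of the markets and counts the distinct labels. The `nunique() > 1` check mirrors the logic described above, but it is an illustration, not TimeGPT's internal code.

```python
import pandas as pd

# Hypothetical long-format frame: one row per (series, timestamp) pair
markets = ["BE", "DE", "FR"]
timestamps = pd.date_range("2024-01-01", periods=3, freq="D")
df = pd.DataFrame(
    [(m, ts, 50.0 + i) for i, m in enumerate(markets) for ts in timestamps],
    columns=["unique_id", "ds", "y"],
)

# Each distinct label in unique_id identifies one series
n_series = df["unique_id"].nunique()
is_multi_series = n_series > 1
print(n_series, is_multi_series)
print(df.groupby("unique_id").size())  # rows per series
```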
@@ -243,12 +267,20 @@
     "nixtla_client.plot(df)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "51d11ba4",
+   "metadata": {},
+   "source": [
+    "## Forecasting"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "1dbe558a-ac0f-475b-abd6-838121863307",
    "metadata": {},
    "source": [
-    "We just have to pass the dataframe to create forecasts for all the time series at once. "
+    "To forecast all series at once, we simply pass the dataframe to the `df` argument, and TimeGPT forecasts every series it contains."
    ]
   },
   {
@@ -401,20 +433,30 @@
     "nixtla_client.plot(df, timegpt_fcst_multiseries_df, max_insample_length=365, level=[80, 90])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "bd689e11",
+   "metadata": {},
+   "source": [
+    "From the figure above, we can see that the model effectively generated predictions for each unique series in the dataset."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "32b60af1-fa48-4de8-bcee-73aff1e4e709",
    "metadata": {},
    "source": [
-    "#### Historical forecast"
+    "## Historical forecast"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "2a790ca0-b995-4c1a-a0e4-4e5c8b8df9eb",
    "metadata": {},
    "source": [
-    "You can also compute prediction intervals for historical forecasts adding the `add_history=True` parameter as follows:"
+    "You can also compute prediction intervals for historical forecasts by setting `add_history=True`.\n",
+    "\n",
+    "To set the interval levels, we use the `level` argument. Here, we pass the list `[80, 90]`, which computes 80% and 90% prediction intervals."
    ]
   },
   {
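Each entry of `level` describes a central interval. For example, level 80 leaves 10% of the probability mass below the lower bound and 10% above the upper bound. The numpy sketch below shows that mapping on hypothetical forecast draws. It does not reproduce how TimeGPT itself computes its intervals. It only shows how a level translates into lower and upper quantiles, and why the 90% interval is wider than the 80% one.

```python
import numpy as np


def interval_quantiles(level: float) -> tuple:
    """Lower and upper quantiles of a central interval covering `level` percent."""
    alpha = (100 - level) / 100  # probability mass left outside the interval
    return alpha / 2, 1 - alpha / 2


# Hypothetical forecast draws for a single timestamp
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=100.0, scale=10.0, size=10_000)

bounds = {}
for level in [80, 90]:
    lo_q, hi_q = interval_quantiles(level)
    lo, hi = np.quantile(samples, [lo_q, hi_q])
    bounds[level] = (lo, hi)
    print(f"level={level}: quantiles ({lo_q:.2f}, {hi_q:.2f}) -> [{lo:.1f}, {hi:.1f}]")
```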
@@ -571,6 +613,20 @@
     "    level=[80, 90],\n",
     ")"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a48df7da",
+   "metadata": {},
+   "source": [
+    "In the figure above, we now see the historical predictions made by TimeGPT for each series, along with the 80% and 90% prediction intervals."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c2adb0d1",
+   "metadata": {},
+   "source": []
   }
  ],
  "metadata": {

diff --git a/nbs/docs/tutorials/9_cross_validation.ipynb b/nbs/docs/tutorials/9_cross_validation.ipynb
@@ -5,7 +5,7 @@
    "id": "6de758ee-a0d2-4b3f-acff-eed419dd17c5",
    "metadata": {},
    "source": [
-    "# Cross Validation"
+    "# Cross-validation"
    ]
   },
   {
@@ -63,6 +63,14 @@
     "load_dotenv()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ca8110a6",
+   "metadata": {},
+   "source": [
+    "We start off by initializing an instance of `NixtlaClient`."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -98,12 +106,22 @@
     "nixtla_client = NixtlaClient()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "fd57a883",
+   "metadata": {},
+   "source": [
+    "## Launching cross-validation"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "937ccb60-8a1b-4a58-9111-d9fb9d8d727c",
    "metadata": {},
    "source": [
-    "The `cross_validation` method within the `TimeGPT` class is an advanced functionality crafted to perform systematic validation on time series forecasting models. This method necessitates a dataframe comprising time-ordered data and employs a rolling-window scheme to meticulously evaluate the model's performance across different time periods, thereby ensuring the model's reliability and stability over time. \n",
+    "The `cross_validation` method of the `NixtlaClient` class performs systematic validation of time series forecasts. It takes a dataframe of time-ordered data and uses a rolling-window scheme to evaluate the model's performance across different time periods, which helps assess the model's reliability and stability over time. The animation below shows how TimeGPT performs cross-validation.\n",
+    "\n",
+    "![Rolling-window cross-validation](https://raw.githubusercontent.com/Nixtla/statsforecast/main/nbs/imgs/ChainedWindows.gif)\n",
     "\n",
     "Key parameters include `freq`, which denotes the data's frequency and is automatically inferred if not specified. The `id_col`, `time_col`, and `target_col` parameters designate the respective columns for each series' identifier, time step, and target values. The method offers customization through parameters like `n_windows`, indicating the number of separate time windows on which the model is assessed, and `step_size`, determining the gap between these windows. If `step_size` is unspecified, it defaults to the forecast horizon `h`. \n",
     "\n",
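The rolling-window scheme controlled by `h`, `n_windows`, and `step_size` can be sketched in plain Python. The function below returns the training cutoff and evaluation range for each window. It is a minimal illustration that assumes the last window ends at the final observation. `rolling_windows` is a hypothetical helper, not the client's internal implementation.

```python
from typing import List, Optional, Tuple


def rolling_windows(
    n_obs: int, h: int, n_windows: int, step_size: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Return (train_end, test_end) positions for each window, oldest window first.

    For each window, the model is fit on y[:train_end] and scored on y[train_end:test_end].
    """
    step = h if step_size is None else step_size  # step_size defaults to the horizon h
    windows = []
    for i in range(n_windows):
        # The last window ends exactly at the end of the series
        test_end = n_obs - (n_windows - 1 - i) * step
        train_end = test_end - h
        if train_end <= 0:
            raise ValueError("series too short for the requested windows")
        windows.append((train_end, test_end))
    return windows


# 100 observations, 7-step horizon, 5 windows, default step_size (= h)
for train_end, test_end in rolling_windows(100, h=7, n_windows=5):
    print(f"train on y[:{train_end}], evaluate on y[{train_end}:{test_end}]")
```

When `step_size` equals `h`, the test windows tile the end of the series without gaps or overlap. A smaller `step_size` makes consecutive test windows overlap.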
@@ -120,6 +138,7 @@
    "outputs": [],
    "source": [
     "pm_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/peyton_manning.csv')\n",
+    "\n",
     "timegpt_cv_df = nixtla_client.cross_validation(\n",
     "    pm_df, \n",
     "    h=7, \n",
@@ -159,12 +178,20 @@
     "    display(fig)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "be475644",
+   "metadata": {},
+   "source": [
+    "## Cross-validation with prediction intervals"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "c84e9a89-8de1-462f-a8d8-e45347031d23",
    "metadata": {},
    "source": [
-    "To asses the performance of `TimeGPT` with distributional forecasts, you can produce prediction intervals using the `level` argument."
+    "It is also possible to generate prediction intervals during cross-validation. To do so, we simply use the `level` argument."
    ]
   },
   {
@@ -206,12 +233,28 @@
     "    display(fig)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "72b8f68b",
+   "metadata": {},
+   "source": [
+    "## Cross-validation with exogenous variables"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5c27f048",
+   "metadata": {},
+   "source": [
+    "### Time features"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "84388bb9-54c3-408e-bae2-46e39ffc3ee5",
    "metadata": {},
    "source": [
-    "You can also include `date_features` to see their impact in forecasting accuracy:"
+    "It is possible to include exogenous variables when performing cross-validation. Here we use the `date_features` parameter to create labels for each month. These features are then used by the model to make predictions during cross-validation."
    ]
   },
   {
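Month features like these can be derived directly from the timestamp column. The pandas sketch below builds a month label and its one-hot encoding for a hypothetical daily series. The exact feature columns TimeGPT creates from `date_features` may differ. This sketch assumes a simple month number plus one indicator column per month.

```python
import pandas as pd

# Hypothetical daily series spanning late January to early March 2024
df = pd.DataFrame({
    "unique_id": "series_0",
    "ds": pd.date_range("2024-01-30", periods=40, freq="D"),
    "y": list(range(40)),
})

# Month label derived from the timestamp (1 = January, ..., 12 = December)
df["month"] = df["ds"].dt.month

# One-hot encoding: one 0/1 column per month present in the data
month_dummies = pd.get_dummies(df["month"], prefix="month", dtype=int)
features = pd.concat([df, month_dummies], axis=1)
print(features.head())
```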
@@ -256,18 +299,18 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b2cc956f-2a98-46be-922f-5fec1252c4e8",
+   "id": "4ca2ffe2",
    "metadata": {},
    "source": [
-    "#### Exogenous variables"
+    "### Dynamic features"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "a95ea323-cd6d-43cb-aed1-f10cf23c5a61",
    "metadata": {},
    "source": [
-    "Additionally you can pass exogenous variables to better inform `TimeGPT` about the data. You just simply have to add the exogenous regressors after the target column."
+    "Additionally, you can pass dynamic exogenous variables to better inform `TimeGPT` about the data. Simply add the exogenous regressors as columns after the target column."
    ]
   },
   {
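The column layout described above puts the identifier, timestamp, and target first, followed by one column per exogenous regressor. The pandas sketch below reorders a hypothetical frame into that layout. The column names and values are made up, and the sketch only rearranges columns without calling TimeGPT.

```python
import pandas as pd

# Hypothetical data: target y plus two dynamic exogenous regressors, in arbitrary order
df = pd.DataFrame({
    "price": [30.5, 31.0, 29.8],
    "y": [100.0, 102.5, 98.7],
    "unique_id": ["FR", "FR", "FR"],
    "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
    "temperature": [4.2, 3.9, 5.1],
})

id_col, time_col, target_col = "unique_id", "ds", "y"
# In this sketch, every column other than id, time, and target is a regressor
exog_cols = [c for c in df.columns if c not in (id_col, time_col, target_col)]

# Reorder so the regressors come after the target column
df = df[[id_col, time_col, target_col, *exog_cols]]
print(df.columns.tolist())
```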
@@ -330,9 +373,9 @@
    "id": "77c8c469-bbb5-45ef-bd49-07bfdbc51b6b",
    "metadata": {},
    "source": [
-    "#### Compare different models\n",
+    "## Cross-validation with different TimeGPT instances\n",
     "\n",
-    "Also, you can generate cross validation for different instances of `TimeGPT` using the `model` argument."
+    "You can also run cross-validation for different instances of `TimeGPT` using the `model` argument. Here we use the base model and the model for long-horizon forecasting."
    ]
   },
   {