Showing 9 changed files with 722 additions and 0 deletions.
@@ -0,0 +1,21 @@
# Every .parquet file in data/, except files whose name contains an excluded string.
SRC_DIR := data
EXCLUDE_STRINGS := catalogue
TS_FILES := $(filter-out $(wildcard $(SRC_DIR)/*$(foreach str,$(EXCLUDE_STRINGS),*$(str)*)), $(wildcard $(SRC_DIR)/*.parquet))

# Run the experiment module for the requested method on every selected file,
# e.g. `make evaluate method=prophet`.
evaluate: .require-method
	@echo "Evaluation for $${method}..."
	@for file in $(TS_FILES); do \
		echo $$file; \
		python -m src.$${method}_exp --file $$file; \
	done
	@echo "Evaluation for $${method} complete."

# Aggregate the per-file results written to data/results/ into the README.
summarize_results:
	@echo "Summarize results..."
	@python -m src.results_summary --dir ./data/results/
	@echo "Summarize results complete."

# Abort with an error if the `method` variable was not passed on the command line.
.require-method:
ifndef method
	$(error method is required)
endif
@@ -0,0 +1,108 @@
# TimeGPT vs Prophet: Time Series Forecasting Benchmark

## Overview

This repository offers a detailed benchmarking framework for comparing the performance of TimeGPT against Prophet and StatsForecast in time series forecasting. We provide datasets with over 300,000 series across various frequencies, including daily, weekly, 10-minute, and hourly intervals. Users can also incorporate their own datasets for a more personalized analysis. **TimeGPT was not trained on these datasets.**

## Notes

- Results were generated using a VM with 96 cores and 196 GB of RAM.
- Prophet and StatsForecast were executed in parallel.
- TimeGPT uses the AzureML endpoint.
- Since the AzureML endpoint does not support GPU or scalable requests, TimeGPT's results could improve further.

## Repository Structure

- `/data`: Parquet files with time series data.
- `/src`: Source code for running benchmarks and experiments.
- `/data/results`: Outputs and analysis from benchmark runs.

## Data Structure

Datasets should adhere to the following structure (a minimal example is shown after the list):

- **unique_id**: Identifier for each series.
- **ds**: Timestamp of the observation.
- **y**: Target variable for forecasting.
- **frequency**: Description of the data frequency (e.g., 'Daily').
- **pandas_frequency**: Pandas frequency string (e.g., 'D').
- **h**: Forecasting horizon. (The last `h` periods of each series are used as the test set.)
- **seasonality**: Seasonality of the series (e.g., 7 for daily data).
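For illustration, here is a minimal sketch of a dataset that follows this schema. The file name `data/example_daily.parquet` and the values are hypothetical; only the column layout matters.

```python
import pandas as pd

# Two short daily series in the layout described above.
frames = []
for uid in ["series_1", "series_2"]:
    frames.append(
        pd.DataFrame(
            {
                "unique_id": uid,                                         # series identifier
                "ds": pd.date_range("2023-01-01", periods=60, freq="D"),  # timestamps
                "y": range(60),                                           # target variable
                "frequency": "Daily",                                     # human-readable frequency
                "pandas_frequency": "D",                                  # pandas frequency string
                "h": 7,                                                   # last 7 points form the test set
                "seasonality": 7,                                         # weekly seasonality for daily data
            }
        )
    )
df = pd.concat(frames, ignore_index=True)
# Writing parquet requires an engine such as pyarrow; the file name is hypothetical.
df.to_parquet("data/example_daily.parquet")
```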

## Running Experiments

### Makefile

The repository includes a Makefile to streamline the process of running experiments. The key commands are:

1. **evaluate**: Runs the evaluation for a specified method (`timegpt`, `prophet`, or `statsforecast`).
2. **summarize_results**: Summarizes the results from the evaluation.

### Evaluation Flow

1. **Run Evaluation**: Use `make evaluate method=<method_name>`, where `<method_name>` is one of `timegpt`, `prophet`, or `statsforecast`. The command filters out files containing specific strings (such as 'catalogue') and runs the experiment for each remaining `.parquet` file in the `/data` directory. The results are written to `/data/results` (see the sketch after this list).

2. **Summarize Results**: After running the evaluations for each method, execute `make summarize_results` to aggregate and summarize the results, which are written to this `README.md` file.
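For reference, the loop that `make evaluate` performs can be reproduced directly in Python. This is a rough sketch rather than part of the repository; it assumes the `src.<method>_exp` module layout and the same 'catalogue' exclusion used by the Makefile.

```python
import glob
import subprocess

method = "prophet"  # or "timegpt" / "statsforecast"
for file in sorted(glob.glob("data/*.parquet")):
    if "catalogue" in file:  # same exclusion the Makefile applies
        continue
    print(file)
    # Each experiment module writes its results under data/results/.
    subprocess.run(["python", "-m", f"src.{method}_exp", "--file", file], check=True)
```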

## Getting Started

1. **Prepare Data**: Ensure your Parquet files are in `/data`.
2. **Create the conda environment**: Run `conda env create -f environment.yml` and activate the environment with `conda activate timegpt-benchmark`.
3. **Run Benchmarks**: Use the Makefile commands to run the evaluations and summarize the results.

## Results
<This section is automatically generated by results_summary.py>

### Data Description

| file | frequency | n_series | mean | std | min_length | max_length | n_obs |
|:---------------|:------------|-----------:|----------:|-----------:|-------------:|-------------:|------------:|
| 10Minutely_10T | 10Minutely | 17 | 2.919 | 6.095 | 3,000 | 3,000 | 51,000 |
| 30Minutely_30T | 30Minutely | 556 | 0.233 | 0.329 | 3,000 | 3,000 | 1,668,000 |
| Daily_D | Daily | 103,529 | 178.763 | 5,825.784 | 14 | 3,000 | 251,217,667 |
| Hourly_H | Hourly | 227 | 635.332 | 4,425.693 | 748 | 3,000 | 590,653 |
| Minutely_T | Minutely | 34 | 44.612 | 106.121 | 3,000 | 3,000 | 102,000 |
| Monthly_MS | Monthly | 97,588 | 4,280.461 | 72,939.696 | 24 | 1,456 | 9,116,399 |
| Quarterly_QS | Quarterly | 2,539 | 4,722.366 | 9,521.907 | 18 | 745 | 253,160 |
| Weekly_W-MON | Weekly | 98,144 | 1,388.030 | 99,852.095 | 9 | 399 | 35,096,195 |

### Performance

Metrics are reported relative to the SeasonalNaive baseline (1.0); lower is better, and the best value in each row is shown in bold.
| file | metric | TimeGPT | Prophet | SeasonalNaive |
|:---------------|:---------|----------:|----------:|----------------:|
| 10Minutely_10T | mae | **0.976** | 2.758 | 1.0 |
| 10Minutely_10T | rmse | **0.764** | 2.005 | 1.0 |
| 10Minutely_10T | time | **0.147** | 0.565 | 1.0 |
| 30Minutely_30T | mae | **0.6** | 0.661 | 1.0 |
| 30Minutely_30T | rmse | **0.652** | 0.687 | 1.0 |
| 30Minutely_30T | time | **0.318** | 7.498 | 1.0 |
| Daily_D | mae | **0.802** | 1.699 | 1.0 |
| Daily_D | rmse | **0.78** | 1.479 | 1.0 |
| Daily_D | time | **0.544** | 48.019 | 1.0 |
| Hourly_H | mae | **0.855** | 1.124 | 1.0 |
| Hourly_H | rmse | **0.881** | 1.048 | 1.0 |
| Hourly_H | time | **0.134** | 3.426 | 1.0 |
| Minutely_T | mae | **0.732** | 1.349 | 1.0 |
| Minutely_T | rmse | **0.705** | 1.207 | 1.0 |
| Minutely_T | time | **0.088** | 0.786 | 1.0 |
| Monthly_MS | mae | **0.728** | 1.41 | 1.0 |
| Monthly_MS | rmse | **0.686** | 1.196 | 1.0 |
| Monthly_MS | time | 7.02 | 118.178 | **1.0** |
| Quarterly_QS | mae | **0.966** | 1.384 | 1.0 |
| Quarterly_QS | rmse | **0.974** | 1.313 | 1.0 |
| Quarterly_QS | time | 1.218 | 18.685 | **1.0** |
| Weekly_W-MON | mae | **0.878** | 2.748 | 1.0 |
| Weekly_W-MON | rmse | **0.878** | 2.748 | 1.0 |
| Weekly_W-MON | time | 12.489 | 85.611 | **1.0** |

<end>
@@ -0,0 +1,16 @@
name: timegpt-benchmark
channels:
  - conda-forge
dependencies:
  - jupyterlab
  - prophet
  - pyspark>=3.3
  - python=3.10
  - pip:
      - fire
      - nixtlats
      - python-dotenv
      - statsforecast
      - utilsforecast
      - tabulate
@@ -0,0 +1,187 @@
from concurrent.futures import ThreadPoolExecutor
from copy import deepcopy
from time import time
from typing import Optional

import fire
import numpy as np
import pandas as pd
from prophet import Prophet as _Prophet
from utilsforecast.processing import (
    backtest_splits,
    drop_index_if_pandas,
    join,
    maybe_compute_sort_indices,
    take_rows,
    vertical_concat,
)

from src.tools import ExperimentHandler

class ParallelForecaster:
    """Mixin that fits one model per series, processing the series concurrently."""

    def _process_group(self, func, df, **kwargs):
        # Run `func` on a single series and re-attach its identifier to the output.
        uid = df["unique_id"].iloc[0]
        _df = df.drop("unique_id", axis=1)
        res_df = func(_df, **kwargs)
        res_df.insert(0, "unique_id", uid)
        return res_df

    def _apply_parallel(self, df_grouped, func, **kwargs):
        # Submit one task per series to a thread pool and concatenate the results.
        results = []
        with ThreadPoolExecutor(max_workers=None) as executor:
            futures = [
                executor.submit(self._process_group, func, df, **kwargs)
                for _, df in df_grouped
            ]
            for future in futures:
                results.append(future.result())
        return pd.concat(results)

    def forecast(
        self,
        df: pd.DataFrame,
        h: int,
        X_df: Optional[pd.DataFrame] = None,
    ):
        df_grouped = df.groupby("unique_id")
        return self._apply_parallel(
            df_grouped,
            self._local_forecast,
            h=h,
        )

    def cross_validation(
        self,
        df: pd.DataFrame,
        h: int,
        n_windows: int = 1,
        step_size: Optional[int] = None,
        **kwargs,
    ):
        df_grouped = df.groupby("unique_id")
        kwargs = {"h": h, "n_windows": n_windows, "step_size": step_size, **kwargs}
        return self._apply_parallel(
            df_grouped,
            self._local_cross_validation,
            **kwargs,
        )

class Prophet(_Prophet, ParallelForecaster):
    def __init__(
        self,
        freq: str,
        alias: str = "Prophet",
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.freq = freq
        self.alias = alias

    def _local_forecast(
        self,
        df: pd.DataFrame,
        h: int,
        X_df: Optional[pd.DataFrame] = None,
    ) -> pd.DataFrame:
        model = deepcopy(self)
        model.fit(df=df)
        future_df = model.make_future_dataframe(
            periods=h, include_history=False, freq=self.freq
        )
        if X_df is not None:
            future_df = future_df.merge(X_df, how="left")
        np.random.seed(1000)
        fcst_df = model.predict(future_df)
        fcst_df = fcst_df.rename({"yhat": self.alias}, axis=1)
        fcst_df = fcst_df[["ds", self.alias]]
        return fcst_df

    def _local_cross_validation(
        self,
        df: pd.DataFrame,
        h: int,
        n_windows: int = 1,
        step_size: Optional[int] = None,
    ) -> pd.DataFrame:
        df = df.copy()
        df["ds"] = pd.to_datetime(df["ds"])
        df.insert(0, "unique_id", "ts_0")
        # mlforecast cv code
        results = []
        sort_idxs = maybe_compute_sort_indices(df, "unique_id", "ds")
        if sort_idxs is not None:
            df = take_rows(df, sort_idxs)
        splits = backtest_splits(
            df,
            n_windows=n_windows,
            h=h,
            id_col="unique_id",
            time_col="ds",
            freq=pd.tseries.frequencies.to_offset(self.freq),
            step_size=h if step_size is None else step_size,
        )
        for i_window, (cutoffs, train, valid) in enumerate(splits):
            if len(valid.columns) > 3:
                # if we have uid, ds, y + exogenous vars
                train_future = valid.drop(columns="y")
            else:
                train_future = None
            y_pred = self._local_forecast(
                df=train[["ds", "y"]],
                h=h,
                X_df=train_future,
            )
            y_pred.insert(0, "unique_id", "ts_0")
            y_pred = join(y_pred, cutoffs, on="unique_id", how="left")
            result = join(
                valid[["unique_id", "ds", "y"]],
                y_pred,
                on=["unique_id", "ds"],
            )
            if result.shape[0] < valid.shape[0]:
                raise ValueError(
                    "Cross validation produced fewer results than expected. "
                    "Please verify that the frequency parameter (freq) matches your "
                    "series' frequency and that there aren't any missing periods."
                )
            results.append(result)
        out = vertical_concat(results)
        out = drop_index_if_pandas(out)
        first_out_cols = ["unique_id", "ds", "cutoff", "y"]
        remaining_cols = [c for c in out.columns if c not in first_out_cols]
        fcst_cv_df = out[first_out_cols + remaining_cols]
        return fcst_cv_df.drop(columns="unique_id")


def evaluate_experiment(file: str):
    exp_handler = ExperimentHandler(file=file, method="prophet")
    Y_df, freq, pandas_freq, h, seasonality = exp_handler.read_data()
    model_name = "Prophet"
    print(model_name)
    prophet = Prophet(freq=pandas_freq)
    start = time()
    Y_hat_df = prophet.cross_validation(
        df=Y_df,
        h=h,
        n_windows=1,
    )
    total_time = time() - start
    print(total_time)
    # evaluation
    eval_df, total_time_df = exp_handler.evaluate_model(
        Y_hat_df=Y_hat_df,
        model_name=model_name,
        total_time=total_time,
    )
    exp_handler.save_results(
        freq=freq,
        eval_df=eval_df,
        total_time_df=total_time_df,
    )


if __name__ == "__main__":
    fire.Fire(evaluate_experiment)
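For reference, this experiment is normally launched through the Makefile (`make evaluate method=prophet`, which runs `python -m src.<method>_exp --file <file>`), but the parallel `Prophet` wrapper can also be used on its own. A minimal sketch follows, assuming the module path `src.prophet_exp` (inferred from the Makefile's `src.<method>_exp` naming pattern, not confirmed by this commit) and synthetic data.

```python
import pandas as pd

from src.prophet_exp import Prophet  # assumed module path, inferred from the Makefile

# Tiny two-series daily panel in the unique_id / ds / y layout used by the benchmark.
df = pd.concat(
    [
        pd.DataFrame(
            {
                "unique_id": uid,
                "ds": pd.date_range("2023-01-01", periods=90, freq="D"),
                "y": range(90),
            }
        )
        for uid in ["ts_a", "ts_b"]
    ],
    ignore_index=True,
)

model = Prophet(freq="D")
# One backtest window: the last 7 days of each series are held out and forecast in parallel.
cv_df = model.cross_validation(df=df, h=7, n_windows=1)
print(cv_df.head())  # columns: unique_id, ds, cutoff, y, Prophet
```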