Commit

feat: add prophet experiment
AzulGarza committed Jan 4, 2024
1 parent adef2de commit f5d175f
Showing 9 changed files with 722 additions and 0 deletions.
21 changes: 21 additions & 0 deletions experiments/prophet/Makefile
@@ -0,0 +1,21 @@
SRC_DIR := data
EXCLUDE_STRINGS := catalogue
TS_FILES := $(filter-out $(wildcard $(SRC_DIR)/*$(foreach str,$(EXCLUDE_STRINGS),*$(str)*)), $(wildcard $(SRC_DIR)/*.parquet))

evaluate: .require-method
	@echo "Evaluation for $${method}..."
	@for file in $(TS_FILES); do \
		echo $$file; \
		python -m src.$${method}_exp --file $$file; \
	done
	@echo "Evaluation for $${method} complete."

summarize_results:
	@echo "Summarize results..."
	@python -m src.results_summary --dir ./data/results/
	@echo "Summarize results complete."

.require-method:
ifndef method
	$(error method is required)
endif
108 changes: 108 additions & 0 deletions experiments/prophet/README.md
@@ -0,0 +1,108 @@
# TimeGPT vs Prophet: Time Series Forecasting Benchmark

## Overview

This repository offers a detailed benchmarking framework for comparing the performance of TimeGPT against Prophet and StatsForecast in time series forecasting. We provide datasets with over 300,000 series across various frequencies, including daily, weekly, 10-minute, and hourly intervals. Users can also incorporate their own datasets for a more personalized analysis. **TimeGPT was not trained on these datasets.**


## Notes

- Results were generated on a VM with 96 cores and 196 GB of RAM.
- Prophet and StatsForecast were executed in parallel.
- TimeGPT uses the AzureML endpoint.
- Since the AzureML endpoint does not support GPUs or scalable requests, there is room for TimeGPT's results to improve further.

## Repository Structure

- `/data`: Parquet files with time series data.
- `/src`: Source code for running benchmarks and experiments.
- `/data/results`: Outputs and analysis from benchmark runs.

## Data Structure

Datasets should adhere to this structure (a minimal example follows the list):

- **unique_id**: Identifier for each series.
- **ds**: Timestamp of observation.
- **y**: Target variable for forecasting.
- **frequency**: Description of data frequency (e.g., 'Daily').
- **pandas_frequency**: Pandas frequency string (e.g., 'D').
- **h**: Forecasting horizon. (The last `h` periods of each series will be used as test.)
- **seasonality**: Seasonality of the series (e.g., 7 for daily).
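
For illustration, a minimal dataset matching this schema could be created as follows (the series names, values, and output path here are hypothetical):

```python
import pandas as pd

# Two hypothetical daily series, 30 observations each.
df = pd.DataFrame({
    "unique_id": ["series_1"] * 30 + ["series_2"] * 30,
    "ds": list(pd.date_range("2023-01-01", periods=30, freq="D")) * 2,
    "y": range(60),
    "frequency": "Daily",
    "pandas_frequency": "D",
    "h": 7,            # the last 7 observations of each series form the test set
    "seasonality": 7,  # weekly seasonality for daily data
})
df.to_parquet("data/Daily_example.parquet")
```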

## Running Experiments

### Makefile

The repository includes a Makefile to streamline the process of running experiments. The key commands are:

1. **evaluate**: Runs the evaluation for a specified method (`timegpt`, `prophet`, or `statsforecast`).
2. **summarize_results**: Summarizes the results from the evaluation.

### Evaluation Flow

1. **Run Evaluation**: Use `make evaluate method=<method_name>`, where `<method_name>` is `timegpt`, `prophet`, or `statsforecast`. The script filters out files containing specific strings (like 'catalogue') and runs the experiment for each `.parquet` file in the `/data` directory; a Python sketch of this file selection follows the list. The results are written to `/data/results`.

2. **Summarize Results**: After running evaluations for each method, execute `make summarize_results` to aggregate and summarize the results, which are written to this `README.md` file.
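
For reference, the Makefile's `TS_FILES` selection is roughly equivalent to this Python sketch (the `data` directory and the `catalogue` exclusion come from the Makefile above):

```python
from pathlib import Path

# Every .parquet file in data/, skipping files whose names contain "catalogue",
# mirroring the Makefile's TS_FILES filter.
ts_files = [p for p in Path("data").glob("*.parquet") if "catalogue" not in p.name]

# The evaluate target then runs, for each file:
#   python -m src.<method>_exp --file <file>
```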

## Getting Started

1. **Prepare Data**: Ensure your Parquet files are in `/data`.
2. **Create conda environment**: Run `conda env create -f environment.yml` and activate the environment using `conda activate timegpt-benchmark`.
3. **Run Benchmarks**: Use the Makefile commands to run evaluations and summarize results.


## Results
<This section is automatically generated by results_summary.py>

### Data Description

| file | frequency | n_series | mean | std | min_length | max_length | n_obs |
|:---------------|:------------|-----------:|----------:|-----------:|-------------:|-------------:|------------:|
| 10Minutely_10T | 10Minutely | 17 | 2.919 | 6.095 | 3,000 | 3,000 | 51,000 |
| 30Minutely_30T | 30Minutely | 556 | 0.233 | 0.329 | 3,000 | 3,000 | 1,668,000 |
| Daily_D | Daily | 103,529 | 178.763 | 5,825.784 | 14 | 3,000 | 251,217,667 |
| Hourly_H | Hourly | 227 | 635.332 | 4,425.693 | 748 | 3,000 | 590,653 |
| Minutely_T | Minutely | 34 | 44.612 | 106.121 | 3,000 | 3,000 | 102,000 |
| Monthly_MS | Monthly | 97,588 | 4,280.461 | 72,939.696 | 24 | 1,456 | 9,116,399 |
| Quarterly_QS | Quarterly | 2,539 | 4,722.366 | 9,521.907 | 18 | 745 | 253,160 |
| Weekly_W-MON | Weekly | 98,144 | 1,388.030 | 99,852.095 | 9 | 399 | 35,096,195 |

### Performance

Metrics are reported relative to the SeasonalNaive baseline (SeasonalNaive = 1.0; lower is better); the best value in each row is shown in bold.

| file | metric | TimeGPT | Prophet | SeasonalNaive |
|:---------------|:---------|----------:|----------:|----------------:|
| 10Minutely_10T | mae | **0.976** | 2.758 | 1.0 |
| 10Minutely_10T | rmse | **0.764** | 2.005 | 1.0 |
| 10Minutely_10T | time | **0.147** | 0.565 | 1.0 |
| 30Minutely_30T | mae | **0.6** | 0.661 | 1.0 |
| 30Minutely_30T | rmse | **0.652** | 0.687 | 1.0 |
| 30Minutely_30T | time | **0.318** | 7.498 | 1.0 |
| Daily_D | mae | **0.802** | 1.699 | 1.0 |
| Daily_D | rmse | **0.78** | 1.479 | 1.0 |
| Daily_D | time | **0.544** | 48.019 | 1.0 |
| Hourly_H | mae | **0.855** | 1.124 | 1.0 |
| Hourly_H | rmse | **0.881** | 1.048 | 1.0 |
| Hourly_H | time | **0.134** | 3.426 | 1.0 |
| Minutely_T | mae | **0.732** | 1.349 | 1.0 |
| Minutely_T | rmse | **0.705** | 1.207 | 1.0 |
| Minutely_T | time | **0.088** | 0.786 | 1.0 |
| Monthly_MS | mae | **0.728** | 1.41 | 1.0 |
| Monthly_MS | rmse | **0.686** | 1.196 | 1.0 |
| Monthly_MS | time | 7.02 | 118.178 | **1.0** |
| Quarterly_QS | mae | **0.966** | 1.384 | 1.0 |
| Quarterly_QS | rmse | **0.974** | 1.313 | 1.0 |
| Quarterly_QS | time | 1.218 | 18.685 | **1.0** |
| Weekly_W-MON | mae | **0.878** | 2.748 | 1.0 |
| Weekly_W-MON | rmse | **0.878** | 2.748 | 1.0 |
| Weekly_W-MON | time | 12.489 | 85.611 | **1.0** |
<end>
16 changes: 16 additions & 0 deletions experiments/prophet/environment.yml
@@ -0,0 +1,16 @@
name: timegpt-benchmark
channels:
  - conda-forge
dependencies:
  - jupyterlab
  - prophet
  - pyspark>=3.3
  - python=3.10
  - pip:
      - fire
      - nixtlats
      - python-dotenv
      - statsforecast
      - utilsforecast
      - tabulate

187 changes: 187 additions & 0 deletions experiments/prophet/src/prophet_exp.py
@@ -0,0 +1,187 @@
from concurrent.futures import ThreadPoolExecutor
from copy import deepcopy
from time import time
from typing import Optional

import fire
import numpy as np
import pandas as pd
from prophet import Prophet as _Prophet
from utilsforecast.processing import (
    backtest_splits,
    drop_index_if_pandas,
    join,
    maybe_compute_sort_indices,
    take_rows,
    vertical_concat,
)

from src.tools import ExperimentHandler


class ParallelForecaster:
    """Fits an independent model per series, distributing the groups across threads."""

    def _process_group(self, func, df, **kwargs):
        # Run `func` on a single series, re-attaching its identifier afterwards.
        uid = df["unique_id"].iloc[0]
        _df = df.drop("unique_id", axis=1)
        res_df = func(_df, **kwargs)
        res_df.insert(0, "unique_id", uid)
        return res_df

    def _apply_parallel(self, df_grouped, func, **kwargs):
        results = []
        # One future per series; max_workers=None lets the pool size itself.
        with ThreadPoolExecutor(max_workers=None) as executor:
            futures = [
                executor.submit(self._process_group, func, df, **kwargs)
                for _, df in df_grouped
            ]
            for future in futures:
                results.append(future.result())
        return pd.concat(results)

    def forecast(
        self,
        df: pd.DataFrame,
        h: int,
        X_df: Optional[pd.DataFrame] = None,
    ):
        # X_df is accepted for interface symmetry; exogenous variables are only
        # threaded through per window in cross_validation.
        df_grouped = df.groupby("unique_id")
        return self._apply_parallel(
            df_grouped,
            self._local_forecast,
            h=h,
        )

    def cross_validation(
        self,
        df: pd.DataFrame,
        h: int,
        n_windows: int = 1,
        step_size: Optional[int] = None,
        **kwargs,
    ):
        df_grouped = df.groupby("unique_id")
        kwargs = {"h": h, "n_windows": n_windows, "step_size": step_size, **kwargs}
        return self._apply_parallel(
            df_grouped,
            self._local_cross_validation,
            **kwargs,
        )


class Prophet(_Prophet, ParallelForecaster):
    def __init__(
        self,
        freq: str,
        alias: str = "Prophet",
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.freq = freq
        self.alias = alias

    def _local_forecast(
        self,
        df: pd.DataFrame,
        h: int,
        X_df: Optional[pd.DataFrame] = None,
    ) -> pd.DataFrame:
        # Fit a copy so the unfitted template model can be reused for every series.
        model = deepcopy(self)
        model.fit(df=df)
        future_df = model.make_future_dataframe(
            periods=h, include_history=False, freq=self.freq
        )
        if X_df is not None:
            future_df = future_df.merge(X_df, how="left")
        np.random.seed(1000)
        fcst_df = model.predict(future_df)
        fcst_df = fcst_df.rename({"yhat": self.alias}, axis=1)
        fcst_df = fcst_df[["ds", self.alias]]
        return fcst_df

    def _local_cross_validation(
        self,
        df: pd.DataFrame,
        h: int,
        n_windows: int = 1,
        step_size: Optional[int] = None,
    ) -> pd.DataFrame:
        df = df.copy()
        df["ds"] = pd.to_datetime(df["ds"])
        df.insert(0, "unique_id", "ts_0")
        # mlforecast cv code
        results = []
        sort_idxs = maybe_compute_sort_indices(df, "unique_id", "ds")
        if sort_idxs is not None:
            df = take_rows(df, sort_idxs)
        splits = backtest_splits(
            df,
            n_windows=n_windows,
            h=h,
            id_col="unique_id",
            time_col="ds",
            freq=pd.tseries.frequencies.to_offset(self.freq),
            step_size=h if step_size is None else step_size,
        )
        for i_window, (cutoffs, train, valid) in enumerate(splits):
            if len(valid.columns) > 3:
                # if we have uid, ds, y + exogenous vars
                train_future = valid.drop(columns="y")
            else:
                train_future = None
            y_pred = self._local_forecast(
                df=train[["ds", "y"]],
                h=h,
                X_df=train_future,
            )
            y_pred.insert(0, "unique_id", "ts_0")
            y_pred = join(y_pred, cutoffs, on="unique_id", how="left")
            result = join(
                valid[["unique_id", "ds", "y"]],
                y_pred,
                on=["unique_id", "ds"],
            )
            if result.shape[0] < valid.shape[0]:
                raise ValueError(
                    "Cross validation produced fewer results than expected. "
                    "Please verify that the frequency parameter (freq) matches your series' "
                    "frequency and that there aren't any missing periods."
                )
            results.append(result)
        out = vertical_concat(results)
        out = drop_index_if_pandas(out)
        first_out_cols = ["unique_id", "ds", "cutoff", "y"]
        remaining_cols = [c for c in out.columns if c not in first_out_cols]
        fcst_cv_df = out[first_out_cols + remaining_cols]
        return fcst_cv_df.drop(columns="unique_id")


def evaluate_experiment(file: str):
    exp_handler = ExperimentHandler(file=file, method="prophet")
    Y_df, freq, pandas_freq, h, seasonality = exp_handler.read_data()
    model_name = "Prophet"
    print(model_name)
    prophet = Prophet(freq=pandas_freq)
    start = time()
    Y_hat_df = prophet.cross_validation(
        df=Y_df,
        h=h,
        n_windows=1,
    )
    total_time = time() - start
    print(total_time)
    # evaluation
    eval_df, total_time_df = exp_handler.evaluate_model(
        Y_hat_df=Y_hat_df,
        model_name=model_name,
        total_time=total_time,
    )
    exp_handler.save_results(
        freq=freq,
        eval_df=eval_df,
        total_time_df=total_time_df,
    )


if __name__ == "__main__":
    fire.Fire(evaluate_experiment)
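
# A minimal standalone usage sketch (hypothetical file path and horizon; assumes
# data in the repository's schema, as a sketch rather than the benchmark entry point):
#
#   df = pd.read_parquet("data/Daily_D.parquet")
#   model = Prophet(freq="D")
#   cv_df = model.cross_validation(df=df[["unique_id", "ds", "y"]], h=7, n_windows=1)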