How to compute features that depend on forecasted values? #237

davidrpugh · 2023-10-09T05:41:58Z

davidrpugh
Oct 9, 2023

Thanks again for making a great suite of forecasting libraries! I am continuing to work my way through the Kaggle Store Sales Time Series Forecasting competition to teach myself how to use your libraries.

I have created a number of features whose values during the forecasting window depend on the forecasted values of the target. A simple example would be any feature that depends on the historical values of total sales for a particular store.

total_sales = (
    train_df.loc[:, ["sales"]]
            .groupby(["store_nbr", "date"])
            .sum()
)

Suppose that I created a feature which was a rolling 7-day average of total sales. When I make forecasts recursively, I need to recompute the value of this feature after making my predictions for each period. Do I use the after_predict callback for this?
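
To make the feature concrete, here is a minimal sketch on toy data. The column names follow the snippet above, but the toy `train_df`, the `family` column, and the name `rolling_7d` are assumptions for illustration only. It also assumes one row per store and day with no gaps, so a 7-row window equals 7 days, and it shifts by one day so the feature only uses past values.

```python
import pandas as pd

# Hypothetical toy data standing in for the competition's training set.
train_df = pd.DataFrame({
    "store_nbr": [1, 1, 1, 1, 2, 2],
    "family": ["A", "B", "A", "B", "A", "A"],
    "date": pd.to_datetime([
        "2017-01-01", "2017-01-01", "2017-01-02",
        "2017-01-02", "2017-01-01", "2017-01-02",
    ]),
    "sales": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0],
})

# Total sales per store and day (summing over product families).
total_sales = train_df.groupby(["store_nbr", "date"])["sales"].sum()

# Rolling 7-day mean of total sales per store, lagged by one day so that
# the value for a given day only depends on earlier days.
rolling_7d = (
    total_sales.groupby(level="store_nbr")
    .transform(lambda s: s.shift(1).rolling(7, min_periods=1).mean())
    .rename("total_sales_rolling_mean_7")
)

print(rolling_7d)
```

During the forecast window the "past" values of `sales` are themselves predictions, which is why this feature has to be recomputed at every step.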

jmoralez · 2023-10-09T16:27:59Z

jmoralez
Oct 9, 2023
Maintainer

Hey. We currently don't support that; we have #194 to implement it in a nice way. However, you can achieve it at the moment by doing some manual work. I believe the following does what you want:

from mlforecast import MLForecast
from mlforecast.utils import generate_series
from sklearn.linear_model import LinearRegression
from window_ops.rolling import rolling_mean

# note the equal_ends here. what we're going to do is add up the features across series,
# so they should all end on the same timestamp, otherwise you'll get wrong results
series = generate_series(100, n_static_features=1, equal_ends=True)

fcst = MLForecast(
    models=LinearRegression(),
    freq='D',
    lag_transforms={
        1: [(rolling_mean, 7)]
    }
)
# this computes the training set with the rolling mean of lag1 per series
prep = fcst.preprocess(series, static_features=['static_0'])
# now you can add your features here as an aggregation of the individual ones
prep['static_0_rolling_mean_lag1_window_size7'] = prep.groupby(['static_0', 'ds'])['rolling_mean_lag1_window_size7'].transform('sum')

# define X, y and train the models
X = prep.drop(columns=['unique_id', 'ds', 'y'])
y = prep['y']
fcst.fit_models(X, y)

# define the transformation for the predict step, note that this doesn't use the date
# because we're assuming they're all on the same date
def update_rolling_mean_static_0(df):
    df['static_0_rolling_mean_lag1_window_size7'] = df.groupby('static_0')['rolling_mean_lag1_window_size7'].transform('sum')
    return df

# provide this function to the before_predict_callback
fcst.predict(1, before_predict_callback=update_rolling_mean_static_0)
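
For intuition about why the aggregation has to happen before each prediction step, here is a library-free toy sketch of a recursive forecast loop. It is not mlforecast's implementation: `recursive_forecast`, `add_group_total`, the dictionary-based rows, and the toy model are all hypothetical names made up for this illustration. The idea it shows is the same as above: per-series features are rebuilt from the history (including earlier predictions), then a hook adds the cross-series total before the model is called.

```python
from collections import defaultdict


def rolling_mean_last(values, window):
    """Mean of the last `window` values, i.e. a rolling mean of lag 1."""
    tail = values[-window:]
    return sum(tail) / len(tail)


def add_group_total(rows):
    """Hook: add the per-group sum of the per-series rolling means."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["group"]] += row["rolling_mean"]
    for row in rows:
        row["group_rolling_mean"] = totals[row["group"]]
    return rows


def recursive_forecast(histories, groups, model, horizon, window, before_predict):
    """Forecast `horizon` steps, feeding each prediction back as history."""
    histories = {uid: list(vals) for uid, vals in histories.items()}
    forecasts = {uid: [] for uid in histories}
    for _ in range(horizon):
        # Per-series features, computed from observed values and past predictions.
        rows = [
            {"id": uid, "group": groups[uid],
             "rolling_mean": rolling_mean_last(vals, window)}
            for uid, vals in histories.items()
        ]
        # Cross-series features are added just before predicting.
        rows = before_predict(rows)
        for row in rows:
            pred = model(row)
            histories[row["id"]].append(pred)
            forecasts[row["id"]].append(pred)
    return forecasts


# Toy setup: series "a" and "b" share a group, "c" is alone in its group.
histories = {"a": [1.0, 3.0], "b": [2.0, 4.0], "c": [10.0, 10.0]}
groups = {"a": "s1", "b": "s1", "c": "s2"}


def toy_model(row):
    # Stand-in for a fitted model: half of the group-level feature.
    return row["group_rolling_mean"] / 2


forecasts = recursive_forecast(
    histories, groups, toy_model, horizon=2, window=2,
    before_predict=add_group_total,
)
print(forecasts)
```

In the second step the group totals already include the first-step predictions, which is the behaviour the callback approach relies on.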

Please let us know if you have further questions.


davidrpugh Oct 10, 2023
Author

Thanks for letting me know that it doesn't exist (yet) and for the advice on how to get started. The above will work nicely for my purposes.
