Description
The monthly forecast changes dramatically when the training values are unchanged and only the dates are shifted by 1 year.
I want to predict the next 12 months of a monthly time series from a 48-month history.
The 'ds' column contains the 48 months between '2015-01-01' and '2018-12-01'.
Shifting the 'ds' column by 1 year (so that it contains the 48 months between '2016-01-01' and '2019-12-01') leaves the 48 numeric values of the 'y' column unchanged, yet it unexpectedly changes the forecast for the next 12 points by an extreme amount.
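For reference, the one-year shift can also be written as a date offset instead of a string substitution. This is only a minimal sketch on synthetic data: the `y` values are placeholders (not the contents of Series_6-decimals.csv), and it assumes `ds` holds month-start dates.

```python
import numpy as np
import pandas as pd

# Placeholder history: 48 month-start dates with made-up values
# (NOT the data from Series_6-decimals.csv).
df = pd.DataFrame({
    "ds": pd.date_range("2015-01-01", periods=48, freq="MS"),
    "y": np.arange(48, dtype=float),
})

# Shift only the dates forward by one calendar year; 'y' is untouched.
df_shifted = df.copy()
df_shifted["ds"] = df_shifted["ds"] + pd.DateOffset(years=1)

# Print the new date range and confirm the values are identical.
print(df_shifted["ds"].min().date(), df_shifted["ds"].max().date())
print((df_shifted["y"] == df["y"]).all())
```

Both frames then carry exactly the same 48 values, so any difference in the forecasts can only come from the dates.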
The table below summarizes the observation:
| Change in the 'ds' column | Change in the 'y' column | Change in the forecast |
|---|---|---|
| Shift by 1 year | None | -105% for the 5th point (from positive to negative), +76% for the 7th point |
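The percentages in the table are relative changes with respect to the original forecast. As a quick check, the sketch below recomputes them from the rounded values quoted in the reproduction script's comments. The helper name `relative_change_pct` is purely illustrative.

```python
def relative_change_pct(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0

# Rounded forecast values as quoted in the reproduction script's comments.
y_5th, y_5th_shifted = 1.54745855e08, -7729934.43976249
y_7th, y_7th_shifted = 3.90564946e08, 6.86875344e08

print(f"5th point: {relative_change_pct(y_5th_shifted, y_5th):+.1f}%")
print(f"7th point: {relative_change_pct(y_7th_shifted, y_7th):+.1f}%")
```

A change below -100% means the forecast has crossed zero, which is what happens for the 5th point.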
Steps to reproduce the bug:
Python 3.10.12
Dependencies:
prophet==1.1.7
Run the following Python code to reproduce the results (with the input file Series_6-decimals.csv):
import pandas as pd
from prophet import Prophet
import os
filename = "Series_6-decimals.csv"
params = {
    'growth': 'linear',
    'seasonality_mode': 'multiplicative',
    'changepoint_prior_scale': 0.05,
    'seasonality_prior_scale': 10.0,
    'weekly_seasonality': False,
    'daily_seasonality': False,
    'yearly_seasonality': 'auto',
}
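# Load the same file twice: once as-is, once to be shifted forward by one year.
df = pd.read_csv(os.path.join(os.path.dirname(__file__), filename), sep=';')
df_2020 = pd.read_csv(os.path.join(os.path.dirname(__file__), filename), sep=';')
# Only the year prefix of 'ds' changes; the 'y' values stay identical.
df_2020['ds'] = df['ds'].replace({'^2015': '2016', '^2016': '2017', '^2017': '2018', '^2018': '2019'}, regex=True)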
m = Prophet(**params)
m.fit(df, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y = forecast[['yhat']].values
print("sum of 'y' forecast:", y.sum())
print("5th point of 'y' forecast:", y[4]) # 1.54745855e+08
print("7th point of 'y' forecast:", y[6]) # 3.90564946e+08
m = Prophet(**params)
m.fit(df_2020, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y_2020 = forecast[['yhat']].values
print("sum of 'y_2020' forecast:", y_2020.sum())
print("5th point of 'y_2020' forecast:", y_2020[4]) # -7729934.43976249 --> FIXME: -105% compared to the original y[4] (from positive to negative)?!
print("7th point of 'y_2020' forecast:", y_2020[6]) # 6.86875344e+08 --> FIXME: +76% compared to the original y[6]?!
print("relative difference in prediction of 5th point between y_2020 and y in %:", (y_2020[4].sum()-y[4].sum())/y[4].sum()*100, "%") # -105%
print("relative difference in prediction of 7th point between y_2020 and y in %:", (y_2020[6].sum()-y[6].sum())/y[6].sum()*100, "%") # +75.86%