Dramatic change in predictions on same data shifted by 1 year #2677

@gboulmier

Description

The monthly forecast changes dramatically when the training data is identical except that its dates are shifted by 1 year.

I want to predict the next 12 months of a monthly time series from a 48-month history.
The 'ds' column contains the 48 months from '2015-01-01' to '2018-12-01'.

Shifting the 'ds' column by 1 year (so that it contains the 48 months from '2016-01-01' to '2019-12-01') leaves the 48 numeric values of the 'y' column unchanged, yet it changes the forecast for the next 12 points by an extreme magnitude.

Here is a table summarizing the observation:

| Change in the 'ds' column | Change in the 'y' column | Change in the forecast |
| --- | --- | --- |
| Shift by 1 year | None | -105% for the 5th point (from positive to negative), +76% for the 7th point |
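As a quick sanity check on the percentages above, here is a minimal sketch that recomputes them from the forecast values quoted in the comments of the reproduction script. The helper name `relative_change_pct` is hypothetical and only used for illustration; the four numbers are copied from the script's comments, not recomputed with Prophet.

```python
def relative_change_pct(original: float, shifted: float) -> float:
    """Relative change of `shifted` with respect to `original`, in percent."""
    return (shifted - original) / original * 100.0


# Forecast values as reported in the reproduction script's comments.
y_5, y_5_shifted = 1.54745855e+08, -7729934.43976249  # 5th forecast point
y_7, y_7_shifted = 3.90564946e+08, 6.86875344e+08     # 7th forecast point

change_5 = relative_change_pct(y_5, y_5_shifted)
change_7 = relative_change_pct(y_7, y_7_shifted)

print(f"5th point: {change_5:.1f}%")
print(f"7th point: {change_7:.1f}%")
```

The 5th point changes sign, which is why its relative change is below -100%.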

Steps to reproduce the bug:

Environment: Python 3.10.12
Dependencies: prophet==1.1.7

Run the following Python code to reproduce the results (with the input file Series_6-decimals.csv):

import pandas as pd
from prophet import Prophet
import os

filename = "Series_6-decimals.csv"

params = {
    'growth': 'linear',
    'seasonality_mode': 'multiplicative',
    'changepoint_prior_scale': 0.05,
    'seasonality_prior_scale': 10.0,
    'weekly_seasonality': False,
    'daily_seasonality': False,
    'yearly_seasonality': 'auto',
}


# Original series: 48 monthly points, 'ds' from 2015-01-01 to 2018-12-01.
df = pd.read_csv(os.path.join(os.path.dirname(__file__), filename), sep=';')
# Same 'y' values, with every 'ds' date moved forward by 1 year (2016-01-01 to 2019-12-01).
df_2020 = df.copy()
df_2020['ds'] = df['ds'].replace({'^2015': '2016', '^2016': '2017', '^2017': '2018', '^2018': '2019'}, regex=True)


m = Prophet(**params)
m.fit(df, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y = forecast[['yhat']].values
print("sum of 'y' forecast:", y.sum())
print("5th point of 'y' forecast:", y[4]) # 1.54745855e+08
print("7th point of 'y' forecast:", y[6]) # 3.90564946e+08


m = Prophet(**params)
m.fit(df_2020, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y_2020 = forecast[['yhat']].values
print("sum of 'y_2020' forecast:", y_2020.sum())
print("5th point of 'y_2020' forecast:", y_2020[4]) # -7729934.43976249 --> FIXME: -105% compared to the original y[4] (from positive to negative)?!
print("7th point of 'y_2020' forecast:", y_2020[6]) # 6.86875344e+08 --> FIXME: +76% compared to the original y[6]?!


print("relative difference in prediction of 5th point between y_2020 and y in %:",  (y_2020[4].sum()-y[4].sum())/y[4].sum()*100, "%") # -105%
print("relative difference in prediction of 7th point between y_2020 and y in %:",  (y_2020[6].sum()-y[6].sum())/y[6].sum()*100, "%") # +75.86%
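The script above shifts the dates by rewriting the year prefix of each 'ds' string. To rule out the shift itself as a source of the difference, the same 1-year shift can be done on parsed dates and verified. This is only a sketch: the data frame below is a hypothetical stand-in for Series_6-decimals.csv with placeholder 'y' values, and `df_shifted` is an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for Series_6-decimals.csv:
# 48 month starts from 2015-01-01 to 2018-12-01, with placeholder 'y' values.
df = pd.DataFrame({
    "ds": pd.date_range("2015-01-01", periods=48, freq="MS"),
    "y": [float(i) for i in range(48)],
})

# Shift every date forward by exactly one calendar year; 'y' is left untouched.
df_shifted = df.copy()
df_shifted["ds"] = df["ds"] + pd.DateOffset(years=1)

# Check that only the dates moved and that they are still month starts.
assert df_shifted["y"].equals(df["y"])
assert df_shifted["ds"].dt.is_month_start.all()
print(df_shifted["ds"].min().date(), df_shifted["ds"].max().date())
```

With the real file, the same checks would apply after loading it with `pd.read_csv(..., sep=';')` and converting 'ds' with `pd.to_datetime`.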
