Unexpected change in predictions on close to identical data #2676

@gboulmier

Description

The monthly forecast changes by tens of percent when the training data changes by a relative amount of only about 10^-11.

I wanted to predict the next 12 months of a monthly time series from a 48-month history.
The values range from about 10^6 to 10^9 and are given with 6 digits after the decimal point.

Reducing the numerical precision of the data by one decimal digit leaves the values almost unchanged, yet it unexpectedly changes the forecast by tens of percent. The Prophet model appears far too sensitive to the numerical precision of its input, and I see no reason for this behavior.

Here is a table summarizing the observations:

| Change in numerical data precision | Resulting relative difference in the data | Change in the forecast |
|---|---|---|
| From 6 decimals to 5 decimals | at most 10^-12 | upward change: +12% on average, up to 24% |
| From 5 decimals to 4 decimals | at most 10^-11 | downward change: -9% on average, up to -15% |
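
As a sanity check on the data-side figures, the largest relative change that rounding can introduce can be bounded with simple arithmetic. The sketch below is only illustrative: the function name `max_relative_change` is hypothetical, and the smallest magnitude of 10^6 is an assumption taken from the range stated in this report, not from the actual file.

```python
def max_relative_change(decimals, smallest_abs_value):
    """Upper bound on the relative change caused by rounding to `decimals` places.

    Rounding moves a value by at most half a unit in the last kept decimal,
    and the relative effect is largest for the smallest value in the series.
    """
    max_abs_change = 0.5 * 10.0 ** (-decimals)
    return max_abs_change / smallest_abs_value


# Assumption: the smallest value in the series is about 1e6.
for decimals in (5, 4):
    print(decimals, "decimals ->", max_relative_change(decimals, 1e6))
```

Under that assumption the bounds come out at roughly 5e-12 and 5e-11, so the relative differences reported for the data are of the size one would expect from rounding alone.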

Steps to reproduce the bug:
Python 3.10.12
Dependencies: prophet==1.1.7

Run the following Python code to reproduce the results (with the input file Series_6-decimals.csv):

import pandas as pd
from prophet import Prophet
import os

filename = "Series_6-decimals.csv"

params = {
    'growth': 'linear',
    'seasonality_mode': 'multiplicative',
    'changepoint_prior_scale': 0.05,
    'seasonality_prior_scale': 10.0,
    'weekly_seasonality': False,
    'daily_seasonality': False,
    'yearly_seasonality': 'auto',
}


df = pd.read_csv(os.path.join(os.path.dirname(__file__), filename), sep=';')
df_5 = df.round(5)
df_4 = df.round(4)

print("maximum absolute difference between df_5 and df:", max(abs(df_5['y']-df['y']))) # maximum absolute difference
print("maximum absolute relative difference between df_5 and df:", max(abs((df_5['y']-df['y'])/df['y']))) # maximum absolute relative difference
print("maximum absolute difference between df_4 and df:", max(abs(df['y']-df_4['y']))) # maximum absolute difference
print("maximum absolute relative difference between df_4 and df:", max(abs((df_4['y']-df['y'])/df['y']))) # maximum absolute relative difference


m = Prophet(**params)
m.fit(df, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y = forecast[['yhat']].values
print("sum of 'y' forecast:", y.sum()) # 2062620171.0109873
print("1st point of 'y' forecast:", y[0]) # 61602096.36060803


m = Prophet(**params)
m.fit(df_5, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y_5 = forecast[['yhat']].values
print("sum of 'y_5' forecast:", y_5.sum()) # 2305764926.4040384 --> FIXME : +12% compared to the sum of the original forecast ?!
print("1st point of 'y_5' forecast:", y_5[0]) # 76086131.36506273 --> FIXME : +23% compared to the original y[0] ?!


m = Prophet(**params)
m.fit(df_4, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y_4 = forecast[['yhat']].values
print("sum of 'y_4' forecast: ", y_4.sum()) # 2096602967.4861932
print("1st point of 'y_4' forecast: ", y_4[0]) # 64983152.06722656


print("relative difference in sum of predictions between y_5 and y in %:", (y_5.sum()-y.sum())/y.sum()*100, "%") # +11.8%
print("relative difference in sum of predictions between y_4 and y_5 in %:", (y_4.sum()-y_5.sum())/y_5.sum()*100, "%") # -9.1%
print("relative difference in prediction of 1st point between y_5 and y in %:",  (y_5[0]-y[0])/y[0]*100, "%") # +23.5%
print("relative difference in prediction of 1st point between y_4 and y_5 in %:", (y_4[0]-y_5[0])/y_5[0]*100, "%") # -14.6%
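
The three fit/predict sections above differ only in the rounding applied to the input, so the experiment can be written more compactly as a loop. This is a minimal sketch, not a change to the reproduction itself: `relative_change_pct` and `forecast_sums_by_precision` are hypothetical helper names, and the Prophet calls are the same ones used in the script above. Prophet is imported inside the function so that the percentage helper can be used without it.

```python
def relative_change_pct(new, reference):
    """Relative change of `new` with respect to `reference`, in percent."""
    return (new - reference) / reference * 100.0


def forecast_sums_by_precision(df, params, decimals=(None, 5, 4), seed=1000):
    """Fit one model per rounding level and return the summed 12-month forecast.

    `None` means "use the data exactly as read from the file".
    """
    from prophet import Prophet  # lazy import: only needed when fitting

    sums = {}
    for d in decimals:
        data = df if d is None else df.round(d)
        m = Prophet(**params)
        m.fit(data, seed=seed)
        future = m.make_future_dataframe(periods=12, freq='MS', include_history=False)
        sums[d] = float(m.predict(future)['yhat'].sum())
    return sums


# Recompute the percentages from the forecast sums printed by the script above.
print(relative_change_pct(2305764926.4040384, 2062620171.0109873))  # y_5 vs y
print(relative_change_pct(2096602967.4861932, 2305764926.4040384))  # y_4 vs y_5
```

With `df` and `params` defined as in the script, `forecast_sums_by_precision(df, params)` would return a dictionary keyed by rounding level, which can then be passed pairwise to `relative_change_pct`.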
