Description
The monthly forecast changes by tens of percent when the training data changes by a relative amount of only about 10^-11.
I wanted to forecast the next 12 months of a monthly time series from a 48-month history.
The values range from about 10^6 to 10^9 and are stored with 6 digits after the decimal point.
Rounding the data to one fewer decimal place leaves the values almost unchanged, yet it changes the forecast by tens of percent. The Prophet model appears far too sensitive to the numerical precision of the input, and I see no reason for this behavior.
The table below summarizes the observations:
| Change in numerical data precision | Resulting relative difference in the data | Change in the forecast |
|---|---|---|
| From 6 decimals to 5 decimals | at most 10^-12 | upward change: +12% on average, up to +24% |
| From 5 decimals to 4 decimals | at most 10^-11 | downward change: -9% on average, up to -15% |
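As a rough sanity check on the magnitudes in the second column: rounding to d decimal places moves a value by at most 0.5 * 10^-d, so for values of at least 10^6 the relative change is bounded by roughly 5 * 10^-12 (5 decimals) and 5 * 10^-11 (4 decimals). The sketch below illustrates this on synthetic data. The series `y` and the helper `max_relative_change` are hypothetical stand-ins, not the real input file, and the synthetic values carry full float64 precision rather than exactly 6 decimals.

```python
import numpy as np

# Hypothetical stand-in for the real series (the CSV is not reproduced here):
# 48 monthly values between 1e6 and 1e9.
rng = np.random.default_rng(0)
y = rng.uniform(1e6, 1e9, size=48)


def max_relative_change(values, decimals):
    """Largest relative change caused by rounding to `decimals` places."""
    rounded = np.round(values, decimals)
    return float(np.max(np.abs(rounded - values) / np.abs(values)))


rel_5 = max_relative_change(y, 5)
rel_4 = max_relative_change(y, 4)
print(f"rounding to 5 decimals: max relative change = {rel_5:.2e}")
print(f"rounding to 4 decimals: max relative change = {rel_4:.2e}")
```

Perturbations of this size are many orders of magnitude smaller than the forecast changes reported in the third column.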
Steps to reproduce the bug:
Environment: Python 3.10.12
Dependencies: prophet==1.1.7
Run the following Python code to reproduce the results (with the input file Series_6-decimals.csv):
import pandas as pd
from prophet import Prophet
import os
filename = "Series_6-decimals.csv"
params = {
    'growth': 'linear',
    'seasonality_mode': 'multiplicative',
    'changepoint_prior_scale': 0.05,
    'seasonality_prior_scale': 10.0,
    'weekly_seasonality': False,
    'daily_seasonality': False,
    'yearly_seasonality': 'auto',
}
df = pd.read_csv(os.path.join(os.path.dirname(__file__), filename), sep=';')
df_5 = df.round(5)
df_4 = df.round(4)
print("maximum absolute difference between df_5 and df:", max(abs(df_5['y']-df['y']))) # maximum absolute difference
print("maximum absolute relative difference between df_5 and df:", max(abs((df_5['y']-df['y'])/df['y']))) # maximum absolute relative difference
print("maximum absolute difference between df_4 and df:", max(abs(df['y']-df_4['y']))) # maximum absolute difference
print("maximum absolute relative difference between df_4 and df:", max(abs((df_4['y']-df['y'])/df['y']))) # maximum absolute relative difference
m = Prophet(**params)
m.fit(df, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y = forecast[['yhat']].values
print("sum of 'y' forecast:", y.sum()) # 2062620171.0109873
print("1st point of 'y' forecast:", y[0]) # 61602096.36060803
m = Prophet(**params)
m.fit(df_5, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y_5 = forecast[['yhat']].values
print("sum of 'y_5' forecast:", y_5.sum()) # 2305764926.4040384 --> FIXME : +12% compared to the sum of the original forecast ?!
print("1st point of 'y_5' forecast:", y_5[0]) # 76086131.36506273 --> FIXME : +23% compared to the original y[0] ?!
m = Prophet(**params)
m.fit(df_4, seed=1000)
future = m.make_future_dataframe(periods = 12, freq='MS', include_history=False)
forecast = m.predict(future)
y_4 = forecast[['yhat']].values
print("sum of 'y_4' forecast: ", y_4.sum()) # 2096602967.4861932
print("1st point of 'y_4' forecast: ", y_4[0]) # 64983152.06722656
print("relative difference in sum of predictions between y_5 and y in %:", (y_5.sum()-y.sum())/y.sum()*100, "%") # +11.8%
print("relative difference in sum of predictions between y_4 and y_5 in %:", (y_4.sum()-y_5.sum())/y_5.sum()*100, "%") # -9.1%
print("relative difference in prediction of 1st point between y_5 and y in %:", (y_5[0]-y[0])/y[0]*100, "%") # +23.5%
print("relative difference in prediction of 1st point between y_4 and y_5 in %:", (y_4[0]-y_5[0])/y_5[0]*100, "%") # -14.6%
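For readers who cannot run Prophet, the percentages quoted in the comments above can be rechecked directly from the reported forecast values. This is only an arithmetic check on the numbers copied from the script's comments; the helper name `rel_diff_pct` is introduced here for illustration.

```python
def rel_diff_pct(new, old):
    """Relative difference of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100


# Values copied from the comments in the reproduction script.
sum_y, sum_y5, sum_y4 = 2062620171.0109873, 2305764926.4040384, 2096602967.4861932
first_y, first_y5, first_y4 = 61602096.36060803, 76086131.36506273, 64983152.06722656

sum_5_vs_6 = rel_diff_pct(sum_y5, sum_y)        # sum of forecast, 5 vs 6 decimals
sum_4_vs_5 = rel_diff_pct(sum_y4, sum_y5)       # sum of forecast, 4 vs 5 decimals
first_5_vs_6 = rel_diff_pct(first_y5, first_y)  # 1st point, 5 vs 6 decimals
first_4_vs_5 = rel_diff_pct(first_y4, first_y5) # 1st point, 4 vs 5 decimals

for label, value in [
    ("sum, 5 vs 6 decimals", sum_5_vs_6),
    ("sum, 4 vs 5 decimals", sum_4_vs_5),
    ("1st point, 5 vs 6 decimals", first_5_vs_6),
    ("1st point, 4 vs 5 decimals", first_4_vs_5),
]:
    print(f"{label}: {value:+.1f}%")
```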