Hi,
I have a question about the linear models. My dataset has 16 features, and I fit an M5Prime model as follows:
modelM5 = M5Prime(max_depth=4)
modelM5.fit(x_train.values, y_train.values)
After fitting, this is the final part of the output of export_text_m5:
LM1: 2.165e-02 * X[0] + 2.675e-05 * X[1] - 9.072e-06 * X[2] - 1.355e-04 * X[3] + 7.039e-05 * X[4] + 3.113e-02 * X[5] + 2.730e-01
LM2: 1.985e-02 * X[0] + 2.820e-05 * X[1] - 9.072e-06 * X[2] - 1.355e-04 * X[3] + 7.039e-05 * X[4] + 3.113e-02 * X[5] + 2.764e-01
LM3: 1.985e-02 * X[0] + 2.931e-05 * X[1] - 9.072e-06 * X[2] - 1.355e-04 * X[3] + 7.039e-05 * X[4] + 3.113e-02 * X[5] + 2.776e-01
LM4: 3.175e-02 * X[0] + 1.754e-05 * X[1] - 9.072e-06 * X[2] - 3.401e-04 * X[3] + 7.039e-05 * X[4] + 6.773e-02 * X[5] + 4.613e-01
LM5: 3.175e-02 * X[0] + 1.754e-05 * X[1] - 9.072e-06 * X[2] - 3.265e-04 * X[3] + 7.039e-05 * X[4] + 6.773e-02 * X[5] + 4.393e-01
LM6: 2.128e-02 * X[0] + 1.754e-05 * X[1] - 9.072e-06 * X[2] - 1.631e-04 * X[3] + 7.039e-05 * X[4] + 2.011e-01 * X[5] + 6.056e-01
LM7: 3.445e-02 * X[0] + 6.073e-05 * X[1] - 6.333e-04 * X[2] - 5.438e-05 * X[3] + 2.951e-04 * X[4] + 8.026e-02 * X[5] + 4.717e-01
LM8: 3.445e-02 * X[0] + 6.073e-05 * X[1] - 2.718e-04 * X[2] - 5.438e-05 * X[3] + 6.259e-05 * X[4] + 3.223e-02 * X[5] + 3.194e-01
LM9: 3.445e-02 * X[0] + 6.073e-05 * X[1] - 2.718e-04 * X[2] - 5.438e-05 * X[3] + 6.979e-05 * X[4] + 3.223e-02 * X[5] + 3.191e-01
LM10: 1.050e-02 * X[0] + 1.643e-04 * X[1] - 1.301e-05 * X[2] - 5.438e-05 * X[3] + 1.801e-05 * X[4] + 6.602e-02 * X[5] + 2.598e-01
LM11: 1.050e-02 * X[0] + 2.685e-05 * X[1] - 1.301e-05 * X[2] - 5.438e-05 * X[3] + 1.801e-05 * X[4] + 2.772e-02 * X[5] + 4.189e-01
LM12: 1.050e-02 * X[0] + 2.685e-05 * X[1] - 1.301e-05 * X[2] - 5.438e-05 * X[3] + 1.801e-05 * X[4] + 1.761e-02 * X[5] + 4.762e-01
However, when I look at the features variable of each node_model:
[x.features for x in modelM5.node_models]
I get something like:
[[0, 4, 8, 10, 12, 15],
[0, 4, 8, 10, 12, 15],
[0, 4, 8, 10, 12, 15],
(...)
]
Doesn't this mean that the variables used by the linear models are different from the ones printed by export_text_m5? Is X[0] really the first feature of my inputs, or is it the first of the features that the linear model used?
Right now the predictions work fine, but they don't match what I get when I apply the linear model coefficients by hand.
Is there an issue here, or am I doing something wrong on my side?
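To make the mismatch concrete, here is a minimal NumPy sketch of the two possible readings of X[i]. It does not call m5py at all: the coefficients are copied from LM1 above, the feature list is the one reported by node_models, and the input row x is a made-up 16-value sample. Which reading export_text_m5 actually intends is exactly what I am asking, so this only shows how the two readings diverge.

```python
import numpy as np

# Values copied from the issue: LM1 coefficients and the reported feature subset.
features = [0, 4, 8, 10, 12, 15]
coefs = np.array([2.165e-02, 2.675e-05, -9.072e-06,
                  -1.355e-04, 7.039e-05, 3.113e-02])
intercept = 2.730e-01

# Hypothetical input row with 16 features (illustrative values only).
x = np.arange(1, 17, dtype=float)

# Reading A: X[i] is the i-th column of the original 16-feature input.
pred_original_index = coefs @ x[:6] + intercept

# Reading B: X[i] is the i-th column of the subset selected by `features`.
pred_subset_index = coefs @ x[features] + intercept

# Reading B rewritten as a full-width coefficient vector over all 16 inputs,
# which is convenient for applying the model by hand.
full_coefs = np.zeros(x.shape[0])
full_coefs[features] = coefs
pred_full_width = full_coefs @ x + intercept

print(pred_original_index, pred_subset_index, pred_full_width)
```

If the hand computation only matches the model's prediction under reading B, then the indices in the printed equations refer to positions within the feature subset rather than to the original columns.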
Thank you in advance.