
linear models features and export_text_m5 #19

@camsilva

Hi,

I have a question about the linear models. My dataset has 16 features when I fit the M5Prime model:

modelM5 = M5Prime(max_depth=4)
modelM5.fit(x_train.values, y_train.values)

After fitting, this is the final part of the output of export_text_m5:

LM1: 2.165e-02 * X[0] + 2.675e-05 * X[1] - 9.072e-06 * X[2] - 1.355e-04 * X[3] + 7.039e-05 * X[4] + 3.113e-02 * X[5] + 2.730e-01
LM2: 1.985e-02 * X[0] + 2.820e-05 * X[1] - 9.072e-06 * X[2] - 1.355e-04 * X[3] + 7.039e-05 * X[4] + 3.113e-02 * X[5] + 2.764e-01
LM3: 1.985e-02 * X[0] + 2.931e-05 * X[1] - 9.072e-06 * X[2] - 1.355e-04 * X[3] + 7.039e-05 * X[4] + 3.113e-02 * X[5] + 2.776e-01
LM4: 3.175e-02 * X[0] + 1.754e-05 * X[1] - 9.072e-06 * X[2] - 3.401e-04 * X[3] + 7.039e-05 * X[4] + 6.773e-02 * X[5] + 4.613e-01
LM5: 3.175e-02 * X[0] + 1.754e-05 * X[1] - 9.072e-06 * X[2] - 3.265e-04 * X[3] + 7.039e-05 * X[4] + 6.773e-02 * X[5] + 4.393e-01
LM6: 2.128e-02 * X[0] + 1.754e-05 * X[1] - 9.072e-06 * X[2] - 1.631e-04 * X[3] + 7.039e-05 * X[4] + 2.011e-01 * X[5] + 6.056e-01
LM7: 3.445e-02 * X[0] + 6.073e-05 * X[1] - 6.333e-04 * X[2] - 5.438e-05 * X[3] + 2.951e-04 * X[4] + 8.026e-02 * X[5] + 4.717e-01
LM8: 3.445e-02 * X[0] + 6.073e-05 * X[1] - 2.718e-04 * X[2] - 5.438e-05 * X[3] + 6.259e-05 * X[4] + 3.223e-02 * X[5] + 3.194e-01
LM9: 3.445e-02 * X[0] + 6.073e-05 * X[1] - 2.718e-04 * X[2] - 5.438e-05 * X[3] + 6.979e-05 * X[4] + 3.223e-02 * X[5] + 3.191e-01
LM10: 1.050e-02 * X[0] + 1.643e-04 * X[1] - 1.301e-05 * X[2] - 5.438e-05 * X[3] + 1.801e-05 * X[4] + 6.602e-02 * X[5] + 2.598e-01
LM11: 1.050e-02 * X[0] + 2.685e-05 * X[1] - 1.301e-05 * X[2] - 5.438e-05 * X[3] + 1.801e-05 * X[4] + 2.772e-02 * X[5] + 4.189e-01
LM12: 1.050e-02 * X[0] + 2.685e-05 * X[1] - 1.301e-05 * X[2] - 5.438e-05 * X[3] + 1.801e-05 * X[4] + 1.761e-02 * X[5] + 4.762e-01

However, when I look at the features attribute of each node model:

[x.features for x in modelM5.node_models]

I get something like:

[[0, 4, 8, 10, 12, 15],
[0, 4, 8, 10, 12, 15],
[0, 4, 8, 10, 12, 15],
(...)
]

Doesn't this mean that the variables used by the linear models are different from the ones printed by export_text_m5? Does X[0] really refer to the first feature of my inputs, or to the first of the features that the linear model actually used?
Right now the predictions work fine, but they don't match what I get when I apply the linear model coefficients by hand (a rough sketch of what I am doing is below).
Is there some kind of issue here, or is something wrong on my side?
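
To make the mismatch concrete, this is roughly how I am checking it by hand. Note that picking x_train row 0 as a point that ends up in the LM1 leaf, and the two indexing interpretations, are just assumptions on my part:

import numpy as np

# coefficients printed for LM1 above, copied from the export_text_m5 output
lm1_coefs = np.array([2.165e-02, 2.675e-05, -9.072e-06, -1.355e-04, 7.039e-05, 3.113e-02])
lm1_intercept = 2.730e-01

sample = x_train.values[0]                # assuming this row falls into the LM1 leaf
feats = modelM5.node_models[0].features   # e.g. [0, 4, 8, 10, 12, 15]

# interpretation A: X[i] is the i-th column of the original 16-feature input
pred_a = lm1_coefs @ sample[:6] + lm1_intercept

# interpretation B: X[i] is the i-th entry of the node model's features list
pred_b = lm1_coefs @ sample[feats] + lm1_intercept

# compare both against the tree's own prediction
print(pred_a, pred_b, modelM5.predict(sample.reshape(1, -1)))

Depending on which interpretation is correct, either pred_a or pred_b should match the model's prediction for that sample.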

Thank you in advance.
