From the course: Machine Learning with Logistic Regression in Excel, R, and Power BI

Utilizing training and testing data sets

- [Professor] One of the most important facets of running machine learning models like logistic regression is splitting the entire dataset into two pieces: the training and the testing datasets. Typically, 80% of the data is training data and the remaining 20% is testing data. Training datasets let us build models by solving for the model parameters that best fit the data. However, we're not running our final model on the training dataset. The testing dataset is what we use to validate the model, so that it best models data in the real world. We want to hide this testing dataset away until we're ready to test the model out with it. If we fitted our model perfectly on our training data, all the data points would be mapped perfectly to their actual outcomes. This means if we ran the model on our training dataset, the total error, or loss, would be zero because the model would perfectly fit every point. However, algorithms deal with ambiguity.…
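
As a rough illustration of the 80/20 split described above, here is a minimal Python sketch. It is not taken from the course, which works in Excel, R, and Power BI. The function name `split_train_test`, the seed, and the toy data are all hypothetical, chosen only for this example. The "model" at the end is a plain lookup table that memorizes the training rows, standing in for a perfectly fitted model to show why its training error is zero.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator makes the shuffle, and so the split, reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: (feature, binary outcome) pairs, made up for illustration.
data = [(x, int(x % 3 == 0)) for x in range(100)]
train, test = split_train_test(data)

# A stand-in "model" that memorizes every training row fits the training set
# perfectly, so it makes no errors when scored on that same data.
memorized = dict(train)
train_errors = sum(memorized[x] != y for x, y in train)

print(len(train), len(test))  # 80 20
print(train_errors)           # 0
```

The zero training error says nothing about how the model handles unseen rows, which is why the 20% in `test` is set aside and only used once the model is ready to be validated.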
