From the course: Machine Learning with SageMaker by Pearson

Integrating data processing and training steps

We saw in the previous lesson that we can use a SageMaker pipeline to go from dataset to machine learning model: pulling the dataset, creating a model, and deploying it to production. But what about the data processing that needs to occur first, such as dropping a column or imputing missing values? Why should we integrate our data processing with our training? There are a few reasons: streamlining the end-to-end workflow, reducing manual intervention, and ensuring data consistency. If our pipeline follows a repeatable process, then the changes made to create this particular version of the model are carried into the next version as well, which improves the efficiency and reproducibility of our pipelines. Now, the key components for integrating pre-processing with training. We have our processing steps, which are executed to pre-process the data prior to training. Then we…
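
To make the idea of chaining a processing step into a training step concrete, here is a minimal plain-Python sketch. It does not use the SageMaker SDK; the function names (`preprocess`, `train`, `run_pipeline`), the column names, and the toy least-squares "model" are all assumptions made up for illustration. It only shows the pattern the lesson describes: the same processing (dropping a column, imputing a value) always runs before training, so every model version is built from consistently prepared data.

```python
from statistics import mean


def preprocess(rows, drop_column, impute_column):
    """Drop one column and fill missing values in another with the column mean."""
    observed = [r[impute_column] for r in rows if r[impute_column] is not None]
    fill_value = mean(observed)
    cleaned = []
    for r in rows:
        # Build a new dict so the raw input rows are left untouched.
        new_row = {k: v for k, v in r.items() if k != drop_column}
        if new_row[impute_column] is None:
            new_row[impute_column] = fill_value
        cleaned.append(new_row)
    return cleaned


def train(rows, feature, target):
    """Stand-in for a training job: fit y = slope * x by least squares."""
    numerator = sum(r[feature] * r[target] for r in rows)
    denominator = sum(r[feature] ** 2 for r in rows)
    return {"slope": numerator / denominator}


def run_pipeline(rows):
    """Run the steps in a fixed order: processing first, then training."""
    processed = preprocess(rows, drop_column="id", impute_column="x")
    model = train(processed, feature="x", target="y")
    return processed, model


# Hypothetical toy dataset with an ID column to drop and one missing value.
raw = [
    {"id": 1, "x": 1.0, "y": 2.0},
    {"id": 2, "x": None, "y": 4.0},
    {"id": 3, "x": 3.0, "y": 6.0},
]
processed, model = run_pipeline(raw)
```

Because `run_pipeline` is the only entry point, rerunning it on a new dataset repeats exactly the same processing before training, which is the reproducibility benefit described above. In an actual SageMaker pipeline, each of these functions would correspond to a separate pipeline step rather than an in-process function call.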
