From the course: Machine Learning with SageMaker by Pearson

Data cleaning and preprocessing with SageMaker Data Wrangler

Data cleaning and preprocessing usually involves some feature engineering. Feature engineering is the transformation of your raw data set values into information a model can learn from. Say you have a data set of cars that have been in an accident. You probably have the year the car was manufactured, make, model, place of incident, and some details about the driver, such as gender, height, weight, sobriety, who knows. Each of those attributes, like make and model, is a feature. And that information needs to be processed and cleaned up as well as engineered in order to arrive at a valid machine learning model in the end. Data Wrangler has over 300 built-in transformations. And if you're using something like SageMaker Canvas, you can go in there, open up Data Wrangler, and it will show you a preview of the data that you're feeding into it. Then you can apply step-by-step transformations, things like one-hot encoding, label…
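To make the idea of a transformation step concrete, here is a minimal sketch of one cleaning step and one one-hot encoding step written in plain pandas. It is not Data Wrangler's own implementation or API. The car records, the column names, the `preprocess` helper, and the choice of a median fill are all assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical accident records; the columns and values are invented for this sketch.
cars = pd.DataFrame({
    "year": [2012, 2018, 2018, None],   # one missing manufacture year
    "make": ["Toyota", "Ford", "Toyota", "Honda"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a cleaning step, then a one-hot encoding step (illustrative only)."""
    out = df.copy()

    # Cleaning: replace the missing year with the median of the known years.
    out["year"] = out["year"].fillna(out["year"].median())

    # One-hot encoding: turn the categorical "make" column into one
    # 0/1 indicator column per distinct make.
    out = pd.get_dummies(out, columns=["make"], prefix="make", dtype=int)
    return out


features = preprocess(cars)
print(features)
```

Each row ends up with a numeric `year` and a set of `make_*` indicator columns, which is the kind of model-ready table a step-by-step transformation flow is meant to produce.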
