From the course: Machine Learning with SageMaker by Pearson
Validating data quality with AWS tools - Amazon SageMaker Tutorial
Here, on the right side of our screen, we see the lifecycle of an ML model: collect data, ingest it, prepare it, validate it, store it, explore it, train, monitor, archive, delete. And around we go. Now we are talking about validation of our data. We want to ensure consistency and reliability in our models, address anomalies, outliers, and missing data, and prevent biases. Biases will be reflected in the predictions that our ML models produce. Data Wrangler and DataBrew, we've talked about these extensively in the previous lessons. Also, Model Monitor, we haven't talked about this as much, but it can be used to detect data drift and biases.

So we are using Data Wrangler and DataBrew to clean our data, explore it, find those outliers, find the missing data, and impute and normalize where necessary. Model Monitor can then be run as our model is up and running in production. We can use Model Monitor to detect drift and any biases that could be creeping in. RCF, or Random Cut…
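The validation steps the instructor describes, imputing missing values, flagging outliers, normalizing, and then watching for drift in production, can be sketched locally with pandas. This is a minimal illustration of the ideas, not the Data Wrangler, DataBrew, or Model Monitor APIs themselves; the column name, the IQR outlier rule, and the simple mean-shift drift check are all assumptions chosen for the sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical training data with a missing value and an obvious outlier.
train = pd.DataFrame({"age": [25, 31, np.nan, 29, 27, 120]})

# Impute missing values with the median (the kind of transform
# Data Wrangler or DataBrew would apply for us).
clean = train.fillna({"age": train["age"].median()})

# Flag outliers with the 1.5 * IQR rule.
q1, q3 = clean["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = clean[(clean["age"] < q1 - 1.5 * iqr) |
                 (clean["age"] > q3 + 1.5 * iqr)]

# Normalize the trimmed data to [0, 1].
trimmed = clean.drop(outliers.index)
normalized = (trimmed["age"] - trimmed["age"].min()) / (
    trimmed["age"].max() - trimmed["age"].min())

# Crude drift check, in the spirit of Model Monitor: compare the
# production distribution against the training baseline and alert
# when the mean has shifted by more than two baseline standard deviations.
production = pd.Series([45, 48, 52, 50, 47])
drift = abs(production.mean() - trimmed["age"].mean()) > 2 * trimmed["age"].std()
```

In production, Model Monitor automates this pattern: it computes a statistical baseline from the training data and compares incoming inference traffic against it on a schedule, rather than relying on ad hoc checks like the one above.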