From the course: Machine Learning with SageMaker by Pearson
Data preprocessing demonstration - Amazon SageMaker Tutorial
Data preprocessing demonstration
We saw a demo of Data Wrangler previously. To build on that a little, I have another Data Wrangler demo here that looks at outliers and also uses Parquet. So we're going to go into SageMaker AI. Over in our terminal, in the datasets directory, we have this "dirty" sample loan approval dataset as a CSV file, and in the scripts directory we have a CSV-to-Parquet script. If we run that script on the dirty sample loan approval CSV, it shows us the derived schema, and the output is the saved Parquet file.

I have already put that file into the S3 bucket, but as a reminder, we can run aws s3 sync . s3://adgu-datasets from that directory. Oh, it looks like I'm putting the Git files in there too. That was an accident. Anyway, I am using adgu-datasets as my bucket for these labs. You won't be able to use that name, because bucket names are globally unique across all…
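The course's CSV-to-Parquet script is not shown in the transcript, so the following is only a rough sketch of what such a script might do, not the instructor's code. It assumes the pyarrow package is available for the Parquet step. The function names derive_schema and csv_to_parquet are hypothetical. The first two helpers illustrate by hand what "deriving a schema" means; the last one hands the real work to pyarrow, which performs its own type inference.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from raw CSV strings; empty strings count as missing."""
    present = [v for v in values if v.strip() != ""]
    if not present:
        return "string"
    # Try the narrowest type first, then widen.
    for name, cast in (("int64", int), ("double", float)):
        try:
            for v in present:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def derive_schema(csv_text):
    """Return {column name: guessed type} for CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        col: infer_type([(row[col] or "") for row in rows])
        for col in (reader.fieldnames or [])
    }


def csv_to_parquet(csv_path, parquet_path):
    """Convert a CSV file to Parquet and return the schema pyarrow inferred."""
    # Imported lazily so the helpers above work without pyarrow installed.
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    table = pv.read_csv(csv_path)          # pyarrow infers column types itself
    pq.write_table(table, parquet_path)    # write the table as a Parquet file
    return table.schema
```

Calling csv_to_parquet with the path of the dirty CSV and an output path ending in .parquet would write the file and return the inferred schema, which the script could then print. In a dirty dataset, a numeric column that contains stray text will typically be inferred as a string, which is one reason to look at the derived schema before loading the file into Data Wrangler.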