From the course: Delivering Data-Driven Decisions with AWS: Applying Machine Learning, Data Engineering, and Generative AI
Linear regression with Amazon SageMaker Canvas
- [Instructor] In this movie, we'll learn how to make predictions with Amazon SageMaker Canvas, which offers no-code ML. This is ideal for business analysts and developers who don't have any prior machine learning experience, because no coding or programming skills are required. We'll get started by exploring a California housing prices dataset, which is based on the 1990 census. In this dataset, the target variable is median_house_value, which is a continuous variable, and there are several other features. You can see their distributions here, and they're all continuous variables. Our explanatory variables include total_bedrooms, population, and median_income. Now let's go back to the AWS Management Console. I've already created an Amazon SageMaker domain, so I can click Canvas on the left-hand side. You can see that the SageMaker domain has been created, so I'll click Open Canvas to launch Canvas. In the Canvas interface, I just want to show you that we have access to generative AI foundation models that you can build with. We also have ready-to-use models for common business cases, such as entity extraction, sentiment analysis, or detecting objects in images, which you can explore another time. Today, because we want to build a predictive model, linear regression, I need to create my own custom model. So I'll click Create custom model and give the model a name, for example, Demo_linear_regression. Make sure that Predictive analysis is selected as the problem type, and click Create. Canvas provides some sample datasets, and I've already uploaded the California house prices dataset and a Sydney house prices dataset from my Amazon S3 bucket, but I'll show you how you can add data.
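Canvas chooses and tunes the algorithm itself during the build, so the following is not what Canvas runs internally. As a minimal sketch of the idea behind linear regression for a continuous target, this fits a straight line, median_house_value ≈ b0 + b1 × median_income, by ordinary least squares using only the Python standard library. The function names and the five data points are hypothetical, invented for illustration rather than taken from the California dataset.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) that minimise the sum of squared errors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Covariance-style and variance-style sums around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted target value for a single feature value."""
    return intercept + slope * x_new


# Hypothetical toy values, not rows from the California housing dataset
median_income = [2.0, 3.0, 4.0, 5.0, 6.0]
house_value = [150_000, 190_000, 230_000, 270_000, 310_000]

b0, b1 = fit_simple_linear_regression(median_income, house_value)
print(b0, b1)  # 70000.0 40000.0
```

The toy data lie exactly on a line, so the fit recovers it perfectly; with real data the line only minimises the squared errors, and a tool like Canvas also handles many features at once.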
Select Create dataset and give it a name. Under the data source, which starts at Local upload, there are quite a few options: you can upload from your data lake in Amazon S3 or from your local computer, connect to Snowflake or the Amazon Redshift cloud data warehouse, or use other databases such as MySQL, Databricks, or PostgreSQL. I won't upload the dataset because I've already completed that step, so I'll select the California_house_prices dataset and choose Select dataset. Before we build our machine learning model, we want to check that the correct data type is selected for each feature. Make sure, for example, that population and total_rooms are numeric, and check the target variable as well: median_house_value is numeric, a continuous variable, so that's good. Next, we select the target column, the column we'll make predictions for: our Y variable, or dependent variable. median_house_value is selected. Then we configure the model under Model type. Amazon SageMaker Canvas has already provided a recommendation, which is the model type it considers best for this data. You can change it, but I won't change anything here. You can also look at further settings: the data split between the training and test sets is 80/20. The objective metric, the metric used to evaluate the model, defaults to MSE, but I want to use R squared, so I'll change it to R2 and click Save. If I want to explore the data and see distributions, I can click Data visualizer. For example, I might want to create a scatterplot, so I can drag variables here, house_median_age and total_bedrooms, to see whether there's any correlation between these two quantitative variables.
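Two ideas from this step can be sketched in plain Python: the 80/20 split between training and test data, and the correlation that the scatterplot (and the correlation matrix shown next) visualises. Canvas performs the split itself and the transcript doesn't say how; a seeded random shuffle is an assumption here. Pearson's r is one common correlation coefficient, and it's also an assumption that Canvas uses it. The function names and toy columns are hypothetical.

```python
import math
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length columns with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_sum = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations; the 1/n factors cancel in the ratio
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov_sum / (spread_x * spread_y)


# 80/20 split of 100 hypothetical row indices
train, test = train_test_split(range(100))
print(len(train), len(test))  # 80 20

# Hypothetical toy columns: r close to +1 means a strong positive linear relationship
total_rooms = [1200.0, 2500.0, 3100.0, 4000.0]
total_bedrooms = [250.0, 510.0, 600.0, 820.0]
print(round(pearson_r(total_rooms, total_bedrooms), 3))
```

A noisy scatterplot with no visible pattern corresponds to r near 0; a correlation matrix is simply r computed for every pair of numeric columns.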
Okay, you can see that this scatterplot is quite noisy, and there's no clear relationship between these two variables. The next step is to look at the analytics as well. This is a correlation matrix, which is very useful for understanding visually how the quantitative variables in the dataset are correlated. I'm happy with that step, so I'll go back to Configure model. Now we're ready to build our model, so I'll select Quick build. First the data is validated, which takes a few minutes, and then the model is built, or trained. This takes about five minutes, so in the meantime I want to show you the steps to help you get started. If you refer to the AWS documentation, in particular the page "Getting started with using Amazon SageMaker Canvas," there are a few prerequisites. The first is setting up an Amazon SageMaker domain, which you will need. The second is configuring Amazon S3, the location where your data is stored, so you can attach it to your model in Amazon SageMaker Canvas. There are also other optional permissions that you may wish to add if you're building other models. The third one, giving permission to upload local files, is quite useful. If you're having trouble uploading your dataset into your Amazon S3 bucket, I'll show you where to look. Open AWS IAM and select Roles on the left-hand side. When you create an Amazon SageMaker domain, you get a SageMaker execution role with an execution policy. You might need to attach additional permissions to this role, and there are three that I've noted here: Amazon SageMaker Canvas AI Services Access, Amazon SageMaker Canvas Full Access, and Amazon SageMaker Full Access. These allow you to attach your dataset and upload it from Amazon S3.
If we expand a policy with the plus sign and scroll toward the bottom, we'll see an Allow statement for S3. As we can see, its Resource is the S3 bucket, so it allows Amazon SageMaker to access the S3 bucket, which is what we want. So sometimes you might need to attach additional permissions to the SageMaker execution role. Referring back to the documentation, this page gives you a step-by-step process for setting up your environment for Amazon SageMaker Canvas. You go into Admin configurations, choose Domains, create a domain, and then configure it as described above; these configurations are optional. The page then walks you through selecting network and storage settings and creating a role in IAM if required. If you are creating a new role, make sure that you select the Amazon SageMaker execution policy and also attach the additional policy, Amazon SageMaker Full Access. Then follow the steps for setting up the domain. There may be base permissions you need to configure, and the encryption key is optional. Here you might need to provide the path to your Amazon S3 bucket, and you can skip the next stage because we are not creating any ready-to-use models in this demo. Also give yourself permission to use specific features. Take note of the option to upload local files here, and make sure that you specify the path to your S3 bucket, the Region, and your AWS account to grant permission to upload files. This is the step we've already seen: once you create your Amazon SageMaker domain, you can navigate to the Getting started area, open Canvas, explore additional tutorials if you need to, and create your model. Now let's check the training job. It's still building our model.
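The Allow statement for S3 mentioned above follows the standard IAM JSON policy structure: a Version, and a list of Statements, each with an Effect, Actions, and Resources given as ARNs. As a minimal sketch, this builds such a document in Python and prints it as JSON. The bucket name and the choice of actions are hypothetical; this is not the content of the AWS managed policies named earlier, and it may not cover everything Canvas needs.

```python
import json


def s3_access_policy(bucket_name):
    """Build an identity-based IAM policy document allowing list/read/write on one bucket.

    Hypothetical example: the actions chosen here are an assumption, not
    the permissions of any AWS managed SageMaker policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading and writing apply to the objects inside the bucket
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


print(json.dumps(s3_access_policy("my-canvas-data-bucket"), indent=2))
```

The bucket ARN and the `/*` object ARN are listed separately because bucket-level and object-level S3 actions apply to different resources.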
Now let's take a closer look at Amazon SageMaker Canvas to see whether our model has finished building. Great, on the Analyze tab we can see that the model has finished building, and we have a few evaluation metrics for it. The first is R2, or R squared, which tells us how well the model's predictions fit the actual data. A score of 86% (0.86) is a really good result: it means about 86% of the variance in median_house_value is explained by our explanatory variables, or features. Another model evaluation metric is RMSE, the root mean squared error; the smaller the RMSE, the more accurate the model's predictions. On the Overview tab, under Column impact, we can see how much each feature contributes to the prediction of median house value. If we click latitude, the column impact is 27.9%, and for median_income, for example, the column impact is 8.4%. Next, we can look at Scoring, which shows model accuracy. This gives us further insight: it's a plot of the predicted values of median house value against the actual values, so we can see how closely the points follow the line. We can also open Advanced metrics, which gives another overview. If I scroll down, we can see a plot of the residuals, the differences between the actual and predicted values of median house value, along with the R2 score and the RMSE. Now we're ready to make predictions, so we click the Predict button.
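The metrics on the Analyze tab have standard definitions: a residual is actual minus predicted, RMSE is the square root of the mean squared residual (in the same units as the target, here dollars), and R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the actual values from their mean. As a minimal sketch, assuming Canvas computes these on held-out data in the usual way, here they are in plain Python with hypothetical values.

```python
import math


def residuals(actual, predicted):
    """Actual minus predicted, one value per row."""
    return [a - p for a, p in zip(actual, predicted)]


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    res = residuals(actual, predicted)
    return math.sqrt(sum(r ** 2 for r in res) / len(res))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(r ** 2 for r in residuals(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted house values (in thousands of dollars)
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(residuals(actual, predicted))  # [-10.0, 10.0, -10.0, 10.0]
print(rmse(actual, predicted))       # 10.0
print(r_squared(actual, predicted))
```

An R² of 0.86 therefore means the squared residuals add up to about 14% of the total squared variation in the target. MSE, the Canvas default objective metric, is simply RMSE before the square root.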
We have two choices: single prediction and batch prediction. We'll click Single prediction, which runs in real time and gives us a single value. As we can see, the predicted median house value in California in the 1990s is $320,804. You can also download this prediction as a CSV file or as an image. If you'd like to make a batch prediction, you can generate predictions on your entire dataset: click Automatic and then bring in your own data. We won't do that here. The last thing to do after we finish using Amazon SageMaker Canvas is to follow best practice and log out of the Canvas interface to stop any workspace instance charges. I'd also refer you to the AWS documentation page "Manage billing and cost in SageMaker Canvas." Have a look at it so you're familiar with how workspace instance charges work and can avoid any surprise bills at the end of the month. Then refer to the Manage apps page. Best practice is also to clean up: delete the app first, then the user profile, and finally the domain, because a user profile can't be deleted while it still has apps, and a domain can't be deleted while it still has user profiles. Follow all the steps on the Manage apps page after you finish using Amazon SageMaker Canvas. I hope you've enjoyed this demo and are encouraged to bring in your own data and start experimenting and building your own models.
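Canvas runs batch predictions for you, so the following is only an illustration of what batch scoring means: apply a trained model to every row of a dataset and append a prediction column to the output CSV. As a minimal sketch, the "model" here is a hand-written linear formula with hypothetical coefficients; a model built in Canvas is not exposed as a formula like this.

```python
import csv
import io

# Hypothetical coefficients for illustration only
INTERCEPT = 70_000.0
COEFFICIENTS = {"median_income": 40_000.0}


def predict_row(row):
    """Linear prediction for one CSV row (a dict of column name -> string value)."""
    return INTERCEPT + sum(w * float(row[name]) for name, w in COEFFICIENTS.items())


def batch_predict(csv_text):
    """Return the input CSV text with a predicted_median_house_value column appended."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None:  # empty input: nothing to score
        return ""
    fieldnames = list(reader.fieldnames) + ["predicted_median_house_value"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["predicted_median_house_value"] = predict_row(row)
        writer.writerow(row)
    return out.getvalue()


sample = "median_income\n2.0\n7.0\n"
print(batch_predict(sample), end="")
```

For the two sample rows this prints a header line followed by `2.0,150000.0` and `7.0,350000.0`.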