From the course: Complete Guide to Analytics Engineering
Aggregate functions in SQL
- [Instructor] Another fundamental part of using SQL to analyze data is taking data that is at a granular level and summarizing it to a higher level. This is called data aggregation or data rollup. We did something similar in Python when we created our sales statistics by associate table, but a great analytics engineer can perform the same task with multiple tools. That makes us a Swiss Army knife of engineering and also makes us easier to hire. Some companies will want to perform all of their data transformations in Python. Other companies will want to do all of their data transformations in SQL. So let's practice aggregating data in SQL.
Our pretend stakeholder wanted to see sales for the employee IDs listed above. We can take it a step further and sum the quantity of sales and the order totals for those employees using the sum function. Let's try this. Let's write a new query. We're going to select employee ID. We're going to sum the quantity field. We're going to sum the order total field. We're going to select from our red_30_US_sales_cleaned table, where employee ID is in, and we're going to grab those employee IDs from before. Copy that. Let's paste it below, and let's run this.
Something a little weird happened here. We wanted one row for each employee ID, but it gave us only one row and summed across all those employee IDs. This isn't what we want. We want to group by the employee ID, which tells the query engine not to sum across all the rows, but to sum each metric separately for each employee ID. Let's add that now. Group by comes after the where clause. We tell it to group by employee ID. Let's try running it again. That looks better. Now we have three rows, one for each employee ID, with the sum of quantity and the sum of order total.
Let's practice a few more aggregations. Up next is average. Under our sums, let's write average of quantity, and let's also add average of order total.
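
As a rough sketch of the sum and group by steps just described, the Python snippet below runs both versions of the query against a small in-memory SQLite table. Only the table name comes from the walkthrough; the column names (EmpID, OrderType, Quantity, OrderTotal), the employee IDs, and every row are assumptions made up for illustration.

```python
import sqlite3

# In-memory stand-in for the course table. The column names and all rows
# are hypothetical; only the table name appears in the walkthrough.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE red_30_US_sales_cleaned ("
    "EmpID INTEGER, OrderType TEXT, Quantity INTEGER, OrderTotal REAL)"
)
conn.executemany(
    "INSERT INTO red_30_US_sales_cleaned VALUES (?, ?, ?, ?)",
    [
        (101, "Retail", 2, 200.0),
        (101, "Wholesale", 10, 900.0),
        (102, "Retail", 1, 150.0),
        (103, "Retail", 4, 400.0),
        (103, "Wholesale", 6, 540.0),
        (104, "Retail", 3, 300.0),  # excluded by the WHERE filter below
    ],
)

# Without GROUP BY, the sums collapse into a single row covering all three
# employee IDs. EmpID is left out of this SELECT because many engines
# reject a bare column next to an aggregate when there is no GROUP BY.
overall = conn.execute(
    """
    SELECT SUM(Quantity), SUM(OrderTotal)
    FROM red_30_US_sales_cleaned
    WHERE EmpID IN (101, 102, 103)
    """
).fetchall()

# GROUP BY comes after WHERE and yields one row per employee ID.
per_employee = conn.execute(
    """
    SELECT EmpID, SUM(Quantity), SUM(OrderTotal)
    FROM red_30_US_sales_cleaned
    WHERE EmpID IN (101, 102, 103)
    GROUP BY EmpID
    ORDER BY EmpID
    """
).fetchall()

print(overall)       # one summary row
print(per_employee)  # one row per employee ID
```

The ORDER BY is not part of the walkthrough; it is added here only so the rows come back in a predictable order.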
Now in SQL, we can add aliases to fields, which give them a new name in the query output. So let's add an alias: we're going to call the average of quantity AverageQuantity. We're going to call the average of order total AverageOrderTotal. Let's execute. Now you can see the averages and that they have a new name in the output, the alias that we just gave them.
Let's practice with min and max next. On a new line, let's add a comma and take the minimum of order total. We're going to call it SmallestOrderTotal. Now let's take the maximum, and we'll call it LargestOrderTotal. Let's execute. Great, we can see the smallest and the largest for each of these employee IDs.
In this query, we only group by one field, employee ID. It's also possible to group by multiple fields. Let's try grouping by a couple of fields. Right now, we only see the sales quantity in total across all time. But what if we also want to see those summary aggregations by order type? You can group by multiple fields in SQL. After employee ID, let's add order type, and let's also add it after employee ID in the select statement. Now let's run again. Great. Now we can see each employee ID broken up by retail and wholesale sales. Valuable information there. With group by, we can group by as many fields as we want, but be warned: any field you add to the select without aggregating it, you also have to add to the group by. SQL doesn't let you add in, say, order date without grouping by it, because it won't know how to handle the aggregation and the grouping of the data.
There are two more aggregation techniques we should be familiar with: counting rows and counting unique values in a table. Let's write a new query. We'll select count star, or asterisk, from red_30_US_sales_cleaned. This query will tell us how many rows of data there are in this table. This table has a count of 4,976 rows. Very useful, but you can also count the number of unique values of a field or a dimension in the table using the distinct keyword.
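
The aliases, min and max, multi-field group by, and count star steps above can be sketched in one short Python snippet that runs the SQL against an in-memory SQLite table. The alias names and the table name are the ones used in the walkthrough; the column names (EmpID, OrderType, Quantity, OrderTotal), the employee IDs, and the rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical stand-in for the course table (made-up columns and rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE red_30_US_sales_cleaned ("
    "EmpID INTEGER, OrderType TEXT, Quantity INTEGER, OrderTotal REAL)"
)
conn.executemany(
    "INSERT INTO red_30_US_sales_cleaned VALUES (?, ?, ?, ?)",
    [
        (101, "Retail", 2, 200.0),
        (101, "Retail", 4, 400.0),
        (101, "Wholesale", 10, 900.0),
        (102, "Retail", 1, 150.0),
        (102, "Wholesale", 8, 720.0),
        (102, "Wholesale", 12, 1080.0),
    ],
)

# Every non-aggregated field in the SELECT (EmpID, OrderType) also appears
# in the GROUP BY; each aggregate gets an alias with AS.
cursor = conn.execute(
    """
    SELECT
        EmpID,
        OrderType,
        AVG(Quantity)   AS AverageQuantity,
        AVG(OrderTotal) AS AverageOrderTotal,
        MIN(OrderTotal) AS SmallestOrderTotal,
        MAX(OrderTotal) AS LargestOrderTotal
    FROM red_30_US_sales_cleaned
    WHERE EmpID IN (101, 102)
    GROUP BY EmpID, OrderType
    ORDER BY EmpID, OrderType
    """
)
columns = [col[0] for col in cursor.description]  # output names, incl. aliases
results = cursor.fetchall()

# COUNT(*) returns the number of rows in the table.
row_count = conn.execute(
    "SELECT COUNT(*) FROM red_30_US_sales_cleaned"
).fetchone()[0]

print(columns)
for row in results:
    print(row)
print(row_count)
```

Each employee ID now produces one row per order type, and the column headers in the output are the aliases rather than the raw aggregate expressions.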
Instead of counting star, let's count distinct employee ID. Execute. This query tells us we have 44 unique employee IDs who have sold items in our sales data. Let's check that against our sales associates table to see if we also have 44 employee IDs in that table. We'll select count distinct employee ID from, and we're going to use a different table this time. It's red_30_Tech_US_Sales_Associates. Let's execute that query. We have the same output, 44. That's what we would expect: the counts match, which is consistent with every employee in our sales table also being in our associates table.
Now we know how to aggregate data with SQL SELECT statements, a huge asset in our skill set, and one that you'll probably use every day on the job as an analytics engineer. Up next, we're going to practice some SQL functions related to date and time fields.
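
For reference, here is a minimal sketch of the count distinct comparison described above, run from Python against two tiny in-memory SQLite tables. The two table names come from the walkthrough; the column names (EmpID, OrderTotal, Name) and all rows are made up, so the counts here are 3 rather than 44.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature versions of the two course tables.
conn.executescript(
    """
    CREATE TABLE red_30_US_sales_cleaned (EmpID INTEGER, OrderTotal REAL);
    CREATE TABLE red_30_Tech_US_Sales_Associates (EmpID INTEGER, Name TEXT);

    INSERT INTO red_30_US_sales_cleaned VALUES
        (101, 200.0), (101, 900.0), (102, 150.0), (103, 400.0), (103, 540.0);
    INSERT INTO red_30_Tech_US_Sales_Associates VALUES
        (101, 'Associate A'), (102, 'Associate B'), (103, 'Associate C');
    """
)

# COUNT(*) counts rows; COUNT(DISTINCT EmpID) counts unique employee IDs.
total_rows = conn.execute(
    "SELECT COUNT(*) FROM red_30_US_sales_cleaned"
).fetchone()[0]
sellers = conn.execute(
    "SELECT COUNT(DISTINCT EmpID) FROM red_30_US_sales_cleaned"
).fetchone()[0]
associates = conn.execute(
    "SELECT COUNT(DISTINCT EmpID) FROM red_30_Tech_US_Sales_Associates"
).fetchone()[0]

print(total_rows, sellers, associates)
```

In this toy data the sales table has more rows than distinct employee IDs because some associates have several orders, which is exactly the difference between counting star and counting distinct.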