From the course: PostgreSQL: Advanced Queries

Using GROUP BY to aggregate data rows - PostgreSQL Tutorial

- [Instructor] A SQL SELECT statement is broken down into various clauses and functions that determine the output results. The most basic database query uses the SELECT clause to identify the columns that you want returned from a table, and the FROM clause to specify the table itself. For instance, I'm going to select the Two Trees database in my PostgreSQL server, and then click this button here in pgAdmin to open a new query tool. Now, if I wanted to see a few columns from the products table, I could write the query like this: we'll select the product name, category ID, size, and price columns from the inventory.products table. Then, on the toolbar, I can press this play button, or press the F5 shortcut key, to execute the query. This returns the four columns that I asked for, with data for every product in the table. This table holds information on 114 different products, so the query results show 114 rows. You can scroll through the results to see all of them, and we can see that they go all the way down to 114. Now, to filter this result set down to only specific rows, we can add a WHERE clause with a criterion. All data rows that meet the criterion are returned, and all rows that don't meet it are excluded. For instance, if I only wanted to see the products with a price over $20, I can come down to line number three, add a WHERE clause, and say WHERE price is greater than 20. This time, when I press F5 to execute the query, we'll see a total of 37 rows. These are the 37 products with a price over $20; one row is displayed for every product that meets the criterion. Filtering your data with a WHERE clause can help you explore the information stored in the database. Another way to get a better understanding of our data is to start grouping rows together based on common attributes.
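To try the SELECT, FROM, and WHERE steps outside pgAdmin, here is a minimal sketch that runs equivalent SQL through Python's built-in sqlite3 module. It is not the course setup: SQLite stands in for PostgreSQL, the five rows are hypothetical sample data rather than the 114 real products, and the column names product_name, category_id, size, and price are assumptions. SQLite has no schemas, so the sketch attaches an in-memory database named inventory to keep the name inventory.products.

```python
import sqlite3

# Hypothetical stand-in rows: (product_name, category_id, size, price).
# The real inventory.products table has 114 rows and may use other column names.
PRODUCTS = [
    ("Product A", 1, 8, 10.99),
    ("Product B", 1, 32, 24.99),
    ("Product C", 2, 64, 32.50),
    ("Product D", 3, 8, 12.00),
    ("Product E", 3, 128, 45.00),
]

conn = sqlite3.connect(":memory:")
# SQLite has no schemas, so attach a second in-memory database named
# "inventory" to keep the schema-qualified name inventory.products.
conn.execute("ATTACH DATABASE ':memory:' AS inventory")
conn.execute(
    "CREATE TABLE inventory.products "
    "(product_name TEXT, category_id INTEGER, size INTEGER, price REAL)"
)
conn.executemany("INSERT INTO inventory.products VALUES (?, ?, ?, ?)", PRODUCTS)

# SELECT picks the columns and FROM names the table: one row per product.
all_rows = conn.execute(
    "SELECT product_name, category_id, size, price FROM inventory.products"
).fetchall()

# WHERE keeps only the rows that meet the criterion.
expensive = conn.execute(
    "SELECT product_name, category_id, size, price "
    "FROM inventory.products "
    "WHERE price > 20"
).fetchall()

print(len(all_rows), "rows in total")           # 5 with this sample data
print(len(expensive), "rows with price > 20")   # 3 with this sample data
```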
For instance, let's analyze our product information based on the size of each product. I'll start another query here on line number five, beginning with a SELECT statement that pulls only the size column from the database. Now, to execute only this second query, I'll make sure that I highlight it first and then press the play button or the F5 shortcut key. That returns one row for every product in the table again, so 114 rows are displayed, but it's only showing me the size information. You can see that lots of products have the same size: we have 8-ounce products, 32-ounce products, 64-ounce products, and so on. You might like to know how many different size classifications there are in the product data. We can find out by adding a GROUP BY clause to the query, which takes all 114 rows from this result set and combines them into a single row for each size. To do that, I'll add a new line, line number seven, and type GROUP BY size. Now, when I execute the query, we'll see the result: a total of eight rows, one for each product size. It might be nice to see this information in numerical order, so I'll also add a sort to the result. On line number eight, I'll add ORDER BY size DESC, for descending. Once again, I'll highlight lines five through eight and execute the query, and now the results are in a better order: the largest size appears at the top and the smallest size at the bottom. In a GROUP BY query, each of these rows acts as a container for the original data. So even though we're only seeing eight rows right now, information on all 114 products is still available. You can think of a GROUP BY query like sorting rows of data into different buckets. All of the 8-ounce products go into one bucket, separate from all of the 16-ounce products.
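A minimal sketch of the GROUP BY and ORDER BY ... DESC steps, again using Python's sqlite3 module as a stand-in for PostgreSQL. The per-size product counts in SIZE_COUNTS are made up for illustration (they are not the course's numbers), and the two-column table layout is an assumption.

```python
import sqlite3

# Hypothetical products per size (in ounces); made up, NOT the course's data.
SIZE_COUNTS = {4: 2, 6: 3, 8: 14, 12: 7, 16: 11, 32: 12}

conn = sqlite3.connect(":memory:")
# Attach an in-memory database named "inventory" to mimic the PostgreSQL schema.
conn.execute("ATTACH DATABASE ':memory:' AS inventory")
conn.execute("CREATE TABLE inventory.products (product_name TEXT, size INTEGER)")
for size, n in SIZE_COUNTS.items():
    conn.executemany(
        "INSERT INTO inventory.products VALUES (?, ?)",
        [(f"{size} oz product {i}", size) for i in range(1, n + 1)],
    )

# Without GROUP BY: one row per product, with many repeated sizes.
ungrouped = conn.execute("SELECT size FROM inventory.products").fetchall()

# GROUP BY collapses the rows into one row per distinct size;
# ORDER BY size DESC lists the largest size first.
grouped = conn.execute(
    "SELECT size "
    "FROM inventory.products "
    "GROUP BY size "
    "ORDER BY size DESC"
).fetchall()

print(len(ungrouped), "ungrouped rows")  # 49 with this sample data
print([row[0] for row in grouped])       # [32, 16, 12, 8, 6, 4]
```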
Now that we have these buckets, we can look inside each one and work with all of the rows of data that it contains. The most common thing you might want to know is how many products are in each size category. We can find that by adding another column to the results. Remember that columns are defined in the SELECT clause, so we can add a counting function up here. On line number five, after SELECT size, I'll type a comma and then the COUNT function. The COUNT function needs to know what to count; in this case, we just want to count all of the records in each bucket, so inside the parentheses I'll type an asterisk: COUNT(*). Now, when I highlight lines five through eight and run the query, we'll see a new column with the number of products in each size category. There are a total of 18 products in our 128-ounce size category, and only two products in our 4-ounce size. To clean up the display and make these column headers more descriptive, we can add some aliases with the AS keyword. After the word size, I'll type AS and then, in double quotation marks, the new alias "product size". We'll do the same thing for the COUNT function: I'll type AS after it, followed by "number of products" in double quotation marks. Now, when I execute this query, we'll see better names at the top, which clarify what each column displays. So now we have the number of products in each of our product size categories. Finally, we can limit the rows that are returned by a GROUP BY query, just as we filter rows in a standard SELECT query. When filtering groups, you'll use the HAVING clause.
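The sketch below adds COUNT(*) and the two double-quoted aliases, and then rebuilds the same counts with a plain Python dictionary of lists to mirror the bucket picture from the GROUP BY discussion. As before, SQLite (through Python's sqlite3 module) stands in for PostgreSQL, and the table layout and counts are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical products per size (in ounces); made up, NOT the course's data.
SIZE_COUNTS = {4: 2, 6: 3, 8: 14, 12: 7, 16: 11, 32: 12}

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS inventory")
conn.execute("CREATE TABLE inventory.products (product_name TEXT, size INTEGER)")
for size, n in SIZE_COUNTS.items():
    conn.executemany(
        "INSERT INTO inventory.products VALUES (?, ?)",
        [(f"{size} oz product {i}", size) for i in range(1, n + 1)],
    )

# COUNT(*) counts every row in each size group; AS renames the output columns.
cur = conn.execute(
    'SELECT size AS "product size", COUNT(*) AS "number of products" '
    "FROM inventory.products "
    "GROUP BY size "
    "ORDER BY size DESC"
)
headers = [col[0] for col in cur.description]  # the aliases become the headers
counts = cur.fetchall()

# The same "buckets" idea in plain Python: file each row under its size,
# then count how many rows landed in each bucket.
buckets = defaultdict(list)
for name, size in conn.execute("SELECT product_name, size FROM inventory.products"):
    buckets[size].append(name)
bucket_counts = sorted(((size, len(rows)) for size, rows in buckets.items()), reverse=True)

print(headers)  # ['product size', 'number of products']
for row in counts:
    print(row)
```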
For instance, if you only wanted to see the groups that have more than 10 products, you can make a new line after the GROUP BY clause and type HAVING COUNT(*) > 10. When I execute this query, any groups with 10 or fewer products, such as this one here, the 12-ounce size, and these two, the 6-ounce and 4-ounce sizes, will be removed. Let me execute the query, and we'll see that's exactly the result we get. Now we only see five size categories, each with more than 10 products. It's important to note that, when the query is processed, the alias names are assigned only after the rows have been grouped and the HAVING clause has been evaluated, even though they're written at the very beginning of the query. This means that you need to reference the original expression, not its alias, in the HAVING clause. That's why we're using HAVING COUNT(*) here on line number eight, instead of HAVING "number of products". If you try to write it as HAVING "number of products" > 10, the query will return an error. ORDER BY is the exception: it's evaluated after the aliases are assigned, so PostgreSQL lets it refer to output column aliases. With GROUP BY clauses, you can start to combine your data based on common attributes, which gives you a different dimension for your analysis work and can help you better understand the scope of your data.
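A minimal sketch of the HAVING filter, under the same assumptions (SQLite via Python's sqlite3 module standing in for PostgreSQL, made-up data). It repeats COUNT(*) in the HAVING clause rather than using the alias, which is the form PostgreSQL requires. SQLite may accept some alias references that PostgreSQL rejects, so the sketch doesn't try to reproduce PostgreSQL's error; instead it checks that HAVING gives the same result as filtering the full list of groups afterward.

```python
import sqlite3

# Hypothetical products per size (in ounces); made up, NOT the course's data.
SIZE_COUNTS = {4: 2, 6: 3, 8: 14, 12: 7, 16: 11, 32: 12}

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS inventory")
conn.execute("CREATE TABLE inventory.products (product_name TEXT, size INTEGER)")
for size, n in SIZE_COUNTS.items():
    conn.executemany(
        "INSERT INTO inventory.products VALUES (?, ?)",
        [(f"{size} oz product {i}", size) for i in range(1, n + 1)],
    )

# HAVING filters whole groups after COUNT(*) has been computed.
# It repeats the aggregate instead of the alias, as PostgreSQL requires.
big_groups = conn.execute(
    'SELECT size AS "product size", COUNT(*) AS "number of products" '
    "FROM inventory.products "
    "GROUP BY size "
    "HAVING COUNT(*) > 10 "
    "ORDER BY size DESC"
).fetchall()

# For comparison: fetch every group, then apply the same filter in Python.
all_groups = conn.execute(
    "SELECT size, COUNT(*) FROM inventory.products GROUP BY size ORDER BY size DESC"
).fetchall()
filtered_in_python = [group for group in all_groups if group[1] > 10]

print(big_groups)  # [(32, 12), (16, 11), (8, 14)]
```

The SQL strings themselves are standard, so the same query text should also run in pgAdmin against a table with these columns; swapping HAVING COUNT(*) > 10 for HAVING "number of products" > 10 is the change that PostgreSQL rejects.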