From the course: Querying Microsoft SQL Server 2019
Remove duplicates with DISTINCT - SQL Server Tutorial
- [Narrator] One useful piece of information that you can pull from a database is how many times a specific value appears in a table. For instance, I'll start up a new query and type in select star from Person.Address. When I execute the query, we'll get the results here, and looking in the bottom right-hand corner, we'll see that I have records for 19,000 individual people. One of the columns in this table is the city that each person lives in. As you'd expect, there are lots of different people that live in the same city represented in this database.
One question that you might want to ask is, how many cities are there? You might be tempted to get the answer just by selecting the city column from the table. I'll change the query from select star to select city. This time when I execute the query, we'll see this result. But if I look in the bottom right-hand corner, we'll see that I'm still getting the same 19,000 rows. These values are displayed in a different order, but you can already see here that there's duplication with these two records. We're just getting the full city column from the data table.
One way that we can get to the information that we're after is to ask the database for just the distinct cities in this table. To do that, you simply add the DISTINCT keyword to the select clause. Now I'm asking the database to select the distinct cities from Person.Address. I'll execute this query and I'll see these results. Looking in the bottom right-hand corner, I'll see that I have a total of 575 rows. Each of these rows represents a unique city name found in the original data table.
You might want to add an ORDER BY clause to return the cities in alphabetical order. I'll go ahead and do that on line number three. I'll execute the query and that'll sort the results alphabetically. Now we do need to be careful here, because it's not uncommon for two different cities to have the same name.
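For reference, the queries typed up to this point might look roughly like the following. This is only a sketch: it assumes the AdventureWorks sample database, where the table is Person.Address and the column is named City, and the row counts in the comments are the ones quoted in the narration rather than independently checked.

```sql
-- Assumes the AdventureWorks sample database (Person.Address, City column).

-- Every column, every row (the narration reports about 19,000 rows).
SELECT *
FROM Person.Address;

-- Only the City column: still one row per address, so duplicates remain.
SELECT City
FROM Person.Address;

-- One row per unique city name, sorted alphabetically
-- (the narration reports 575 rows).
SELECT DISTINCT City
FROM Person.Address
ORDER BY City;
```

If only the number is needed rather than the list itself, a query such as SELECT COUNT(DISTINCT City) FROM Person.Address; would return it as a single value.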
And right now, these results are simply returning distinct city names. The query itself has no knowledge of geography and can't tell the difference between Portland, Oregon; Portland, Maine; and Portland, New Zealand. So we might want to add another column to the select statement to help separate different cities that share a name. In the original table, there's a column that stores the state province ID. I'll go ahead and add that to the list. I think we can safely assume that it's very uncommon for there to be two cities with the same name in the same state or province. When I press the execute button, that brings the number of rows returned to 613. This represents all of the distinct pairings of city name and state province ID found in the address table. And if I scroll down through the list, we're going to find that we have a number of cities named Berlin that are all from different states and provinces. Here they are, starting with record number 38 and going to record 42.
So using the DISTINCT keyword in your select statements can help you filter your data to just the rows that are unique across all of the columns that you've asked the query to return.
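The two-column version of the query could be sketched as follows, again assuming the AdventureWorks Person.Address table, where the state or province column is named StateProvinceID.

```sql
-- Assumes AdventureWorks column names: City and StateProvinceID.
-- DISTINCT applies to the whole select list, so each returned row is a
-- unique (City, StateProvinceID) pair (the narration reports 613 rows).
SELECT DISTINCT City, StateProvinceID
FROM Person.Address
ORDER BY City;
```

Because DISTINCT works on the combination of selected columns, adding a column can only keep the row count the same or raise it, which is why the result grows from 575 to 613 here.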