Get Unique Values from a Column in Pandas DataFrame
Retrieving unique values from a column in a Pandas DataFrame helps identify distinct elements, analyze categorical data, or detect duplicates. For example, if a column B contains ['B1', 'B2', 'B3', 'B4', 'B4'], the unique values are ['B1', 'B2', 'B3', 'B4'].
Here is the sample DataFrame used in this article:
import pandas as pd
data = {'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
'E': ['E1', 'E1', 'E1', 'E1', 'E1']}
df = pd.DataFrame(data)
print(df)
Output
    A   B   C   D   E
0  A1  B1  C1  D1  E1
1  A2  B2  C2  D2  E1
2  A3  B3  C3  D2  E1
3  A4  B4  C3  D2  E1
4  A5  B4  C3  D2  E1
Let's explore different methods to get unique values from a column in Pandas.
Using unique() method
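The unique() method returns the distinct values of a column, typically as a NumPy array (columns with an extension dtype, such as categorical, return an array of that type). Values appear in order of first occurrence, and a missing value, if present, is included in the result.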
Example: In this example, we retrieve the unique values from column 'B'.
unique_B = df['B'].unique()
print(unique_B)
Output
['B1' 'B2' 'B3' 'B4']
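The sample column has no missing entries, so the following is a minimal sketch on a hypothetical Series (not part of the sample DataFrame) showing how unique() treats a missing value and how to get a plain Python list from the result.

```python
import pandas as pd

# Hypothetical column with repeated values and one missing entry
s = pd.Series(['B1', 'B2', None, 'B2', 'B1'])

# unique() keeps first-occurrence order and counts the missing entry as a value
values = s.unique()

# Drop missing entries first if they should not appear in the result
values_no_na = list(s.dropna().unique())

print(len(values))
print(values_no_na)
```

Calling dropna() before unique() is one way to exclude missing entries; filtering the resulting array afterwards would work as well.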
Using nunique()
The nunique() method returns the number of distinct values in a column. It is useful when you only need the count rather than the values themselves. Missing values are excluded from the count by default (dropna=True).
Example: Here we get the number of unique values in columns 'A', 'B', 'C', and 'D'.
print("Unique count in A:", df['A'].nunique())
print("Unique count in B:", df['B'].nunique())
print("Unique count in C:", df['C'].nunique())
print("Unique count in D:", df['D'].nunique())
Output
Unique count in A: 5
Unique count in B: 4
Unique count in C: 3
Unique count in D: 2
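When the count is needed for every column, nunique() can also be called on the DataFrame itself, which avoids one call per column. The sketch below rebuilds the sample data so it runs on its own; the short Series with a missing entry is a hypothetical addition to illustrate the dropna parameter.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
                   'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
                   'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
                   'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
                   'E': ['E1', 'E1', 'E1', 'E1', 'E1']})

# One distinct-value count per column, returned as a Series
counts = df.nunique()
print(counts)

# Hypothetical column with a missing entry
s = pd.Series(['x', None, 'x'])
without_na = s.nunique()             # missing entry ignored
with_na = s.nunique(dropna=False)    # missing entry counted as a value
print(without_na, with_na)
```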
Using drop_duplicates()
The drop_duplicates() method removes repeated values and keeps the first occurrence of each. When called on a single column, as here, it returns a Series rather than a DataFrame. Each remaining value keeps the index label it had in the original DataFrame.
Example: This code retrieves unique values from column 'C'.
unique_C = df['C'].drop_duplicates()
print(unique_C)
Output
0    C1
1    C2
2    C3
Name: C, dtype: object
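In column 'C' the first occurrences happen to sit at positions 0 to 2, so the preserved index is easy to miss. A minimal sketch on a hypothetical Series where the duplicates are interleaved, followed by reset_index(drop=True) to renumber the result:

```python
import pandas as pd

# Hypothetical column where a duplicate appears between distinct values
s = pd.Series(['C1', 'C2', 'C1', 'C3'], name='C')

# The first occurrence of each value is kept, with its original index label
unique_c = s.drop_duplicates()
print(unique_c.index.tolist())

# Renumber from 0 when the original labels are not needed
renumbered = unique_c.reset_index(drop=True)
print(renumbered.tolist())
```

Keeping the original labels is useful when the unique rows need to be traced back to the source DataFrame; resetting is mostly a matter of tidiness.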
Using value_counts()
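The value_counts() method counts the occurrences of each unique value in the column and returns the result as a Series, sorted by count in descending order. The unique values themselves form the index of that Series.
Example: Here we count how often each value occurs in column 'D' and read the unique values from the index of the result.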
counts_D = df['D'].value_counts()
unique_D = counts_D.index
print("Value counts in D:\n", counts_D)
print("Unique values in D:", list(unique_D))
Output
Value counts in D:
D
D2    4
D1    1
Name: count, dtype: int64
Unique values in D: ['D2', 'D1']
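Because value_counts() sorts by frequency, the unique values taken from its index can come out in a different order than unique() gives. A short sketch comparing the two on column 'D' of the sample data (the variable names are illustrative):

```python
import pandas as pd

# Column 'D' from the sample DataFrame
d = pd.Series(['D1', 'D2', 'D2', 'D2', 'D2'], name='D')

# Ordered by how often each value occurs, most frequent first
by_frequency = d.value_counts().index.tolist()

# Ordered by where each value first appears in the column
by_first_seen = d.unique().tolist()

print(by_frequency)
print(by_first_seen)
```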
Using set()
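You can also use Python's built-in set() function, which converts the column values into a set and thereby removes duplicates. Sets are unordered, so the order of the printed values is not guaranteed and may differ from the output shown here.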
Example: Here we get unique values from column 'D' using set().
unique_set_D = set(df['D'])
print(unique_set_D)
Output
{'D2', 'D1'}
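If a predictable order matters, the set can be sorted, or dict.fromkeys() can be used to keep first-occurrence order. Both are plain Python idioms rather than Pandas features; a minimal sketch on column 'D' of the sample data:

```python
import pandas as pd

# Column 'D' from the sample DataFrame
d = pd.Series(['D1', 'D2', 'D2', 'D2', 'D2'], name='D')

# Sorting the set gives a stable, alphabetical list
sorted_unique = sorted(set(d))

# dict keys preserve insertion order, so this keeps first-occurrence order
ordered_unique = list(dict.fromkeys(d))

print(sorted_unique)
print(ordered_unique)
```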