How to Find Matching Rows in Two Pandas DataFrames
Let's learn how to find matching rows in two dataframes using Pandas.
Find Matching Rows Using merge()
merge() function is one of the most commonly used methods for finding matching rows in two DataFrames. It performs a SQL-style inner join, returning rows where matching values exist in both DataFrames.
import pandas as pd
# Sample DataFrames
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': ['a', 'b', 'c', 'd']
})
df2 = pd.DataFrame({
'A': [3, 4, 5, 6],
'B': ['c', 'd', 'e', 'f']
})
# Merge dataframes on columns 'A' and 'B' (default is inner join)
matching_rows = pd.merge(df1, df2, on=['A', 'B'], how='inner')
print(matching_rows)
Output
A B 0 3 c 1 4 d
The merge() function returns only the rows with matching values in both DataFrames, as shown in the output.
Using merge()
with a Common Column
You can also use merge()
to find matching rows based on a specific column, such as an ID column. The following example demonstrates an inner join between two DataFrames using the ID
column.
import pandas as pd
# Sample DataFrames
df1 = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Name': ['John', 'Alice', 'Bob', 'Eve']
})
df2 = pd.DataFrame({
'ID': [2, 3, 4, 5],
'Age': [25, 30, 22, 29]
})
# Perform inner join on the 'ID' column
result = pd.merge(df1, df2, on='ID', how='inner')
print(result)
Output
ID Name Age 0 2 Alice 25 1 3 Bob 30 2 4 Eve 22
The merge() function matches rows based on the ID column and returns the rows where the ID values are common in both DataFrames.
Find Matching Rows Using isin()
The isin()
method allows you to filter rows in one DataFrame based on whether the values exist in another DataFrame. This method is particularly useful when you want to check for matches in a specific column.
import pandas as pd
# Create the first DataFrame: df1
df1 = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Flower': ['Rose', 'Lily', 'Tulip', 'Daisy']
})
# Create the second DataFrame: df2
df2 = pd.DataFrame({
'ID': [3, 4, 7],
'Category': ['Outdoor', 'Outdoor', 'Indoor']
})
# Find rows in df1 where 'ID' matches with 'ID' in df2
matching_rows = df1[df1['ID'].isin(df2['ID'])]
print(matching_rows)
Output
ID Flower 2 3 Tulip 3 4 Daisy
Here, isin() checks if the ID in df1 matches any value in the ID column of df2, and returns the matching rows from df1.
Find Matching Rows Using query()
query()
method is another way to find matching rows in two DataFrames. You can use the in
operator within the query to filter rows based on common values between the DataFrames.
import pandas as pd
# Create the first DataFrame: df1
df1 = pd.DataFrame({
'BookID': [101, 102, 103],
'Title': ['To Kill a Mockingbird', '1984', 'The Great Gatsby']
})
# Create the second DataFrame: df2
df2 = pd.DataFrame({
'BookID': [103, 105, 107],
'Genre': ['Classic', 'Romance', 'Science Fiction']
})
# Find rows in df1 where 'BookID' matches with 'BookID' in df2 using query()
# Convert df2['BookID'] to a list to avoid the ValueError
result = df1.query("BookID in @df2.BookID").merge(
df2.query("BookID in @df1.BookID"),
on="BookID",
how="inner"
)
# Print the result
print(result)
Output
BookID Title Genre 0 103 The Great Gatsby Classic
In this example, query() is used to filter rows in df1 where the BookID matches any BookID in df2. The results are then merged using merge() to get the full details from both DataFrames.
Using concat() and duplicated() to Find Common Rows
You can concatenate two DataFrames using concat() and then use duplicated() to identify rows that appear in both DataFrames. This method works well when the column names are identical in both DataFrames.
import pandas as pd
# Create the first DataFrame: df1
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': ['a', 'b', 'c', 'd']
})
# Create the second DataFrame: df2
df2 = pd.DataFrame({
'A': [3, 4, 5, 6],
'B': ['c', 'd', 'e', 'f']
})
# Concatenate the two DataFrames
concatenated_df = pd.concat([df1, df2])
# Find duplicated rows (i.e., rows that appear in both DataFrames)
matching_rows = concatenated_df[concatenated_df.duplicated(keep=False)].drop_duplicates()
print(matching_rows)
Output:

Here, concat() merges the two DataFrames, and duplicated() identifies rows that appear in both, which are then filtered using drop_duplicates() to return the matching rows.