Reindexing in Pandas DataFrame
Reindexing in Pandas changes the row or column labels of a DataFrame to match a new set of labels. This is useful when aligning data, adding missing labels, or reordering your DataFrame. If the new index includes labels not present in the original DataFrame, Pandas fills those entries with NaN by default. For example, reindexing with a label that does not exist yet adds a new row:
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Reindex the DataFrame with a new index
a = df.reindex([0, 1, 2, 3])
print(a)
Output
     A    B
0  1.0  4.0
1  2.0  5.0
2  3.0  6.0
3  NaN  NaN
As you can see, index 3 wasn't present in the original DataFrame, so its row is filled with NaN.
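The output above also shows the integer values printed as 1.0, 2.0 and so on. NaN is a floating-point value, so introducing it converts integer columns to floats. A minimal sketch, reusing the same data, that checks the dtypes before and after reindexing:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
a = df.reindex([0, 1, 2, 3])

# The original columns hold integers
print(df.dtypes)

# After reindexing, the NaN in row 3 forces both columns to float64
print(a.dtypes)

# Every value in the new row 3 is missing
print(a.loc[3].isna().all())
```

If the integer dtype matters, passing a fill_value avoids introducing NaN in the first place.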
How Does Reindexing Work?
When you reindex a DataFrame, you provide a new set of labels for the rows, the columns, or both. Rows and columns whose labels already exist are kept, in the order you specify; labels that are not present in the original DataFrame get NaN values. A simplified form of the reindex() signature is shown below (the full signature also accepts level, limit, and tolerance):
DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, fill_value=NaN)
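- labels: New labels to conform to, applied to the axis given by axis.
- index/columns: New row/column labels.
- axis: Axis that labels refers to (0 or 'index' for rows, 1 or 'columns' for columns).
- fill_value: Value to use for filling missing entries (default is NaN).
- method: Method for filling holes (ffill, bfill, nearest). It only applies when the original index is monotonically increasing or decreasing.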
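To illustrate how method and fill_value differ, here is a small sketch. The DataFrame with gaps in its index (0, 2, 4) is made up for illustration and is not one of this article's datasets:

```python
import pandas as pd

# Illustrative data: a sorted index with gaps at 1 and 3
df = pd.DataFrame({'A': [10, 20, 30]}, index=[0, 2, 4])
new_index = [0, 1, 2, 3, 4, 5]

# method='ffill' gives each new label the row of the closest earlier label
filled = df.reindex(new_index, method='ffill')
print(filled)

# fill_value puts a constant into every new row instead
const = df.reindex(new_index, fill_value=-1)
print(const)
```

With method='ffill', labels 1, 3 and 5 take the values of labels 0, 2 and 4 respectively; with fill_value=-1, they all hold -1.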
Reindexing Rows
You can reorder or expand the row labels using reindex(). Any new label not found in the DataFrame gets NaN values, unless you provide a fill_value.
Example: Add new rows with a fill value
import pandas as pd
data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data)
new_index = [0, 1, 2, 3]
a = df.reindex(new_index, fill_value=0)
print(a)
Output
    A   B
0  10  40
1  20  50
2  30  60
3   0   0
Example: Add multiple new rows
import pandas as pd
data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data)
new_index = [0, 1, 2, 3, 4, 5]
a = df.reindex(new_index, fill_value=0)
print("Reindexed DataFrame:")
print(a)
Output
Reindexed DataFrame:
    A   B
0  10  40
1  20  50
2  30  60
3   0   0
4   0   0
5   0   0
Reindexing Columns
You can change the order of columns or add new columns by passing the columns parameter to reindex().
Example: Reorder and add new column
import pandas as pd
data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data)
a = df.reindex(columns=['B', 'A', 'C'], fill_value=100)
print(a)
Output
    B   A    C
0  40  10  100
1  50  20  100
2  60  30  100
Example: Reindex columns using axis
import pandas as pd
data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data)
a = df.reindex(['B', 'A', 'C'], axis=1, fill_value=100)
print("Reindexed DataFrame:")
print(a)
Output
Reindexed DataFrame:
    B   A    C
0  40  10  100
1  50  20  100
2  60  30  100
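Rows and columns can also be reindexed in a single call by passing both index and columns. A brief sketch, reusing the same data and assuming a fill value of 0 for every new entry:

```python
import pandas as pd

data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data)

# Add row 3, reorder the columns, and add column C in one step
a = df.reindex(index=[0, 1, 2, 3], columns=['B', 'A', 'C'], fill_value=0)
print(a)
```

Here the new row 3 and the new column C contain only the fill value, while the existing values are kept under their original labels.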
Handling Missing Values with ffill and bfill
When new labels introduce NaN values, you can fill them after reindexing using:
- ffill(): Fills NaN with the previous non-null value.
- bfill(): Fills NaN with the next non-null value.
Example: Forward Fill
import pandas as pd
data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data)
new_index = [0, 1, 2, 3, 4]
a = df.reindex(new_index)
df_ffill = a.ffill()
print(df_ffill)
Output
      A     B
0  10.0  40.0
1  20.0  50.0
2  30.0  60.0
3  30.0  60.0
4  30.0  60.0
Example: Backward Fill
import pandas as pd
data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data)
new_index = [0, 1, 2, 3, 4]
a = df.reindex(new_index)
df_bfill = a.bfill()
print(df_bfill)
Output
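      A     B
0  10.0  40.0
1  20.0  50.0
2  30.0  60.0
3   NaN   NaN
4   NaN   NaN
Rows 3 and 4 remain NaN here because backward fill copies values from the next non-null row, and no rows follow them.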
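Backward fill has a visible effect when the missing rows come before an existing one. The following sketch assumes a made-up DataFrame indexed at 1, 3 and 5 (not one of the datasets above), so that the new labels fall before and between existing rows:

```python
import pandas as pd

# Illustrative data: existing labels 1, 3, 5 leave gaps at 0, 2, 4 and 6
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]}, index=[1, 3, 5])

a = df.reindex([0, 1, 2, 3, 4, 5, 6])

# Each NaN row takes the values of the next non-null row below it
b = a.bfill()
print(b)
```

Labels 0, 2 and 4 are filled from labels 1, 3 and 5, while label 6 stays NaN because nothing comes after it.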