How to Drop One or Multiple Columns in Pandas DataFrame
When working with datasets, we often need to remove unnecessary columns to simplify the analysis. In Python, the Pandas library provides several simple ways to drop one or more columns from a DataFrame.
Below is the sample DataFrame used throughout this article:
import pandas as pd
data = pd.DataFrame({
'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
'C': ['C1', 'C2', 'C3', 'C4', 'C5']
})
print(data)
Output
A B C
0 A1 B1 C1
1 A2 B2 C2
2 A3 B3 C3
3 A4 B4 C4
4 A5 B5 C5
Let's explore different methods to remove one or more columns from a Pandas DataFrame.
Using drop()
The most common method to remove columns is DataFrame.drop(). You can drop single or multiple columns by specifying their names.
a) Drop a Single Column
To drop a single column, use the drop() method with the column’s name.
df = data.drop('B', axis=1)
print(df)
Output
A C
0 A1 C1
1 A2 C2
2 A3 C3
3 A4 C4
4 A5 C5
Explanation:
- drop('B', axis=1): removes column B.
- axis=1: specifies that we are dropping columns (use axis=0 for rows).
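drop() returns a new DataFrame and leaves the original unchanged unless inplace=True is passed. The columns= keyword is an equivalent way of writing axis=1. A minimal sketch comparing the two forms on the same sample data:

```python
import pandas as pd

data = pd.DataFrame({
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
    'C': ['C1', 'C2', 'C3', 'C4', 'C5']
})

# Both calls remove column B and return a new DataFrame
by_axis = data.drop('B', axis=1)
by_keyword = data.drop(columns='B')

print(by_axis.equals(by_keyword))  # True
print(list(data.columns))          # ['A', 'B', 'C'] (the original is untouched)
```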
b) Drop Multiple Columns
To drop several columns at once, pass a list of column names:
df = data.drop(['B', 'C'], axis=1)
print(df)
Output
A
0 A1
1 A2
2 A3
3 A4
4 A5
Note: axis=1 (or axis='columns') targets columns, while axis=0 (the default) targets rows.
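By default, drop() raises a KeyError if any label in the list does not exist. Passing errors='ignore' skips missing labels instead, which may help when the same cleanup code runs on DataFrames whose columns differ. A small sketch, where 'D' is a deliberately nonexistent column:

```python
import pandas as pd

data = pd.DataFrame({
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
    'C': ['C1', 'C2', 'C3', 'C4', 'C5']
})

# With the default errors='raise', a missing label ('D') raises KeyError
try:
    data.drop(['B', 'D'], axis=1)
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' drops the labels that exist and silently skips the rest
df = data.drop(['B', 'D'], axis=1, errors='ignore')
print(list(df.columns))  # ['A', 'C']
```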
Using Column Index
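If you know the index positions of the columns to remove, you can use them instead of names, which is useful in automated processes. data.columns converts the positions into the corresponding labels, which drop() then removes.
df = data.drop(data.columns[[0, 2]], axis=1)
print(df)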
Output
B
0 B1
1 B2
2 B3
3 B4
4 B5
Explanation: data.columns[[0, 2]] selects the 1st and 3rd columns (A and C) for removal.
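Because data.columns is an Index, it also supports negative positions and slices. A sketch dropping the last column, then the first two columns by slice:

```python
import pandas as pd

data = pd.DataFrame({
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
    'C': ['C1', 'C2', 'C3', 'C4', 'C5']
})

# data.columns[-1] is the label of the last column ('C')
no_last = data.drop(data.columns[-1], axis=1)
print(list(no_last.columns))  # ['A', 'B']

# A positional slice excludes the stop position, so [:2] selects A and B
no_first_two = data.drop(data.columns[:2], axis=1)
print(list(no_first_two.columns))  # ['C']
```

One caveat: drop() still works on labels, so if a DataFrame has duplicate column names, every column sharing the selected label is removed, not just the one at that position.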
Using loc[] for Label-Based Ranges
The loc[] indexer selects columns by label, so you can pass a range of column names and drop every column in it. This is useful for removing a contiguous block of columns without relying on their positions.
df = data.drop(columns=data.loc[:, 'B':'C'].columns)
print(df)
Output
A
0 A1
1 A2
2 A3
3 A4
4 A5
Explanation: data.loc[:, 'B':'C'].columns selects all columns from B through C (inclusive), which are then dropped.
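Unlike ordinary Python slices, label slices with loc include both endpoints, while positional slices with iloc exclude the stop position. A short sketch contrasting the two on the sample data:

```python
import pandas as pd

data = pd.DataFrame({
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
    'C': ['C1', 'C2', 'C3', 'C4', 'C5']
})

# Label slice: 'A':'B' includes both A and B
print(list(data.loc[:, 'A':'B'].columns))  # ['A', 'B']

# Positional slice: 0:1 includes only position 0
print(list(data.iloc[:, 0:1].columns))     # ['A']

# Dropping the label range A..B leaves only C
df = data.drop(columns=data.loc[:, 'A':'B'].columns)
print(list(df.columns))  # ['C']
```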
Using pop() Method
pop() removes the specified column from the DataFrame in place and returns it as a Series, so you can use that column’s data separately.
p1 = data.pop('B')
print(p1)
Output
0 B1
1 B2
2 B3
3 B4
4 B5
Name: B, dtype: object
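Since pop() modifies the DataFrame in place, the popped column is no longer part of data afterwards. The del statement also removes a column in place but does not return it. A sketch showing both:

```python
import pandas as pd

data = pd.DataFrame({
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
    'C': ['C1', 'C2', 'C3', 'C4', 'C5']
})

p1 = data.pop('B')
print(list(data.columns))  # ['A', 'C'] (B was removed from data itself)
print(p1.tolist())         # ['B1', 'B2', 'B3', 'B4', 'B5']

# del removes column C in place without returning it
del data['C']
print(list(data.columns))  # ['A']
```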
Drop Columns Based on a Condition
When a column has too many missing values, it may not be useful for analysis. In such cases, we can remove those columns by setting a limit (threshold) on how many values must be present for a column to be kept.
Example: The following code uses dropna() with a thresh value to drop columns in which more than 50% of the values are missing.
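import math
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, None, 4],
'B': [None, None, None, 4],
'C': [1, 2, 3, 4]
})
# Minimum number of non-null values a column needs to be kept (thresh expects an integer)
threshold = math.ceil(len(df) * 0.5)
df = df.dropna(thresh=threshold, axis=1)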
print(df)
Output
A C
0 1.0 1
1 2.0 2
2 NaN 3
3 4.0 4
Explanation:
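- threshold: the minimum number of non-null values a column must have to be kept; 50% of the 4 rows, which is 2.
- dropna(thresh=threshold, axis=1): removes every column with fewer than threshold non-null values, i.e. more than 50% missing. Column B has only one non-null value, so it is dropped.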
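A different technique, not the dropna() approach above, is to compute the share of missing values in each column and keep only the columns at or below a cutoff. This states the rule directly as a percentage. A sketch assuming the same 50% cutoff (max_missing is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, None, None, 4],
    'C': [1, 2, 3, 4]
})

max_missing = 0.5  # hypothetical cutoff: drop columns with more than 50% missing

# isna() marks missing cells; mean() per column gives the missing fraction
missing_share = df.isna().mean()
print(missing_share.to_dict())  # {'A': 0.25, 'B': 0.75, 'C': 0.0}

# Keep only the columns whose missing fraction is within the cutoff
kept = df.loc[:, missing_share <= max_missing]
print(list(kept.columns))  # ['A', 'C']
```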