Pandas Dataframe Index
The index in a pandas DataFrame acts as a label for each row in the dataset. It can be numeric or based on the values of a specific column. The default index is a RangeIndex starting from 0, but you can customize it to make the data easier to understand. You can access the current index of a DataFrame using the index attribute. Let's understand this with the help of an example:
1. Accessing and Modifying the Index
import pandas as pd
data = {'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
        'Age': [25, 30, 22, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Salary': [50000, 55000, 40000, 70000, 48000]}
df = pd.DataFrame(data)
print(df.index) # Accessing the index
Output
RangeIndex(start=0, stop=5, step=1)
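The heading above also mentions modifying the index. As a minimal sketch, new labels can be assigned directly to the index attribute. The frame `sample` and the labels 'a', 'b', 'c' are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical three-row frame, separate from the df used in this article
sample = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                       'Age': [25, 30, 22]})

# Work on a copy so the original frame keeps its default RangeIndex
relabeled = sample.copy()

# Assign new labels directly; the list length must match the number of rows
relabeled.index = ['a', 'b', 'c']  # illustrative labels only

print(relabeled.index)  # the new custom labels
print(sample.index)     # still the default RangeIndex
```

Assigning a list of the wrong length raises a ValueError, so this approach suits cases where the full set of new labels is known up front.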
2. Setting a Custom Index
To set a custom index, use the set_index() method, which returns a new DataFrame indexed by the values of a column such as Name or Age.
# Set 'Name' column as the index
df_with_index = df.set_index('Name')
print(df_with_index)
Output
         Age  Gender  Salary
Name
John      25    Male   50000
Alice     30  Female   55000
Bob       22    Male   40000
Eve       35  Female   70000
Charlie   28    Male  480...
There are various operations you can perform with the DataFrame index, such as resetting it, changing it, or indexing with loc[]. Let's understand these as well:
3. Resetting the Index
If you need to go back to the default integer index, use the reset_index() method. It converts the current index into a regular column and creates a new default index.
# Reset the index back to the default integer index.
# Start from the Name-indexed frame so that 'Name' returns as a column.
df_reset = df_with_index.reset_index()
print(df_reset)
Output
      Name  Age  Gender  Salary
0     John   25    Male   50000
1    Alice   30  Female   55000
2      Bob   22    Male   40000
3      Eve   35  Female   70000
4  Charlie   28    Male   48000
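reset_index() also accepts a drop parameter, which controls whether the old index is kept as a column. The sketch below compares the two behaviours on a small made-up frame. The names `people`, `by_name`, `kept`, and `dropped` are illustrative only.

```python
import pandas as pd

# Hypothetical frame indexed by 'Name', separate from the article's df
people = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                       'Age': [25, 30, 22]})
by_name = people.set_index('Name')

# Default: the old index comes back as a regular column
kept = by_name.reset_index()

# drop=True: the old index is discarded instead of becoming a column
dropped = by_name.reset_index(drop=True)

print(list(kept.columns))
print(list(dropped.columns))
```

Use drop=True only when the old labels are no longer needed, since they cannot be recovered from the result.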
4. Indexing with loc
The loc[] indexer in pandas lets you access rows and columns of a DataFrame by their labels, making it easy to retrieve specific data points.
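# Look up the row labelled 'Alice'; this needs the Name-indexed frame,
# because the original df only has the integer labels 0 to 4
row = df_with_index.loc['Alice']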
print(row)
Output
Age           30
Gender    Female
Salary     55000
Name: Alice, dtype: object
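loc[] can also take a column label, or lists of row and column labels, in the same call. The following is a small sketch on a made-up frame. The name `staff` and its values are illustrative and only mirror the article's data.

```python
import pandas as pd

# Hypothetical frame indexed by 'Name'
staff = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                      'Age': [25, 30, 22],
                      'Salary': [50000, 55000, 40000]}).set_index('Name')

# Row label plus column label returns a single value
alice_age = staff.loc['Alice', 'Age']

# Lists of labels return a smaller DataFrame, in the order requested
subset = staff.loc[['John', 'Bob'], ['Age', 'Salary']]

print(alice_age)
print(subset)
```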
5. Changing the Index
You can change the index of a DataFrame with the set_index() method, which lets you set one or more columns as the new index.
# Set 'Age' as the new index
df_with_new_index = df.set_index('Age')
print(df_with_new_index)
Output
        Name  Gender  Salary
Age
25      John    Male   50000
30     Alice  Female   55000
22       Bob    Male   40000
35       Eve  Female   70000
28   Charlie    Male  480...
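Passing a list of columns to set_index() produces a MultiIndex. The sketch below shows this on a small made-up frame. The names `records` and `multi` are illustrative and not part of the example above.

```python
import pandas as pd

# Hypothetical frame, separate from the article's df
records = pd.DataFrame({'Gender': ['Male', 'Female', 'Male'],
                        'Name': ['John', 'Alice', 'Bob'],
                        'Salary': [50000, 55000, 40000]})

# A list of columns builds a two-level MultiIndex
multi = records.set_index(['Gender', 'Name'])
print(list(multi.index.names))

# With a MultiIndex, a row is addressed by a tuple of labels
alice_salary = multi.loc[('Female', 'Alice'), 'Salary']
print(alice_salary)
```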
Here are some key takeaways:
- Use .loc[] for label-based row selection and set_index() to set custom indices.
- Access the index with .index. Use reset_index() to restore the default integer index; passing drop=True discards the old index instead of keeping it as a column.