Pandas Read CSV in Python
CSV (Comma-Separated Values) files store tabular data as plain text. The Pandas read_csv() function loads such a file into a DataFrame, which is a powerful structure for data manipulation and analysis. Here’s a quick example to get you started.
Suppose you have a file named people.csv.
First, import the Pandas library, then load the data into a DataFrame as follows:
import pandas as pd
# reading csv file
df = pd.read_csv("people.csv")
df
Output:
[Image: the DataFrame loaded from people.csv]
read_csv() function – Syntax & Parameters
The read_csv() function in Pandas is used to read data from CSV files into a Pandas DataFrame. A DataFrame is a powerful data structure that allows you to manipulate and analyze tabular data efficiently. CSV files are plain-text files where each row represents a record, and columns are separated by commas (or other delimiters).
Here is the Pandas read CSV syntax with its parameters.
Syntax: pd.read_csv(filepath_or_buffer, sep=',', header='infer', index_col=None, usecols=None, engine=None, skiprows=None, nrows=None)
Parameters:
- filepath_or_buffer: Location of the CSV file. It accepts a string path, a URL, or a file-like object.
- sep: The separator (delimiter) between fields; the default is ','.
- header: Row number (int) or list of row numbers to use as the column names; the data starts after the header. The default, 'infer', uses the first row. With header=None, no row is treated as a header and the columns are labeled 0, 1, 2, and so on.
- usecols: Retrieves only the selected columns from the CSV file.
- nrows: Number of rows to read from the file.
- index_col: Column(s) to use as the row labels. If None, Pandas assigns a default integer index (0, 1, 2, ...).
- skiprows: Line numbers to skip (0-indexed), or the number of lines to skip at the start of the file.
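As a minimal sketch of how header=None behaves, the snippet below reads a small in-memory CSV that has no header row. The sample rows and the column names passed through the names parameter are assumptions made for illustration; io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV text with no header row (illustration only).
csv_text = "Shelby,Terrell,Male\nPhillip,Summers,Female\n"

# header=None: every line is treated as data; columns are labeled 0, 1, 2.
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(df_no_header.columns.tolist())

# Supplying `names` gives the columns readable labels instead.
df_named = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["First Name", "Last Name", "Sex"],
)
print(df_named)
```

Using names together with header=None is a common way to label files that ship without a header line.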
Features in Pandas read_csv
1. Read specific columns using read_csv
The usecols parameter lets you load only specific columns from a CSV file. This reduces memory usage and processing time by importing only the required data.
df = pd.read_csv("people.csv", usecols=["First Name", "Email"])
print(df)
Output:
   First Name  Email
0  Shelby      elijah57@example.net
1  Phillip     bethany14@example.com
2  Kristine    bthompson@example.com
3  Yesenia     kaitlinkaiser@example.com
4  Lori        buchananmanuel@example.net
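Besides a list of column names, usecols also accepts zero-based column positions or a callable that is evaluated against each column name. The sketch below shows both forms on a small in-memory sample; the CSV text is an assumption modeled on people.csv, not the actual file.

```python
import io

import pandas as pd

# Small illustrative sample modeled on people.csv (not the real file).
csv_text = (
    "First Name,Last Name,Email\n"
    "Shelby,Terrell,elijah57@example.net\n"
    "Phillip,Summers,bethany14@example.com\n"
)

# Select columns by zero-based position instead of by name.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])
print(by_position.columns.tolist())

# Select columns with a callable that receives each column name.
by_rule = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name.endswith("Name"),
)
print(by_rule.columns.tolist())
```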
2. Setting an Index Column (index_col)
The index_col parameter sets one or more columns as the DataFrame index, making the specified column(s) act as row labels for easier data referencing.
df = pd.read_csv("people.csv", index_col="First Name")
print(df)
Output:
[Image: the DataFrame with First Name as the index]
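To illustrate what "easier data referencing" can look like, the sketch below sets First Name as the index and then looks up a row by its label with .loc. The in-memory sample is an assumption that reuses a few values from people.csv.

```python
import io

import pandas as pd

# Illustrative sample reusing a few values from people.csv.
csv_text = (
    "First Name,Last Name,Job Title\n"
    "Shelby,Terrell,Games developer\n"
    "Phillip,Summers,Phytotherapist\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col="First Name")

# The index column is no longer a regular column; it labels the rows.
print(df.index.tolist())

# Label-based lookup: row "Shelby", column "Job Title".
job = df.loc["Shelby", "Job Title"]
print(job)
```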
3. Handling Missing Values Using read_csv
The na_values parameter treats the specified strings (e.g., "N/A", "Unknown") as NaN, enabling consistent handling of missing or incomplete data during analysis.
df = pd.read_csv("people.csv", na_values=["N/A", "Unknown"])
No NaN values appear here because our dataset has no missing values.
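Since people.csv contains no missing values, the effect of na_values is easier to see on data that does. The sketch below uses a made-up in-memory sample (the Age column and the "missing" marker are assumptions) and compares the result with and without na_values. Note that "N/A" is already among the strings Pandas treats as missing by default, so the sketch uses markers that are not.

```python
import io

import pandas as pd

# Made-up sample: "Unknown" and "missing" stand in for absent values.
csv_text = (
    "First Name,Job Title,Age\n"
    "Shelby,Games developer,34\n"
    "Phillip,Unknown,missing\n"
)

# Without na_values the markers are kept as ordinary strings.
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.isna().sum().sum())

# With na_values the markers become NaN, and Age can be parsed as numeric.
cleaned = pd.read_csv(io.StringIO(csv_text), na_values=["Unknown", "missing"])
print(cleaned)
print(cleaned.isna().sum())
```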
4. Reading CSV Files with Different Delimiters
In this example, we will take a CSV file and then add some special characters to see how the sep parameter works.
import pandas as pd
# Sample data stored in a multi-line string
data = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4"""
# Save the data to a CSV file
with open("sample.csv", "w") as file:
    file.write(data)
print(data)
Output:
totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4
The sample data is stored in a multi-line string for demonstration purposes.
- Separator (sep): The sep='[:, |_]' argument allows Pandas to handle multiple delimiters (:, |, _, and ,) using a regular expression.
- Engine: The engine='python' argument is used because the default C engine does not support regular expressions for delimiters.
# Load the CSV file using pandas with multiple delimiters
df = pd.read_csv('sample.csv',
                 sep='[:, |_]',    # Define the delimiters
                 engine='python')  # Use Python engine for regex separators
df
Output:
       totalbill  tip   Unnamed: 2  sex   smoker  Unnamed: 5  day     time    Unnamed: 8  size
16.99  NaN        1.01  Female      No    NaN     Sun         NaN     Dinner  NaN         2.0
10.34  NaN        1.66  NaN         Male  NaN     No          Sun     Dinner  NaN         3.0
21.01  3.50       Male  NaN         No    Sun     NaN         Dinner  NaN     3.0         NaN
23.68  NaN        3.31  NaN         Male  No      NaN         Sun     Dinner  NaN         2.0
24.59  3.61       NaN   Female      No    NaN     Sun         NaN     Dinner  NaN         4.0
25.29  NaN        4.71  Male        NaN   No      Sun         NaN     Dinner  NaN         4.0
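The Unnamed columns and NaN values in the output above appear because the character class also contains a space, so each ", " sequence is split twice and produces an empty field. For comparison, here is a minimal sketch of a regex separator on cleaner data, where every delimiter is a single character. The sample text and column names are assumptions made for illustration.

```python
import io

import pandas as pd

# Illustrative data mixing two single-character delimiters: ';' and '|'.
csv_text = "bill;tip|day\n16.99;1.01|Sun\n10.34;1.66|Sun\n"

# A character class matches either delimiter; regex separators need the Python engine.
df = pd.read_csv(io.StringIO(csv_text), sep=r"[;|]", engine="python")
print(df)
```

Because no delimiter is followed by padding, each row splits into exactly three fields and the columns line up with the header.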
5. Using nrows in read_csv()
The nrows parameter limits the number of rows read from a file, enabling quick previews or partial data loading for large datasets. Here, we read only the first 3 rows using the nrows parameter.
df = pd.read_csv('people.csv', nrows=3)
df
Output:
   First Name  Last Name  Sex     Email                       Date of birth  Job Title
0  Shelby      Terrell    Male    elijah57@example.net        1945-10-26     Games developer
1  Phillip     Summers    Female  bethany14@example.com       1910-03-24     Phytotherapist
2  Kristine    Travis     Male    bthompson@example.com       1992-07-02     Homeopath
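The preview use case can be sketched on a larger input. The snippet below generates a 1,000-row CSV in memory (synthetic data, assumed purely for illustration) and reads only the first five rows.

```python
import io

import pandas as pd

# Build a synthetic 1,000-row CSV in memory (illustration only).
rows = "\n".join(f"{i},{i * i}" for i in range(1000))
csv_text = "n,square\n" + rows + "\n"

# Only the first 5 data rows are read; the header row is not counted.
preview = pd.read_csv(io.StringIO(csv_text), nrows=5)
print(preview)
```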
6. Using skiprows in read_csv()
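The skiprows parameter skips unnecessary rows, which is useful for ignoring metadata or extra headers that are not part of the dataset. It accepts either the number of lines to skip at the start of the file or a list of 0-indexed line numbers. Here, skiprows=[4, 5] skips lines 4 and 5 of the file (line 0 is the header), which removes the records for Yesenia and Lori.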
df = pd.read_csv("people.csv")
print("Previous Dataset: ")
print(df)
# using skiprows
df = pd.read_csv("people.csv", skiprows=[4, 5])
print("Dataset After skipping rows: ")
print(df)
Output:
Previous Dataset:
   First Name  Last Name  Sex     Email                       Date of birth  Job Title
0  Shelby      Terrell    Male    elijah57@example.net        1945-10-26     Games developer
1  Phillip     Summers    Female  bethany14@example.com       1910-03-24     Phytotherapist
2  Kristine    Travis     Male    bthompson@example.com       1992-07-02     Homeopath
3  Yesenia     Martinez   Male    kaitlinkaiser@example.com   2017-08-03     Market researcher
4  Lori        Todd       Male    buchananmanuel@example.net  1938-12-01     Veterinary surgeon
5  Erin        Day        Male    tconner@example.org         2015-10-28     Management officer
6  Katherine   Buck       Female  conniecowan@example.com     1989-01-22     Analyst
7  Ricardo     Hinton     Male    wyattbishop@example.com     1924-03-26     Hydrogeologist
Dataset After skipping rows:
[Image: the DataFrame without the skipped rows]
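The metadata use case can be sketched with an integer skiprows value, which skips that many lines from the top of the file. The in-memory sample below, including its two leading comment lines, is hypothetical.

```python
import io

import pandas as pd

# Hypothetical export with two metadata lines above the real header.
csv_text = (
    "# exported by a hypothetical tool\n"
    "# generated: 2024-01-01\n"
    "First Name,Sex\n"
    "Shelby,Male\n"
    "Phillip,Female\n"
)

# An integer skips that many lines from the top; line 2 becomes the header.
df_int = pd.read_csv(io.StringIO(csv_text), skiprows=2)
print(df_int)

# A list skips specific 0-indexed lines: both metadata lines and Shelby's record.
df_list = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 1, 3])
print(df_list)
```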
7. Parsing Dates (parse_dates)
The parse_dates parameter converts date columns into datetime objects, simplifying operations like filtering, sorting, or time-based analysis.
df = pd.read_csv("people.csv", parse_dates=["Date of birth"])
df.info()  # info() prints the summary itself and returns None
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   First Name     5 non-null      object
 1   Last Name      5 non-null      object
 2   Sex            5 non-null      object
 3   Email          5 non-null      object
 4   Date of birth  5 non-null      datetime64[ns]
 5   Job Title      5 non-null      object
dtypes: datetime64[ns](1), object(5)
memory usage: 368.0+ bytes
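Once a column holds datetime values, the filtering and time-based operations mentioned above become straightforward. The sketch below works on a small in-memory sample that reuses a few dates from people.csv; the 1990 cutoff is an arbitrary choice for illustration.

```python
import io

import pandas as pd

# Illustrative sample reusing a few values from people.csv.
csv_text = (
    "First Name,Date of birth\n"
    "Shelby,1945-10-26\n"
    "Kristine,1992-07-02\n"
    "Yesenia,2017-08-03\n"
)

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date of birth"])

# Filter rows by comparing against a timestamp (arbitrary cutoff).
born_after_1990 = df[df["Date of birth"] >= pd.Timestamp("1990-01-01")]
print(born_after_1990)

# The .dt accessor exposes date components such as the year.
years = df["Date of birth"].dt.year.tolist()
print(years)
```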
Loading CSV Data from a URL
Pandas allows you to directly read a CSV file hosted on the internet using the file’s URL. This can be incredibly useful when working with datasets shared on websites, cloud storage, or public repositories like GitHub.
url = "https://media.geeksforgeeks.org/wp-content/uploads/20241121154629307916/people_data.csv"
df = pd.read_csv(url)
df
Output:
   First Name  Last Name  Sex     Email                       Date of birth  Job Title
0  Shelby      Terrell    Male    elijah57@example.net        1945-10-26     Games developer
1  Phillip     Summers    Female  bethany14@example.com       1910-03-24     Phytotherapist
2  Kristine    Travis     Male    bthompson@example.com       1992-07-02     Homeopath
3  Yesenia     Martinez   Male    kaitlinkaiser@example.com   2017-08-03     Market researcher
4  Lori        Todd       Male    buchananmanuel@example.net  1938-12-01     Veterinary surgeon