GeoPandas Tutorial
GeoPandas is an open-source Python library that makes working with geospatial data easy. It extends pandas to support geometric data types and operations, enabling spatial analysis and visualization directly in Python. Commonly used in GIS, data science and environmental analytics, GeoPandas supports file formats like Shapefile and GeoJSON and integrates well with tools like Matplotlib, Folium and Plotly.
Important Facts to Know:
- Built on pandas: Adds geometry support (Points, Lines, Polygons) to DataFrames.
- Simple Mapping: Use .plot() for quick geospatial visualizations.
- Spatial Operations: Perform joins, overlays, projections and more.
- Wide Format Support: Reads Shapefiles, GeoJSON, KML and other geospatial formats.
Table of Contents
This tutorial takes you from GeoPandas basics to advanced use, with practical examples using real-world datasets.
1. How to Install GeoPandas
Installing GeoPandas can vary depending on your system. Here's a guide for different platforms:
- How to Install GeoPandas on Windows
- How to Install GeoPandas Package on Ubuntu
- How to Install GeoPandas on MacOS
- How to Install GeoPandas in Kaggle
2. Basic Operations with GeoPandas
In this section, we’ll cover the basic operations you can perform using GeoPandas, while introducing key geospatial concepts such as spatial data types, file formats and coordinate reference systems (CRS).
2.1 Reading and Writing Spatial Data
GeoPandas can read and write various spatial formats. It reads data into a GeoDataFrame using gpd.read_file().
import geopandas as gpd
# Load Barcelona districts dataset from GitHub
url = "https://raw.githubusercontent.com/jcanalesluna/bcn-geodata/master/districtes/districtes.geojson"
districts = gpd.read_file(url)
print(districts.head())
print("CRS:", districts.crs)
2.2 Types of Spatial Data
1. Vector Data (Points, Lines, Polygons)
GeoPandas supports creation of these geometries using shapely (which it integrates internally):
import geopandas as gpd
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
point = Point(77.2090, 28.6139) # Longitude, Latitude
line = LineString([(0, 0), (1, 2), (2, 4)])
polygon = Polygon([(0, 0), (1, 1), (1, 0)])
gdf = gpd.GeoDataFrame({
'name': ['Delhi', 'Route1', 'Area1'],
'geometry': [point, line, polygon]
})
print(gdf)
gdf.plot()
plt.show()
Explanation:
- Point, LineString and Polygon are shapely geometry classes. Coordinates are given as (x, y), which means (longitude, latitude) for geographic data.
- gpd.GeoDataFrame(...) builds a GeoDataFrame, which is like a pandas DataFrame but with a geometry column. gpd.read_file() returns the same type when it loads a file such as a Shapefile (.shp).
- print(gdf) prints the attributes and geometry of every row, and gdf.plot() draws all three geometries on one set of axes.
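When a layer mixes geometry types, it can be useful to check which type each row holds before analysis. The following is a minimal sketch that reuses the coordinates from the example above; the variable names shapes and polygons are illustrative only.

```python
import geopandas as gpd
from shapely.geometry import Point, LineString, Polygon

shapes = gpd.GeoSeries([
    Point(77.2090, 28.6139),
    LineString([(0, 0), (1, 2), (2, 4)]),
    Polygon([(0, 0), (1, 1), (1, 0)]),
])

# geom_type reports the geometry class of each row as a string
print(shapes.geom_type.tolist())

# Boolean indexing works as in pandas: keep only the polygons
polygons = shapes[shapes.geom_type == "Polygon"]
print(len(polygons))
```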
2. Raster Data (Grid-Based)
GeoPandas doesn’t handle raster directly, but you can use rasterio for raster operations:
import rasterio
# Adjust the path to wherever RGB.byte.tif is stored on your machine
with rasterio.open("RGB.byte.tif") as src:
    print("Raster Profile:")
    print(src.profile)
2.3 Common Spatial Data Formats
GeoPandas supports several vector data formats. The most common ones are Shapefile and GeoJSON. These formats are used for storing spatial data and are widely supported in both GIS tools and programming workflows.
1. Reading a Shapefile
A Shapefile is not just a single file but a collection of related files:
Files in a Shapefile (the first three are required):
- countries.shp : Contains geometric data (points, lines, polygons)
- countries.shx : Index for geometries
- countries.dbf : Attribute data in tabular format
- countries.prj (optional) : Coordinate system (CRS)
Example Folder Structure:
data/
├── countries.shp
├── countries.shx
├── countries.dbf
└── countries.prj
Example:
import geopandas as gpd
# Reading a shapefile (requires .shp, .shx, .dbf to be in same folder)
gdf = gpd.read_file("data/countries.shp")
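A missing companion file is a common reason a Shapefile fails to load, so it can help to check the folder first. The sketch below uses only the Python standard library; the helper name missing_shapefile_parts is hypothetical and is not part of GeoPandas, and the check assumes lowercase file extensions.

```python
from pathlib import Path

# Required companions of a .shp file, as listed above
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")

def missing_shapefile_parts(shp_path):
    """Return the required Shapefile components that are absent on disk.

    Hypothetical helper for illustration; not a GeoPandas function.
    """
    base = Path(shp_path)
    return [
        base.with_suffix(ext).name
        for ext in REQUIRED_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]

missing = missing_shapefile_parts("data/countries.shp")
if missing:
    print("Missing components:", missing)
else:
    print("All required components are present.")
```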
2. Reading a GeoJSON File
GeoJSON is a popular, web-friendly format that stores both geometry and attribute data in a single .geojson file.
Example File:
data/
└── countries.geojson
Example:
import geopandas as gpd
# Reading a GeoJSON file
gdf = gpd.read_file("data/countries.geojson")
3. Writing to GeoJSON or Shapefile
GeoPandas allows exporting spatial data to both formats using the to_file() method.
1. Export to GeoJSON:
# Exporting GeoDataFrame to a GeoJSON file
gdf.to_file("output/output.geojson", driver="GeoJSON")
Output
output/
└── output.geojson
2. Export to Shapefile:
# Exporting GeoDataFrame to a Shapefile
gdf.to_file("output/output_shapefile.shp")
Output
output/
├── output_shapefile.shp
├── output_shapefile.shx
├── output_shapefile.dbf
└── output_shapefile.prj
3.1 Coordinate Reference Systems (CRS)
A Coordinate Reference System (CRS) defines how the coordinates in a dataset map to locations on the Earth's surface. GeoPandas stores the CRS on each GeoDataFrame and accepts EPSG codes (standard CRS identifiers) when setting or transforming it.
1. Checking the Current CRS
# Check current CRS of a GeoDataFrame
print(gdf.crs)
2. Reprojecting to Another CRS
# Reproject to Web Mercator (EPSG:3857)
gdf = gdf.to_crs(epsg=3857)
3. Setting CRS Manually (if missing)
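# Only when gdf.crs is None: label the existing coordinates as WGS 84.
# set_crs() does not transform coordinates; use to_crs() to reproject.
gdf = gdf.set_crs(epsg=4326)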
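The difference between setting and transforming a CRS is easy to miss, so here is a small sketch that shows both on a single point. The coordinates are those of Casa Batlló used later in this tutorial; the variable names are illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# A point with longitude/latitude coordinates but no CRS metadata attached
pts = gpd.GeoSeries([Point(2.165, 41.3917)])
print("CRS before:", pts.crs)

# set_crs only attaches the label; the coordinates stay the same
labelled = pts.set_crs(epsg=4326)
print("After set_crs:", labelled.iloc[0])

# to_crs recomputes the coordinates in the target CRS
projected = labelled.to_crs(epsg=3857)
print("After to_crs:", projected.iloc[0])
```

Calling set_crs() with the wrong code mislabels the data without raising an error, so it is only appropriate when the true CRS of the coordinates is known.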
3.2 Attributes and Geometry Together
Each geometry (point, line, polygon) in GeoPandas can be associated with attribute data (similar to columns in a DataFrame).
Example:
from shapely.geometry import Point
import geopandas as gpd
# Create a GeoDataFrame with attributes
data = {
'City': ['Delhi', 'Mumbai'],
'Population': [19000000, 20000000],
'geometry': [Point(77.2090, 28.6139), Point(72.8777, 19.0760)]
}
cities = gpd.GeoDataFrame(data, crs="EPSG:4326")
# Filter by attribute
large_cities = cities[cities['Population'] > 19500000]
print(large_cities)
3.3 Plotting GeoDataFrames
import matplotlib.pyplot as plt
# Basic plot
cities.plot()
plt.show()
# Color based on population
cities.plot(column='Population', cmap='OrRd', legend=True)
plt.show()
For more information, refer to: Working with Geospatial Data in Python
4. GeoPandas Operations
GeoPandas supports powerful geospatial operations that allow you to analyze, transform and combine spatial datasets. These operations are essential for tasks like urban planning, environmental studies and transportation analysis. To follow the examples in this section, first load the Barcelona districts dataset.
from shapely.geometry import box
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point
url = "https://raw.githubusercontent.com/jcanalesluna/bcn-geodata/master/districtes/districtes.geojson"
districts = gpd.read_file(url)
districts.plot(edgecolor="black", figsize=(8, 6))
plt.title("Barcelona Administrative Units")
plt.show()
districts.plot(column="NOM", cmap="tab20", legend=True, figsize=(10, 8), edgecolor="black")
plt.title("Barcelona Districts (by Name)")
plt.show()
Explanation:
- gpd.read_file(url) loads the Barcelona districts GeoJSON.
- .plot() quickly visualizes polygons.
- We use column="NOM" to color districts by name.
4.1 Spatial Joins
Combine two GeoDataFrames based on spatial relationships (e.g., which points fall inside which polygons).
# Casa Batlló landmark
batllo = gpd.GeoDataFrame(
{"Landmark": ["Casa Batlló"]},
geometry=[Point(2.165, 41.3917)],
crs="EPSG:4326"
)
# Spatial join (both to EPSG:3857)
result = gpd.sjoin(batllo.to_crs(3857), districts.to_crs(3857), how="left", predicate="within")
print(result[["Landmark", "ANNEXDESCR"]])
# Plot
ax = districts.plot(column="ANNEXDESCR", cmap="tab20", figsize=(10,8), legend=True)
batllo.plot(ax=ax, color="red", markersize=50)
plt.title("Barcelona Districts with Casa Batlló")
plt.show()
Output
Landmark ANNEXDESCR
0 Casa Batlló Grup - I

Explanation: gpd.sjoin() checks if Casa Batlló falls within a district polygon. The output shows the district group name (ANNEXDESCR). We plot Casa Batlló in red on the district map.
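The same join can be tried without a network connection on synthetic data. The sketch below uses two hypothetical square zones and three points, one of which lies outside both zones, to show what how="left" does with unmatched rows.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two adjacent square zones (hypothetical data)
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Three points: one inside each zone and one outside both
points = gpd.GeoDataFrame(
    {"label": ["p1", "p2", "p3"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# how="left" keeps every point; unmatched points get NaN in the zone column
joined = gpd.sjoin(points, zones, how="left", predicate="within")
print(joined[["label", "zone"]])
```

Using how="inner" instead would drop the point that falls outside every zone.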
4.2 Buffering
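# Reproject to a projected CRS so the buffer distance is in CRS units.
# EPSG:3857 units are nominally meters; ground distances are shorter away from the equator.
districts_m = districts.to_crs(3857)
# Expand every district outward by 2000 units
districts_buf = districts_m.copy()
districts_buf["geometry"] = districts_m.buffer(2000)
print(districts_buf[["NOM"]].head())
districts_buf.plot()
plt.title("Buffered Districts (2 km)")
plt.show()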
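To see what a buffer produces, it helps to apply one to a single point, where the result should be close to a circle. This is a minimal sketch; the choice of EPSG:25831 (ETRS89 / UTM zone 31N) as a local metric CRS for Barcelona is an assumption made for the example and is not used elsewhere in this tutorial.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# A single point reprojected to a CRS whose unit is the meter
# (EPSG:25831 is an assumed local CRS for Barcelona)
pt = gpd.GeoSeries([Point(2.165, 41.3917)], crs="EPSG:4326").to_crs(25831)

# Buffering a point by 500 m yields a polygon that approximates a circle
circle = pt.buffer(500)

print("Buffer area:", round(circle.area.iloc[0]))
print("Ideal circle area:", round(math.pi * 500 ** 2))
```

The buffer area comes out slightly below the ideal value because the circle is approximated by straight segments.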
4.3 Clipping
bbox = gpd.GeoDataFrame(
geometry=[box(2.1, 41.37, 2.18, 41.41)],
crs="EPSG:4326"
).to_crs(3857)
clipped = gpd.clip(districts_m, bbox)
print(" ")
print(clipped[["NOM"]].head())
clipped.plot()
plt.title("Clipped Districts (Central Barcelona)")
plt.show()
Explanation: A clip extracts only the part of polygons inside a bounding box.
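Clipping is easiest to verify on a shape whose area is known in advance. The sketch below clips a hypothetical 4 x 4 square with a mask that covers only its lower-left corner; the names squares, mask and clipped_sq are illustrative.

```python
import geopandas as gpd
from shapely.geometry import box

# One 4 x 4 square (hypothetical data, planar units)
squares = gpd.GeoDataFrame(
    {"name": ["big"]},
    geometry=[box(0, 0, 4, 4)],
    crs="EPSG:3857",
)

# Clipping mask that overlaps only the lower-left corner of the square
mask = box(-1, -1, 2, 2)

# Geometry is cut to the mask; attribute columns are carried over unchanged
clipped_sq = gpd.clip(squares, mask)
print(clipped_sq.geometry.iloc[0].bounds)
print(clipped_sq.area.iloc[0])
```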
4.4 Geometry Operations
districts_m["area_m2"] = districts_m.area
districts_m["perimeter_m"] = districts_m.length
print(districts_m[["NOM", "area_m2", "perimeter_m"]].head())
batllo = gpd.GeoSeries([Point(2.165, 41.3917)], crs="EPSG:4326").to_crs(3857)
sagrada = gpd.GeoSeries([Point(2.1744, 41.4036)], crs="EPSG:4326").to_crs(3857)
print("Casa Batlló – Sagrada Familia distance (meters):", batllo.distance(sagrada)[0])
Explanation:
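- .area and .length return polygon area and perimeter in the units of the CRS. EPSG:3857 units are nominally meters, but Web Mercator stretches distances away from the equator, so the values are inflated at Barcelona's latitude.
- .distance() measures the straight-line distance between two landmarks, with the same caveat. A local projected CRS such as EPSG:25831 (ETRS89 / UTM zone 31N) gives more accurate results.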
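Because measured values depend on the projection, it is worth comparing the same distance in two coordinate systems. The sketch below measures the Casa Batlló to Sagrada Família distance in Web Mercator and in EPSG:25831 (ETRS89 / UTM zone 31N); the second CRS and the helper name separation are assumptions introduced for this comparison.

```python
import geopandas as gpd
from shapely.geometry import Point

landmarks = gpd.GeoSeries(
    [Point(2.165, 41.3917), Point(2.1744, 41.4036)],  # Casa Batlló, Sagrada Família
    crs="EPSG:4326",
)

def separation(series, epsg):
    """Straight-line distance between the two points after reprojection."""
    projected = series.to_crs(epsg=epsg)
    return projected.iloc[0].distance(projected.iloc[1])

d_mercator = separation(landmarks, 3857)  # Web Mercator
d_utm = separation(landmarks, 25831)      # ETRS89 / UTM zone 31N (assumed local CRS)

print("EPSG:3857 :", round(d_mercator), "units")
print("EPSG:25831:", round(d_utm), "m")
print("Ratio     :", round(d_mercator / d_utm, 2))
```

The Web Mercator figure is noticeably larger, which is why a local projected CRS is the safer choice whenever areas, lengths or distances matter.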
4.5 Overlay
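# Intersect the buffered districts with the original districts
# (assumes districts_buf and districts_m from the buffering step)
ovr = gpd.overlay(districts_buf, districts_m, how="intersection")
print(ovr.head())
ovr.plot()
plt.show()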
Explanation:
- gpd.overlay() combines two layers (buffered vs. original).
- how="intersection" keeps only overlapping areas.
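The effect of the how parameter is clearer on two simple overlapping shapes. This is a minimal sketch with hypothetical 2 x 2 squares; the layer names left and right and the attribute columns a and b are illustrative.

```python
import geopandas as gpd
from shapely.geometry import box

# Two overlapping 2 x 2 squares (hypothetical data, planar units)
left = gpd.GeoDataFrame({"a": [1]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:3857")
right = gpd.GeoDataFrame({"b": [2]}, geometry=[box(1, 1, 3, 3)], crs="EPSG:3857")

# intersection keeps only the shared area and carries attributes from both layers
shared = gpd.overlay(left, right, how="intersection")
print(shared[["a", "b"]])
print("Shared area:", shared.area.iloc[0])

# union keeps all areas, split wherever the layers overlap
combined = gpd.overlay(left, right, how="union")
print("Pieces in union:", len(combined))
```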
5. Customizing Maps
You can re-size and stylize maps for better insights:
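One possible approach is to create the Matplotlib figure yourself and pass its axes to .plot(). The sketch below uses a hypothetical three-zone layer so that it runs without external data; the column names zone and value are placeholders.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical three-zone layer used only to demonstrate styling options
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B", "C"], "value": [10, 25, 40]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

fig, ax = plt.subplots(figsize=(8, 4))  # control the figure size
zones.plot(
    ax=ax,
    column="value",      # color polygons by an attribute
    cmap="OrRd",         # colormap, as in the population example
    edgecolor="black",   # outline each polygon
    linewidth=0.8,
    legend=True,         # add a color bar for the attribute
)
ax.set_title("Zones Colored by Value")
ax.set_axis_off()        # hide the coordinate axes
```

Call plt.show() to display the map in an interactive session, or fig.savefig() with a file name to write it to disk.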
6. Network Analysis with OSMnx
GeoPandas integrates well with OSMnx, allowing conversion between network graphs and spatial data.