GeoPandas Tutorial

Last Updated : 29 Aug, 2025

GeoPandas is an open-source Python library that makes working with geospatial data easy. It extends pandas to support geometric data types and operations, enabling spatial analysis and visualization directly in Python. Commonly used in GIS, data science and environmental analytics, GeoPandas supports file formats like Shapefile and GeoJSON and integrates well with tools like Matplotlib, Folium and Plotly.

Important Facts to Know:

  • Built on pandas: Adds geometry support (Points, Lines, Polygons) to DataFrames.
  • Simple Mapping: Use .plot() for quick geospatial visualizations.
  • Spatial Operations: Perform joins, overlays, projections and more.
  • Wide Format Support: Reads Shapefiles, GeoJSON, KML and other geospatial formats.

This tutorial takes you from GeoPandas basics to advanced use, with practical examples using real-world datasets.

1. How to Install GeoPandas

Installation steps vary by system because GeoPandas depends on compiled geospatial libraries. The two usual routes are pip and conda.
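
As a minimal sketch, the comments below show the standard pip and conda-forge install commands, and the helper (a hypothetical name, `is_installed`, not part of GeoPandas) checks whether GeoPandas and a few packages it commonly relies on can be found by the import system. The list of package names is an assumption chosen for illustration.

```python
# Typical install commands (run in a terminal, not inside Python):
#   pip install geopandas
#   conda install -c conda-forge geopandas
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


# Report which of the packages used in this tutorial are available
for name in ("geopandas", "shapely", "pyproj", "matplotlib"):
    print(f"{name}: {'found' if is_installed(name) else 'missing'}")
```

If any package is reported as missing, rerun the install command in the same environment that runs your Python code.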

2. Basic Operations with GeoPandas

In this section, we’ll cover the basic operations you can perform using GeoPandas, while introducing key geospatial concepts such as spatial data types, file formats and coordinate reference systems (CRS).

2.1 Reading and Writing Spatial Data

GeoPandas can read and write various spatial formats. It reads data into a GeoDataFrame using gpd.read_file().

Python
import geopandas as gpd

# Load Barcelona districts dataset from GitHub
url = "https://raw.githubusercontent.com/jcanalesluna/bcn-geodata/master/districtes/districtes.geojson"
districts = gpd.read_file(url)

print(districts.head())
print("CRS:", districts.crs)

Output

[Image: Sample Data Preview]

2.2 Types of Spatial Data

1. Vector Data (Points, Lines, Polygons)

GeoPandas stores vector geometries as shapely objects (shapely is one of its dependencies), so you can create them directly with shapely:

Python
import geopandas as gpd
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt

point = Point(77.2090, 28.6139)  # Longitude, Latitude
line = LineString([(0, 0), (1, 2), (2, 4)])

polygon = Polygon([(0, 0), (1, 1), (1, 0)])

gdf = gpd.GeoDataFrame({
    'name': ['Delhi', 'Route1', 'Area1'],
    'geometry': [point, line, polygon]
})
print(gdf)

gdf.plot()
plt.show()

Output

[Image: Geometry DataFrame]
[Image: Plotted Geometries]

Explanation:

  • Point(77.2090, 28.6139) takes coordinates in (x, y) order, that is longitude first, then latitude.
  • LineString and Polygon are built from lists of (x, y) tuples; the polygon ring is closed automatically.
  • gpd.GeoDataFrame() combines the name attribute with a geometry column, so one table can hold mixed geometry types.
  • print(gdf) shows the attributes alongside each geometry, and gdf.plot() draws all three shapes on one Matplotlib axis.
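
To inspect which geometry type each row holds, a short sketch along these lines may help. It rebuilds the same three shapes under a new variable name (`shapes`, chosen here for illustration) and uses the `geom_type` and `area` attributes.

```python
import geopandas as gpd
from shapely.geometry import Point, LineString, Polygon

shapes = gpd.GeoDataFrame(
    {"name": ["Delhi", "Route1", "Area1"]},
    geometry=[
        Point(77.2090, 28.6139),
        LineString([(0, 0), (1, 2), (2, 4)]),
        Polygon([(0, 0), (1, 1), (1, 0)]),
    ],
)

# geom_type reports the geometry class of each row
print(shapes.geom_type.tolist())

# Select only the polygon rows and compute their area (in squared coordinate units)
polygons = shapes[shapes.geom_type == "Polygon"]
print(polygons.area.tolist())
```

Filtering on `geom_type` is a convenient way to split a mixed layer before running operations that only make sense for one geometry type.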

2. Raster Data (Grid-Based)

GeoPandas doesn’t handle raster directly, but you can use rasterio for raster operations:

Python
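import rasterio

# Replace this path with the location of your own raster file
with rasterio.open(r"C:\Users\visha\OneDrive\Desktop\Python\RGB.byte.tif") as src:
    # profile holds metadata such as driver, dtype, size, band count, CRS and transform
    print("Raster Profile:")
    print(src.profile)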

Output

[Image: Raster Info]

2.3 Common Spatial Data Formats

GeoPandas supports several vector data formats. The most common ones are Shapefile and GeoJSON. These formats are used for storing spatial data and are widely supported in both GIS tools and programming workflows.

1. Reading a Shapefile

A Shapefile is not just a single file but a collection of related files:

Files in a Shapefile:

  • countries.shp : Contains geometric data (points, lines, polygons)
  • countries.shx : Index for geometries
  • countries.dbf : Attribute data in tabular format
  • countries.prj (optional) : Coordinate system (CRS); without it, GeoPandas loads the data with no CRS set

Example Folder Structure:

data/
├── countries.shp
├── countries.shx
├── countries.dbf
└── countries.prj

Example:

Python
import geopandas as gpd

# Reading a shapefile (requires .shp, .shx, .dbf to be in same folder)
gdf = gpd.read_file("data/countries.shp")
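
Because a missing companion file is a common cause of read errors, a small pre-flight check can be useful. The sketch below uses only the standard library; `missing_shapefile_parts` is a hypothetical helper written for this example, not a GeoPandas function.

```python
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required companion files that are absent for the given .shp path."""
    base = Path(shp_path)
    return [
        base.with_suffix(ext).name
        for ext in REQUIRED_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]


missing = missing_shapefile_parts("data/countries.shp")
if missing:
    print("Missing files:", missing)
else:
    print("All required shapefile parts are present.")
```

Running the check before gpd.read_file() gives a clearer message than the lower-level error raised when a part is missing.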

2. Reading a GeoJSON File

GeoJSON is a popular, web-friendly format that stores both geometry and attribute data in a single .geojson file.

Example File:

data/
└── countries.geojson

Example:

Python
import geopandas as gpd

# Reading a GeoJSON file
gdf = gpd.read_file("data/countries.geojson")

3. Writing to GeoJSON or Shapefile

GeoPandas allows exporting spatial data to both formats using the to_file() method.

1. Export to GeoJSON:

Python
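import os
# Exporting GeoDataFrame to a GeoJSON file (to_file does not create the folder, so create it first)
os.makedirs("output", exist_ok=True)
gdf.to_file("output/output.geojson", driver="GeoJSON")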

Output

output/
└── output.geojson

2. Export to Shapefile:

Python
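# Exporting GeoDataFrame to a Shapefile (the output/ folder must already exist)
# The driver is inferred from the .shp extension; the companion files are written alongside
gdf.to_file("output/output_shapefile.shp")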

Output

output/
├── output_shapefile.shp
├── output_shapefile.shx
├── output_shapefile.dbf
└── output_shapefile.prj

3.1 Coordinate Reference Systems (CRS)

A Coordinate Reference System (CRS) defines how the coordinates in a dataset map to locations on the Earth's surface. GeoPandas stores the CRS on each GeoDataFrame and accepts EPSG codes (standard CRS identifiers) when setting or transforming it.

1. Checking the Current CRS

Python
# Check current CRS of a GeoDataFrame
print(gdf.crs)

2. Reprojecting to Another CRS

Python
# Reproject to Web Mercator (EPSG:3857)
gdf = gdf.to_crs(epsg=3857)

3. Setting CRS Manually (if missing)

Python
# If CRS is undefined or missing, declare it
# (set_crs only labels the existing coordinates; it does not transform them)
gdf.set_crs(epsg=4326, inplace=True)
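
The difference between declaring a CRS and reprojecting can be seen in a small self-contained sketch. It assumes a single point for Delhi created without a CRS; the variable names are illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# A GeoDataFrame built without a CRS: the coordinates are just numbers
delhi = gpd.GeoDataFrame({"City": ["Delhi"]}, geometry=[Point(77.2090, 28.6139)])
print(delhi.crs)  # None

# set_crs declares what the existing coordinates mean; the values stay the same
delhi = delhi.set_crs(epsg=4326)
print(delhi.geometry.iloc[0])

# to_crs transforms the coordinates into the target CRS (Web Mercator, in meters)
delhi_mercator = delhi.to_crs(epsg=3857)
print(delhi_mercator.crs.to_epsg())
print(delhi_mercator.geometry.iloc[0])
```

Use set_crs only when the data truly is in that CRS but the label is missing; use to_crs whenever you want the coordinates themselves converted.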

3.2 Attributes and Geometry Together

Each geometry (point, line, polygon) in GeoPandas can be associated with attribute data (similar to columns in a DataFrame).

Example:

Python
from shapely.geometry import Point
import geopandas as gpd

# Create a GeoDataFrame with attributes
data = {
    'City': ['Delhi', 'Mumbai'],
    'Population': [19000000, 20000000],
    'geometry': [Point(77.2090, 28.6139), Point(72.8777, 19.0760)]
}
cities = gpd.GeoDataFrame(data, crs="EPSG:4326")

# Filter by attribute
large_cities = cities[cities['Population'] > 19500000]

print(large_cities)

Output

[Image: Filtered Cities]

3.3 Plotting GeoDataFrames

Python
import matplotlib.pyplot as plt

# Basic plot
cities.plot()
plt.show()

# Color based on population
cities.plot(column='Population', cmap='OrRd', legend=True)
plt.show()

Output

[Image: City Maps]

For more information, refer to: Working with Geospatial Data in Python

4. GeoPandas Operations

GeoPandas supports powerful geospatial operations that allow you to analyze, transform and combine spatial datasets. These operations are essential for tasks like urban planning, environmental studies and transportation analysis. To follow the examples in this section, first load the Barcelona districts dataset (a GeoJSON file).

Python
from shapely.geometry import box
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

url = "https://raw.githubusercontent.com/jcanalesluna/bcn-geodata/master/districtes/districtes.geojson"
districts = gpd.read_file(url)

districts.plot(edgecolor="black", figsize=(8, 6))
plt.title("Barcelona Administrative Units")
plt.show()

districts.plot(column="NOM", cmap="tab20", legend=True, figsize=(10, 8), edgecolor="black")
plt.title("Barcelona Districts (by Name)")
plt.show()

Output

[Image: Barcelona Administrative Units]
[Image: Barcelona Districts]

Explanation:

  • gpd.read_file(url) loads the Barcelona districts GeoJSON.
  • .plot() quickly visualizes polygons.
  • We use column="NOM" to color districts by name.

4.1 Spatial Joins

Combine two GeoDataFrames based on spatial relationships (e.g., which points fall inside which polygons).

Python
# Casa Batlló landmark
batllo = gpd.GeoDataFrame(
    {"Landmark": ["Casa Batlló"]},
    geometry=[Point(2.165, 41.3917)],
    crs="EPSG:4326"
)

# Spatial join (both to EPSG:3857)
result = gpd.sjoin(batllo.to_crs(3857), districts.to_crs(3857), how="left", predicate="within")
print(result[["Landmark", "ANNEXDESCR"]])

# Plot
ax = districts.plot(column="ANNEXDESCR", cmap="tab20", figsize=(10,8), legend=True)
batllo.plot(ax=ax, color="red", markersize=50)
plt.title("Barcelona Districts with Casa Batlló")
plt.show()

Output

      Landmark ANNEXDESCR
0  Casa Batlló   Grup - I

[Image: Spatial Join Result]

Explanation: gpd.sjoin() checks if Casa Batlló falls within a district polygon. The output shows the district group name (ANNEXDESCR). We plot Casa Batlló in red on the district map.
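
The same join logic can be tried offline on synthetic data. This sketch assumes two made-up square zones and three points (none of them from the Barcelona dataset) and shows how a left join keeps unmatched points with missing values.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two adjacent square zones and three points; point "c" lies outside both zones
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
points = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# how="left" keeps every point; predicate="within" matches points inside a zone
joined = gpd.sjoin(points, zones, how="left", predicate="within")
print(joined[["name", "zone"]])
```

Switching to how="inner" would drop point "c" instead of keeping it with an empty zone value.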

4.2 Buffering

Python
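# Reproject to a CRS with meter units so the buffer distance is expressed in meters
districts_m = districts.to_crs(3857)

# Expand each district outward by 2000 CRS units (nominally 2 km in Web Mercator)
districts_buf = districts_m.copy()
districts_buf["geometry"] = districts_m.buffer(2000)
print(districts_buf[["NOM"]].head())
districts_buf.plot()
plt.title("Buffered Districts (2 km)")
plt.show()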

Output

[Image: Buffered data]
[Image: Buffered Map]
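
To see what a buffer produces, it can help to buffer a single point and compare the result with an ideal circle. The sketch below assumes the Casa Batlló coordinates used earlier and a 1000-unit radius picked for illustration.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# One point in Barcelona, projected to EPSG:3857 so coordinates are in meters
landmark = gpd.GeoSeries([Point(2.165, 41.3917)], crs="EPSG:4326").to_crs(3857)

# Buffer by 1000 CRS units; the result is a polygon that approximates a circle
zone = landmark.buffer(1000)

print(zone.geom_type.iloc[0])
print("Buffer area:", zone.area.iloc[0])
print("Ideal circle area:", math.pi * 1000**2)
```

The buffer area is slightly below the ideal value because the circle is approximated by straight segments.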

4.3 Clipping

Python
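# Bounding box around central Barcelona, reprojected to match districts_m (EPSG:3857)
bbox = gpd.GeoDataFrame(
    geometry=[box(2.1, 41.37, 2.18, 41.41)],
    crs="EPSG:4326"
).to_crs(3857)

# Keep only the parts of each district that fall inside the box
clipped = gpd.clip(districts_m, bbox)
print(clipped[["NOM"]].head())
clipped.plot()
plt.title("Clipped Districts (Central Barcelona)")
plt.show()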

Output

[Image: Clipped data]
[Image: Clipped Map]

Explanation: A clip extracts only the part of polygons inside a bounding box.
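
The effect of clipping is easy to verify on simple shapes. This is a small sketch with three made-up unit squares and a square mask; one square lies fully inside the mask, one straddles its edge and one lies outside.

```python
import geopandas as gpd
from shapely.geometry import box

squares = gpd.GeoDataFrame(
    {"name": ["inside", "straddling", "outside"]},
    geometry=[box(0, 0, 1, 1), box(1.5, 0, 2.5, 1), box(5, 5, 6, 6)],
    crs="EPSG:3857",
)
mask = gpd.GeoDataFrame(geometry=[box(0, 0, 2, 2)], crs="EPSG:3857")

# Rows outside the mask are dropped; rows crossing its edge are cut at the boundary
clipped = gpd.clip(squares, mask)
print(clipped[["name"]])
print(clipped.area.round(3).tolist())
```

Attributes of the clipped rows are preserved; only the geometries change.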

4.4 Geometry Operations

Python
districts_m["area_m2"] = districts_m.area
districts_m["perimeter_m"] = districts_m.length
print(districts_m[["NOM", "area_m2", "perimeter_m"]].head())

batllo = gpd.GeoSeries([Point(2.165, 41.3917)], crs="EPSG:4326").to_crs(3857)
sagrada = gpd.GeoSeries([Point(2.1744, 41.4036)], crs="EPSG:4326").to_crs(3857)
print("Casa Batlló – Sagrada Familia distance (meters):", batllo.distance(sagrada)[0])

Output

[Image: Area & Distance]

Explanation:

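  • .area and .length compute polygon area and perimeter in the units of the CRS (square meters and meters for EPSG:3857).
  • .distance() measures the straight-line (planar) distance between two landmarks.
  • Web Mercator (EPSG:3857) stretches lengths and areas away from the equator, so at Barcelona's latitude these figures are overestimates; for accurate measurements, reproject to a local projected CRS such as EPSG:25831 (ETRS89 / UTM zone 31N).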
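
How much the choice of CRS affects a measurement can be checked directly. The sketch below reuses the two landmark coordinates from the example above and compares Web Mercator with EPSG:25831 (ETRS89 / UTM zone 31N), which is introduced here as an assumption about a suitable local CRS; `planar_distance` is a hypothetical helper.

```python
import geopandas as gpd
from shapely.geometry import Point

landmarks = gpd.GeoSeries(
    [Point(2.165, 41.3917), Point(2.1744, 41.4036)],  # Casa Batlló, Sagrada Familia
    crs="EPSG:4326",
)


def planar_distance(points, epsg):
    """Project both points to the given CRS and return their planar distance."""
    projected = points.to_crs(epsg)
    return projected.iloc[0].distance(projected.iloc[1])


d_mercator = planar_distance(landmarks, 3857)  # Web Mercator
d_utm = planar_distance(landmarks, 25831)      # ETRS89 / UTM zone 31N

print(f"EPSG:3857:  {d_mercator:.0f} m")
print(f"EPSG:25831: {d_utm:.0f} m")
print(f"Ratio: {d_mercator / d_utm:.2f}")
```

The ratio is noticeably above 1 at this latitude, which is why a local projected CRS is the safer choice for distances, buffers and areas.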

4.5 Overlay

Python
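# Assumption: ctr and ctr_buf are not defined earlier; here ctr is the first district and ctr_buf its 2 km buffer
ctr = districts.to_crs(3857).iloc[[0]]
ctr_buf = ctr.assign(geometry=ctr.buffer(2000))
ovr = gpd.overlay(ctr_buf, ctr, how="intersection")
print(ovr.head())
ovr.plot()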

Output

[Image: Overlay data]
[Image: Overlay Map]

Explanation:

  • gpd.overlay() combines two layers (buffered vs. original).
  • how="intersection" keeps only overlapping areas.
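
The different overlay modes are easiest to compare on two overlapping squares. This sketch uses made-up 2 x 2 boxes offset by one unit, so the expected areas can be worked out by hand (overlap 1, combined 7, left-only 3).

```python
import geopandas as gpd
from shapely.geometry import box

left = gpd.GeoDataFrame({"left_id": [1]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:3857")
right = gpd.GeoDataFrame({"right_id": [1]}, geometry=[box(1, 1, 3, 3)], crs="EPSG:3857")

# Total area of the result for each overlay mode
areas = {}
for how in ("intersection", "union", "difference"):
    result = gpd.overlay(left, right, how=how)
    areas[how] = result.area.sum()
    print(how, round(areas[how], 3))
```

Unlike a clip, an overlay also carries attribute columns from both layers into the result.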

5. Customizing Maps

You can re-size and stylize maps for better insights:
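
As one possible sketch of such styling, the example below sets the figure size, colors polygons by an attribute, adds outlines and a title, and hides the axes. The three blocks and their values are invented for illustration, and the Agg backend is selected only so the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

import geopandas as gpd
from shapely.geometry import box

blocks = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "value": [10, 25, 40]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

fig, ax = plt.subplots(figsize=(8, 4))  # control the figure size
blocks.plot(
    ax=ax,
    column="value",      # color polygons by an attribute
    cmap="OrRd",         # color map
    edgecolor="black",   # polygon outlines
    linewidth=0.8,
    legend=True,
)
ax.set_title("Blocks colored by value")
ax.set_axis_off()  # hide axis ticks and frame
```

Calling fig.savefig() with a file name afterwards writes the styled map to disk.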

6. Network Analysis with OSMnx

GeoPandas integrates well with OSMnx, allowing conversion between network graphs and spatial data.

