I’m building a data ingestion pipeline in Python that collects data from a third-party REST API. The API allows a maximum of 100 requests per minute, and I need to fetch data for tens of thousands of items.
Here’s a simplified version of my current approach using asyncio and aiohttp:
```python
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.json()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

urls = [f"https://api.example.com/items/{i}" for i in range(10000)]
data = asyncio.run(main(urls))
```
This works for small sets of URLs but fails at scale — I quickly exceed the rate limit and start getting HTTP 429 errors.
I’ve tried introducing semaphores and sleep intervals:
```python
semaphore = asyncio.Semaphore(10)

async def fetch_limited(session, url):
    async with semaphore:
        async with session.get(url) as resp:
            if resp.status == 429:
                await asyncio.sleep(60)
                return await fetch_limited(session, url)
            return await resp.json()
```
However:

- It's inefficient: the sleep happens while the semaphore is still held, so it stalls other tasks, not just the rate-limited one.
- I still occasionally hit bursts of 429s, likely due to concurrency scheduling.
- Retries are recursive and inconsistent, and can starve certain tasks.
Question: What’s the most efficient and Pythonic way to:

- Parallelize a large number of API calls asynchronously
- Respect strict rate limits (e.g., 100 requests per minute)
- Handle retries and exponential backoff cleanly
- Avoid blocking the event loop when rate-limited
Would using libraries like aiolimiter, tenacity, or an asyncio.Queue architecture be better suited?
I’m looking for a robust design pattern or example that scales gracefully without hitting rate limits.
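To make the question concrete, here is the kind of queue-plus-token-bucket design I have in mind, sketched with a hand-rolled limiter so it is self-contained. The `RateLimiter`, `worker`, and `run_all` names are illustrative; in practice the limiter could be replaced by `aiolimiter.AsyncLimiter(100, 60)` and `fetch` would wrap `session.get`.

```python
import asyncio
import time

class RateLimiter:
    """Token bucket: at most max_rate acquisitions per `period` seconds."""
    def __init__(self, max_rate: float, period: float):
        self.max_rate, self.period = max_rate, period
        self._tokens = max_rate
        self._updated = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            while True:
                now = time.monotonic()
                # Refill tokens in proportion to elapsed time.
                self._tokens = min(
                    self.max_rate,
                    self._tokens + (now - self._updated) * self.max_rate / self.period,
                )
                self._updated = now
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                # Await instead of blocking: the event loop stays free.
                await asyncio.sleep((1 - self._tokens) * self.period / self.max_rate)

async def worker(queue: asyncio.Queue, limiter: RateLimiter, results: list, fetch):
    while True:
        url = await queue.get()
        try:
            await limiter.acquire()  # wait for a token before each request
            results.append(await fetch(url))
        finally:
            queue.task_done()

async def run_all(urls, fetch, max_rate=100, period=60.0, concurrency=10):
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    limiter = RateLimiter(max_rate, period)
    results: list = []
    workers = [asyncio.create_task(worker(queue, limiter, results, fetch))
               for _ in range(concurrency)]
    await queue.join()  # all URLs processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

The point of the pattern is that concurrency (worker count) and throughput (token rate) are controlled independently, and a rate-limited task waits on the limiter rather than sleeping while holding a shared semaphore. Error handling per `fetch` call is omitted for brevity.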
Respect the `Retry-After` header if one is present. You don't want to waste resources making requests that you know will fail, whether or not such requests will impact the quota. You can also use `requests` or `urllib3` with a `Retry` strategy to make sure that you "back off" in case of any HTTP 429 responses.
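For a synchronous pipeline, that `Retry` strategy can be mounted on a `requests` session through an `HTTPAdapter`; the parameter values below are illustrative, not prescribed:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=5,                                      # give up after 5 attempts
    backoff_factor=1,                             # exponential delay between retries
    status_forcelist=(429, 500, 502, 503, 504),   # statuses worth retrying
    respect_retry_after_header=True,              # honor the server's Retry-After
    allowed_methods=frozenset({"GET"}),           # only retry idempotent requests
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))
# session.get("https://api.example.com/items/1") now retries transparently.
```

Note that `allowed_methods` requires urllib3 >= 1.26 (older versions call it `method_whitelist`). This won't give you asyncio-level concurrency, but it handles 429 backoff without any hand-written retry loop.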