Tired of 403s and blank pages when scraping JavaScript-heavy websites?
You're not alone — and that's exactly why I built ScrapeSome.
🚀 What Is ScrapeSome?
ScrapeSome is a developer-friendly Python library that makes scraping modern websites simple — even the ones loaded with dynamic JavaScript or tough anti-bot protections.
It’s fast, lightweight, and requires zero boilerplate.
🔧 Why I Built It
I kept hitting walls on scraping projects:
- Pages rendered everything with JavaScript
- APIs were locked down or undocumented
- requests/Scrapy failed or returned 403 errors
- Setting up full browser automation felt too heavy for small jobs
So I built ScrapeSome to fill the gap between requests and full-on headless scraping frameworks.
💡 Why Use ScrapeSome?
- Handles both static and JS-heavy pages out of the box
- Supports both sync and async scraping
- Converts raw HTML into clean text, JSON, or Markdown
- Works with minimal configuration (pip install scrapesome)
- Handles timeouts, retries, redirects, user agents
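To give a feel for what "raw HTML into clean text or JSON" involves, here is a minimal standard-library sketch. It is not ScrapeSome's implementation; the class and function names (TextExtractor, html_to_text, html_to_json) and the JSON shape are assumptions made up for illustration.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0      # >0 while inside a skipped tag
        self._in_title = False
        self.title = ""
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title = text
        else:
            self.chunks.append(text)


def _parse(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser


def html_to_text(html):
    """Return the visible text, one chunk per line."""
    return "\n".join(_parse(html).chunks)


def html_to_json(html):
    """Return a JSON string with a hypothetical {title, text} shape."""
    parser = _parse(html)
    return json.dumps({"title": parser.title, "text": "\n".join(parser.chunks)})


sample = (
    "<html><head><title>Demo</title><style>p{color:red}</style></head>"
    "<body><h1>Hello</h1><script>var x = 1;</script><p>World</p></body></html>"
)
print(html_to_text(sample))
print(html_to_json(sample))
```

A real converter also has to deal with whitespace, links, and nested structure, which is the boilerplate a library is meant to save you from writing.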
🚀 Features
- 🔁 Sync + Async scraping support
- 🔄 Automatic retries and intelligent fallbacks
- 🧪 Playwright rendering fallback for JS-heavy pages
- 📝 Format responses as raw HTML, plain text, Markdown, or structured JSON
- ⚙️ Configurable: timeouts, redirects, user agents, and logging
- 🧪 Test coverage with pytest and pytest-asyncio
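The retry-then-fallback idea from the list above can be sketched in a few lines. This is a simplified illustration of the general pattern, not ScrapeSome's internals: fetch_static and fetch_rendered are hypothetical callables you would supply (for example an HTTP client and a Playwright-based renderer), and the "blocked or empty" heuristic is an assumption.

```python
import time


def looks_blocked_or_empty(status, body):
    """Heuristic (an assumption): 403/429 or a near-empty body counts as a miss."""
    return status in (403, 429) or len(body.strip()) < 50


def scrape_with_fallback(url, fetch_static, fetch_rendered, retries=2, delay=0.0):
    """Try the cheap static fetch first; fall back to rendering if it fails.

    fetch_static(url) -> (status_code, html) and fetch_rendered(url) -> html
    are hypothetical, caller-supplied callables.
    """
    for attempt in range(retries + 1):
        try:
            status, body = fetch_static(url)
        except OSError:
            # Network-level failure: back off exponentially, then retry
            time.sleep(delay * (2 ** attempt))
            continue
        if not looks_blocked_or_empty(status, body):
            return body, "static"
        break  # Blocked or empty: repeating the same request rarely helps
    return fetch_rendered(url), "rendered"


# Offline stand-ins so the sketch runs without a network or a browser
def fake_static(url):
    return 403, ""


def fake_rendered(url):
    return "<html><body>rendered content</body></html>"


html, source = scrape_with_fallback("https://example.com", fake_static, fake_rendered)
print(source)  # prints: rendered
```

Keeping the two fetchers injectable makes the fallback logic easy to unit test with pytest, since no real browser is needed.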
⚖ Comparison with Alternatives
Feature | ScrapeSome ✅ | Scrapy | Selenium/UC | Playwright (Raw) |
---|---|---|---|---|
✅ Sync + Async Scraping | ✅ Built-in | ❌ Async only* | ❌ Manual | ❌ Manual |
🧠 JS Rendering (Fallback) | ✅ Seamless | ❌ Plugin setup | ✅ Full | ✅ Full |
📝 Output as JSON/Markdown/HTML | ✅ Built-in | ❌ Requires custom | ❌ Manual parsing | ❌ Manual parsing |
🔁 Retry & Timeout Handling | ✅ Built-in | ⚠️ Requires config | ❌ Manual | ❌ Manual |
⚡ Minimal Setup (Boilerplate) | ✅ Near zero | ❌ Needs project | ❌ Driver setup | ❌ Browser install |
🧪 Testable out-of-the-box | ✅ Pytest-ready | ⚠️ Complex | ❌ | ❌ |
🛠️ Config via .env or inline | ✅ Simple | ⚠️ Complex | ❌ | ❌ |
📦 Install & Run in <1 Min | ✅ Yes | ❌ | ❌ | ❌ |
📦 Installation
pip install scrapesome
Playwright Setup
ScrapeSome uses Playwright for JavaScript rendering fallback. To enable this, you need to install Playwright and its dependencies.
1. Install the Playwright Python package if it is not already installed
pip install playwright
2. Install Playwright browsers
playwright install
3. Install system dependencies
Playwright requires some system libraries to run browsers, which vary by operating system.
For Windows
Playwright installs everything you need automatically with playwright install, so no additional setup is usually required.
For Linux (Ubuntu/Debian)
Run the following command to install required system libraries:
playwright install-deps
If you don't have the playwright CLI available, you can install the dependencies manually:
sudo apt-get update
sudo apt-get install -y libwoff1 libopus0 libwebp6 libharfbuzz-icu0 libwebpmux3 \
libenchant-2-2 libhyphen0 libegl1 libglx0 libgudev-1.0-0 \
libevdev2 libgles2 libx264-160
Note: Package names may vary depending on your distribution and version.
For macOS
You can install required libraries using Homebrew:
brew install harfbuzz enchant
After this setup, you should be able to use ScrapeSome with full Playwright rendering support!
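If you want a quick sanity check from Python, the sketch below reports whether the playwright package is importable and whether the playwright command is on your PATH. The function name playwright_status is made up for this example, and the check does not confirm that the browsers themselves were downloaded.

```python
import importlib.util
import shutil


def playwright_status():
    """Report whether the Playwright package and CLI can be found."""
    return {
        # True if `import playwright` would succeed in this environment
        "python_package": importlib.util.find_spec("playwright") is not None,
        # True if a `playwright` executable is on PATH
        "cli_on_path": shutil.which("playwright") is not None,
    }


print(playwright_status())
```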
⚡ Quick Start
Synchronous Example
from scrapesome import sync_scraper

# Fetch the page synchronously and print the result
html = sync_scraper("https://example.com")
print(html)
Asynchronous Example
import asyncio
from scrapesome import async_scraper

# Run the async scraper to completion and print the result
html = asyncio.run(async_scraper("https://example.com"))
print(html)
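Async scraping pays off when you have several URLs to fetch at once. Here is a hedged sketch of that pattern using asyncio.gather with a concurrency cap. The helper scrape_many and the stub_scraper stand-in are hypothetical; in real use you would pass scrapesome's async_scraper as the scraper argument, assuming it accepts a single URL as in the example above.

```python
import asyncio


async def scrape_many(urls, scraper, max_concurrency=5):
    """Scrape several URLs concurrently, capping in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape_one(url):
        async with semaphore:
            try:
                return url, await scraper(url)
            except Exception as exc:
                # Store the error so one bad URL does not sink the batch
                return url, exc

    pairs = await asyncio.gather(*(scrape_one(u) for u in urls))
    return dict(pairs)


async def stub_scraper(url):
    # Stand-in for an async scraper so the sketch runs offline
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


urls = ["https://example.com/a", "https://example.com/b"]
results = asyncio.run(scrape_many(urls, stub_scraper))
for url, page in results.items():
    print(url, len(page))
```

The semaphore keeps you from hammering a site with every request at once, which also tends to reduce 403s.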
🖥️ CLI Usage
ScrapeSome also includes a powerful CLI for quick and easy scraping from the command line.
📦 Installation with CLI Support
To use the CLI, install with the optional cli extras:
pip install scrapesome[cli]
🔧 Basic Usage
scrapesome scrape --url https://example.com
This performs a synchronous scrape and outputs plain text by default.
⚙️ Available Options
Option | Description | Default |
---|---|---|
--async-mode | Use asynchronous scraping | False |
--force-playwright | Force JavaScript rendering using Playwright | False |
--output-format | Choose text, json, markdown, or html | html |
Examples
Basic scrape
scrapesome scrape --url https://example.com
Force Playwright rendering
scrapesome scrape --url https://example.com --force-playwright
Get JSON output
scrapesome scrape --url https://example.com --output-format json
Async scrape with markdown output
scrapesome scrape --url https://example.com --async-mode --output-format markdown
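If you would rather drive the CLI from a script, a thin wrapper around subprocess is enough. The sketch below only uses the flags listed in the options table; build_scrape_command and run_scrape are hypothetical helper names, and run_scrape assumes the scrapesome command is installed and on your PATH.

```python
import shlex
import subprocess

VALID_FORMATS = {"text", "json", "markdown", "html"}


def build_scrape_command(url, output_format=None, async_mode=False, force_playwright=False):
    """Build the argv list for `scrapesome scrape` from keyword options."""
    if output_format is not None and output_format not in VALID_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    cmd = ["scrapesome", "scrape", "--url", url]
    if async_mode:
        cmd.append("--async-mode")
    if force_playwright:
        cmd.append("--force-playwright")
    if output_format is not None:
        cmd += ["--output-format", output_format]
    return cmd


def run_scrape(url, **options):
    """Run the CLI and return its stdout (needs scrapesome[cli] installed)."""
    result = subprocess.run(
        build_scrape_command(url, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Show the command without running it
command = build_scrape_command(
    "https://example.com", output_format="markdown", async_mode=True
)
print(shlex.join(command))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with URLs that contain characters like & or ?.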
🧪 Try it out on PyPI:
👉 https://pypi.org/project/scrapesome/
🔗 Links
- 🔧 GitHub: github.com/scrapesome/scrapesome
- 📚 Docs: scrapesome.onrender.com
- 📄 Full blog post: Medium
🙌 Feedback Welcome
This is an early release, and I’d love to hear your thoughts.
Try it, break it, file issues, suggest features — or just ⭐ the repo if you like the idea!
Happy scraping! 🕷️
— Vishnu