Skip to content

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

License

Notifications You must be signed in to change notification settings

ArchiveBox/abx-dl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

⬇️ abx-dl

A simple all-in-one CLI tool to auto-detect and download everything available from a URL.

pip install abx-dl
abx-dl 'https://example.com/page/to/download'

✨ Ever wish you could yt-dlp, gallery-dl, wget, curl, puppeteer, etc. all in one command?

abx-dl is an all-in-one CLI tool for downloading URLs "by any means necessary".

It's useful for scraping, downloading, OSINT, digital preservation, and more. abx-dl provides a simpler one-shot CLI interface to the ArchiveBox plugin ecosystem.



🍜 What does it save?

abx-dl --plugins=title,favicon,headers,wget,singlefile,screenshot,pdf,dom,readability,git,... 'https://example.com'

abx-dl runs all plugins by default, or you can specify --plugins=... for specific methods:

  • HTML, JS, CSS, images, etc. rendered with a headless browser
  • title, favicon, headers, outlinks, and other metadata
  • audio, video, subtitles, playlists, comments
  • snapshot of the page as a PDF, screenshot, and Singlefile HTML
  • article text, git source code
  • and much more...

🧩 How does it work?

abx-dl uses the ABX Plugin Library (shared with ArchiveBox) to run a collection of downloading and scraping tools.

Plugins are discovered from the plugins/ directory and execute hooks in order:

  1. Crawl hooks run first (setup/install dependencies like Chrome)
  2. Snapshot hooks run per-URL to extract content

Each plugin can output:

  • Files to its output directory
  • JSONL records for status reporting
  • Config updates that propagate to subsequent plugins

βš™οΈ Configuration

Configuration is handled via environment variables or persistent config file (~/.config/abx/config.env):

abx-dl config                        # show all config (global + per-plugin)
abx-dl config --get WGET_TIMEOUT     # get a specific value
abx-dl config --set TIMEOUT=120      # set persistently (resolves aliases)

Output is grouped by section:

# GLOBAL
TIMEOUT=60
USER_AGENT="Mozilla/5.0 ..."
...

# plugins/wget
WGET_BINARY="wget"
WGET_TIMEOUT=60
...

# plugins/chrome
CHROME_BINARY="chromium"
...

Common options:

  • TIMEOUT=60 - default timeout for hooks
  • USER_AGENT - default user agent string
  • {PLUGIN}_BINARY - path to plugin's binary (e.g. WGET_BINARY, CHROME_BINARY)
  • {PLUGIN}_ENABLED=true/false - enable/disable specific plugins
  • {PLUGIN}_TIMEOUT=120 - per-plugin timeout overrides

Aliases are automatically resolved (e.g. --set USE_WGET=false saves as WGET_ENABLED=false).




πŸ“¦ Install

pip install abx-dl
abx-dl plugins --install   # optional: pre-install plugin dependencies

πŸ”  Usage

# Basic usage - download URL with all plugins:
abx-dl 'https://example.com'

# Download with specific plugins only:
abx-dl --plugins=wget,ytdlp,git,screenshot 'https://example.com'

# Skip auto-installing missing dependencies (emit warnings instead):
abx-dl --no-install 'https://example.com'

# Specify output directory:
abx-dl --output=./downloads 'https://example.com'

# Set timeout:
abx-dl --timeout=120 'https://example.com'

Commands

abx-dl <url>                              # Download URL (default command)
abx-dl plugins                            # Check + show info for all plugins
abx-dl plugins wget ytdlp git             # Check + show info for specific plugins
abx-dl plugins --install                  # Install all plugin dependencies
abx-dl plugins --install wget ytdlp git   # Install specific plugin dependencies
abx-dl config                             # Show all config values
abx-dl config --get TIMEOUT               # Get a specific config value
abx-dl config --set TIMEOUT=120           # Set a config value persistently

Installing Dependencies

Many plugins require external binaries (e.g., wget, chrome, yt-dlp, single-file).

By default, abx-dl lazily auto-installs missing dependencies as needed when you download a URL. Use --no-install to skip plugins with missing dependencies instead:

# Auto-installs missing deps on-the-fly (default behavior)
abx-dl 'https://example.com'

# Skip plugins with missing deps, emit warnings instead
abx-dl --no-install 'https://example.com'

# Pre-install all plugin dependencies
abx-dl plugins --install

# Install dependencies for specific plugins only
abx-dl plugins --install wget singlefile ytdlp

# Check which dependencies are available/missing
abx-dl plugins

Dependencies are installed to ~/.config/abx/lib/{arch}/ using the appropriate package manager:

  • pip packages β†’ ~/.config/abx/lib/{arch}/pip/venv/
  • npm packages β†’ ~/.config/abx/lib/{arch}/npm/
  • brew/apt packages β†’ system locations

You can override the install location with LIB_DIR=/path/to/lib abx-dl install.




Output Structure

./
β”œβ”€β”€ index.jsonl             # Snapshot metadata and results (JSONL format)
β”œβ”€β”€ title/
β”‚   └── title.txt
β”œβ”€β”€ favicon/
β”‚   └── favicon.ico
β”œβ”€β”€ screenshot/
β”‚   └── screenshot.png
β”œβ”€β”€ pdf/
β”‚   └── output.pdf
β”œβ”€β”€ dom/
β”‚   └── output.html
β”œβ”€β”€ wget/
β”‚   └── example.com/
β”‚       └── index.html
β”œβ”€β”€ singlefile/
β”‚   └── output.html
└── ...

All Outputs

  • index.jsonl - snapshot metadata and plugin results (JSONL format, ArchiveBox-compatible)
  • title/title.txt - page title
  • favicon/favicon.ico - site favicon
  • screenshot/screenshot.png - full page screenshot (Chrome)
  • pdf/output.pdf - page as PDF (Chrome)
  • dom/output.html - rendered DOM (Chrome)
  • wget/example.com/... - mirrored site files
  • singlefile/output.html - single-file HTML snapshot
  • ... and more via plugin library ...

Architecture

abx-dl is built on these components:

  • abx_dl/plugins.py - Plugin discovery from plugins/ directory
  • abx_dl/executor.py - Hook execution engine with config propagation
  • abx_dl/config.py - Environment variable configuration
  • abx_dl/cli.py - Rich CLI with live progress display

Plugins are symlinked from ArchiveBox's plugin directory.


For more advanced use with collections, parallel downloading, a Web UI + REST API, etc. See: ArchiveBox/ArchiveBox

About

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project

  •  
  •  

Packages

No packages published

Languages