vfrank66/lucas-download

Brazilian Chamber of Deputies PDF Scraper

A Python program to automatically download PDF documents from the Brazilian Chamber of Deputies (Câmara dos Deputados) website.

Setup

1. Create Virtual Environment

python -m venv venv

2. Activate Virtual Environment

On macOS/Linux:

source venv/bin/activate

On Windows:

venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

Usage

Basic Usage

python main.py

Custom Options

# Download 5 years back with 30 threads
python main.py 5 30

# Download 1 year back with 20 threads
python main.py 1 20

Command Line Arguments

  • First argument: Number of years back to download (default: 2)
  • Second argument: Number of concurrent threads (default: 40)
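
As an illustration of the two positional arguments and their defaults, here is a minimal sketch of how they might be parsed and how the thread count might be applied. The names `parse_args` and `run_all` are hypothetical and are not taken from `main.py`; the actual script may handle arguments and concurrency differently.

```python
from concurrent.futures import ThreadPoolExecutor

# Defaults as documented above.
DEFAULT_YEARS_BACK = 2
DEFAULT_THREADS = 40


def parse_args(argv):
    """Return (years_back, threads) from positional arguments.

    argv follows the sys.argv convention: argv[0] is the script name.
    Hypothetical helper; main.py may parse its arguments differently.
    """
    years_back = int(argv[1]) if len(argv) > 1 else DEFAULT_YEARS_BACK
    threads = int(argv[2]) if len(argv) > 2 else DEFAULT_THREADS
    if years_back < 1 or threads < 1:
        raise ValueError("both arguments must be positive integers")
    return years_back, threads


def run_all(items, worker, threads):
    """Apply worker to every item using at most `threads` concurrent threads.

    Results are returned in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(worker, items))
```

With this sketch, `parse_args(["main.py"])` yields `(2, 40)` and `parse_args(["main.py", "5", "30"])` yields `(5, 30)`, matching the examples under Custom Options.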

Output

  • Downloads folder: ./downloads/YEAR/MONTH_NAME/ (MONTH_NAME is the two-digit month number plus the Portuguese month name, e.g. 01_Janeiro)
  • Progress file: download_progress.json (tracks completed downloads)
  • Log file: camara_downloader.log

Example Output Structure

downloads/
├── 2023/
│   ├── 01_Janeiro/
│   │   └── DCD0020230101000490000.PDF
│   └── 02_Fevereiro/
└── 2024/
    └── 01_Janeiro/
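
The layout above can be reproduced with a small path helper. This is a sketch under assumptions: `target_dir` and `MONTH_NAMES` are hypothetical names, and the spelling of month folders beyond the `01_Janeiro` and `02_Fevereiro` shown above (for example whether `Março` keeps its cedilla) is a guess rather than something confirmed by the program.

```python
from pathlib import Path

# Portuguese month names; only Janeiro and Fevereiro are confirmed by the
# example tree, the remaining spellings are assumed.
MONTH_NAMES = [
    "Janeiro", "Fevereiro", "Março", "Abril", "Maio", "Junho",
    "Julho", "Agosto", "Setembro", "Outubro", "Novembro", "Dezembro",
]


def target_dir(year, month, root="downloads"):
    """Build the folder path for a given year and month (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return Path(root) / str(year) / f"{month:02d}_{MONTH_NAMES[month - 1]}"
```

For example, `target_dir(2023, 1)` gives `downloads/2023/01_Janeiro`. Calling `.mkdir(parents=True, exist_ok=True)` on the result creates the folder if it does not exist yet.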

If a run is interrupted, the next run resumes where it left off: downloads already recorded in download_progress.json are skipped.
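
One way such a resume mechanism can work is sketched below. It assumes the progress file holds a JSON list of completed file names; the real format of download_progress.json is not documented here, and `load_progress`, `save_progress`, and `pending` are hypothetical helpers rather than functions from the program.

```python
import json
from pathlib import Path


def load_progress(path):
    """Return the set of completed file names, or an empty set if no file exists."""
    progress_file = Path(path)
    if not progress_file.exists():
        return set()
    with progress_file.open(encoding="utf-8") as f:
        return set(json.load(f))


def save_progress(path, completed):
    """Write the completed set to disk.

    The data goes to a temporary file first and is then renamed over the
    target, so an interruption mid-write does not corrupt the progress file.
    """
    progress_file = Path(path)
    tmp_file = progress_file.with_name(progress_file.name + ".tmp")
    with tmp_file.open("w", encoding="utf-8") as f:
        json.dump(sorted(completed), f)
    tmp_file.replace(progress_file)


def pending(all_files, completed):
    """Return the files that still need downloading, preserving order."""
    return [name for name in all_files if name not in completed]
```

On startup, a downloader following this pattern would call `load_progress`, pass the result to `pending`, and call `save_progress` after each successful download.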
