Advanced PDF processing, OCR, and AI vision analysis nodes for ComfyUI. Extract images from PDFs, perform multilingual OCR with Surya, detect objects with Florence-2, and analyze document layouts.

EricRollei/PDF-Tools

PDF Tools - ComfyUI Custom Node Package

Advanced PDF processing, OCR, and AI vision analysis nodes for ComfyUI.

📢 Important Notice: Package Split

The download functionality has moved to a separate package:

  • PDF Tools (this package): PDF extraction, OCR, AI vision processing
  • Download Tools (new package): gallery-dl and yt-dlp downloaders

If you need media download nodes, install the download-tools package separately:

cd ComfyUI/custom_nodes/download-tools
.\install.ps1

🎉 Quick Start

Installation

cd ComfyUI/custom_nodes/PDF_tools
.\install.ps1

Verify Installation

.\check_install.ps1

Start Using

  1. Restart ComfyUI
  2. Look for nodes under categories: PDF, OCR, Vision, Layout
  3. Start processing documents!

📦 Available Nodes

PDF Extraction

  • PDF Extractor v08/v09 - Advanced image extraction with quality assessment

    • Automatic spread detection for scanned books
    • Image quality scoring (sharpness, contrast, brightness)
    • Duplicate detection
    • Organize output by quality
    • JSON metadata export
  • Simple PDF Extractor - Basic extraction without advanced features
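
The quality scoring described above (sharpness, contrast, brightness) can be illustrated with a minimal sketch. This is not the extractor's actual implementation: the function name `quality_metrics`, the plain list-of-rows grayscale input, and the choice of Laplacian variance as the sharpness measure are all assumptions made for illustration.

```python
from statistics import mean, pstdev, pvariance


def quality_metrics(gray):
    """Score a grayscale image given as a list of rows of 0-255 values.

    Hypothetical helper: brightness is the mean pixel value, contrast is
    the standard deviation, and sharpness is the variance of a 4-neighbour
    Laplacian over the interior pixels.
    """
    pixels = [p for row in gray for p in row]
    laplacian = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            laplacian.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return {
        "brightness": mean(pixels),
        "contrast": pstdev(pixels),
        # Images smaller than 3x3 have no interior pixels to measure.
        "sharpness": pvariance(laplacian) if laplacian else 0.0,
    }


# A flat gray image has no edges; a checkerboard has many.
flat = [[128] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
```

A blurry or blank page scores near zero on sharpness, while a crisp scan scores high, so a threshold on this value is one plausible way to sort images into quality folders.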

OCR (Optical Character Recognition)

  • Surya OCR Layout Node - State-of-the-art multilingual OCR

    • 90+ languages supported
    • Layout-aware text extraction
    • High accuracy on complex documents
    • GPU-accelerated inference
  • Surya Layout OCR Hybrid - Combined layout analysis + OCR

    • Single-step document processing
    • Preserves reading order
    • Handles multi-column layouts
  • PaddleOCR VL Remote - Specialized for Chinese/CJK documents

    • Excellent for Asian language texts
    • Remote processing capabilities
    • Requires separate virtual environment (see PaddleOCR_VL_SETUP.md)
    • Runs as standalone service due to CUDA version conflicts

Layout Analysis

  • Enhanced Layout Parser v06 - Advanced document understanding

    • Detects titles, paragraphs, tables, figures, lists
    • Hierarchical structure extraction
    • Reading order detection
    • Bounding box coordinates
  • LayoutLMv3 Node - Microsoft's document AI model

    • Multi-modal document understanding
    • Form and receipt processing
    • Table structure recognition

AI Vision & Object Detection

  • Florence2 Rectangle Detector - Microsoft Florence-2 vision model

    • Object detection with bounding boxes
    • Image captioning (simple & detailed)
    • Visual question answering
    • OCR and text detection
    • Region-specific descriptions
  • Florence2 Cropper Node - Crop based on detections

    • Automatic image region extraction
    • Batch processing of detected objects
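
Cropping from detections comes down to clamping each bounding box to the image bounds before slicing. The sketch below assumes boxes arrive as `[x1, y1, x2, y2]` pixel coordinates; `clamp_box` and `crop_rows` are hypothetical helpers, not functions from this package.

```python
import math


def clamp_box(box, width, height):
    """Clamp an [x1, y1, x2, y2] box to the image; return None if it is empty."""
    x1, x2 = sorted((box[0], box[2]))
    y1, y2 = sorted((box[1], box[3]))
    left = min(width, max(0, math.floor(x1)))
    top = min(height, max(0, math.floor(y1)))
    right = min(width, max(0, math.ceil(x2)))
    bottom = min(height, max(0, math.ceil(y2)))
    if right <= left or bottom <= top:
        return None  # box lies entirely outside the image
    return (left, top, right, bottom)


def crop_rows(image_rows, box):
    """Crop a list-of-rows image with a box that has already been clamped."""
    left, top, right, bottom = box
    return [row[left:right] for row in image_rows[top:bottom]]
```

Rounding outward (floor for the top-left corner, ceil for the bottom-right) keeps the whole detected object inside the crop even when the model returns fractional coordinates.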

🚀 Key Features

✅ Smart PDF Extraction - Quality scoring, spread detection, duplicate removal
✅ Multilingual OCR - 90+ languages with Surya, Chinese/Japanese with PaddleOCR
✅ Layout Understanding - Detect document structure (titles, paragraphs, tables)
✅ AI Vision Models - Florence-2 for object detection and image analysis
✅ Batch Processing - Process multiple documents efficiently
✅ GPU Acceleration - Fast inference with CUDA support
✅ Quality Assessment - Automatic image quality evaluation
✅ JSON Export - Structured metadata for all extractions

💡 Usage Examples

Extract High-Quality Images from PDF

Node: PDF Extractor v08
├── Input PDF: "mybook.pdf"
├── Output Folder: "./extracted_images"
├── Options:
│   ├── ✓ quality_assessment (score each image)
│   ├── ✓ spread_detection (detect 2-page spreads)
│   ├── ✓ organize_by_quality (high/medium/low folders)
│   └── ✓ save_json_output (metadata file)
└── Result: Images sorted by quality with detailed metrics
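
For a sense of what duplicate detection and a JSON metadata record might look like, here is a minimal sketch that flags byte-identical images by hash. The function `dedupe_records` and the record fields are assumptions for illustration; the extractor's real method and metadata schema may differ, and catching near-duplicates would need a perceptual hash rather than SHA-256.

```python
import hashlib
import json


def dedupe_records(images):
    """Build one metadata record per image from (name, bytes) pairs.

    Each record notes which earlier file, if any, has identical bytes.
    """
    first_seen = {}  # digest -> name of the first file with that content
    records = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        duplicate_of = first_seen.get(digest)
        if duplicate_of is None:
            first_seen[digest] = name
        records.append(
            {"file": name, "sha256": digest, "duplicate_of": duplicate_of}
        )
    return records


records = dedupe_records(
    [("page1_img1.png", b"aaa"), ("page2_img1.png", b"bbb"), ("page3_img1.png", b"aaa")]
)
metadata_json = json.dumps(records, indent=2)  # text that could be saved alongside the images
```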

OCR a Scanned Document

Node: Surya OCR Layout Node
├── Input: "scanned_page.png"
├── Languages: ["en"] or ["en", "es", "fr"]
├── Output:
│   ├── Extracted text with 95%+ accuracy
│   ├── Bounding boxes for each word/line
│   └── Layout information (columns, paragraphs)

Detect Objects in Images

Node: Florence2 Rectangle Detector
├── Input Image: "photo.jpg"
├── Task: <OD> (Object Detection)
├── Output:
│   ├── Bounding boxes for detected objects
│   ├── Labels (e.g., "person", "car", "dog")
│   └── Confidence scores

Analyze Document Layout

Node: Enhanced Layout Parser v06
├── Input: PDF page or image
├── Output:
│   ├── Regions: title, text, table, figure, list
│   ├── Bounding box coordinates
│   ├── Hierarchical structure
│   └── Reading order
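
Reading order for a multi-column page can be approximated from bounding boxes alone. The sketch below is a simple heuristic, not the parser's algorithm: it assumes each region is a dict with a `bbox` of `[x1, y1, x2, y2]`, groups regions whose horizontal extents overlap into columns, then reads each column top to bottom, left column first. A full-width title would merge all columns in this naive version.

```python
def reading_order(regions):
    """Order regions column by column (left to right), top to bottom within each."""
    columns = []  # each column tracks its horizontal extent and its regions
    for region in sorted(regions, key=lambda r: r["bbox"][0]):
        x1, _, x2, _ = region["bbox"]
        for col in columns:
            if x1 < col["x2"] and x2 > col["x1"]:  # horizontal overlap
                col["x1"] = min(col["x1"], x1)
                col["x2"] = max(col["x2"], x2)
                col["items"].append(region)
                break
        else:
            columns.append({"x1": x1, "x2": x2, "items": [region]})

    ordered = []
    for col in sorted(columns, key=lambda c: c["x1"]):
        ordered.extend(sorted(col["items"], key=lambda r: r["bbox"][1]))
    return ordered


# Hypothetical two-column page: A and C on the left, B and D on the right.
page = [
    {"label": "D", "bbox": [220, 120, 400, 300]},
    {"label": "A", "bbox": [10, 10, 200, 100]},
    {"label": "B", "bbox": [220, 10, 400, 100]},
    {"label": "C", "bbox": [10, 120, 200, 300]},
]
```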

🔧 System Requirements

  • OS: Windows 10/11 (primary), Linux compatible
  • Python: 3.10+ (included with ComfyUI)
  • GPU: NVIDIA with CUDA recommended (CPU works but slower)
  • RAM: 8GB minimum, 16GB+ recommended for AI models
  • Storage: 5-10GB for packages + models

Documentation

Main Guides

Additional Docs

🔧 Core Dependencies

Auto-installed with install.ps1:

  • PyMuPDF (fitz) - PDF processing and rendering
  • Pillow - Image processing and manipulation
  • numpy - Array operations and numerical computing
  • opencv-python - Computer vision operations
  • transformers - Hugging Face AI models
  • torch - PyTorch for deep learning
  • surya-ocr - Advanced OCR engine
  • paddleocr - Chinese/multilingual OCR (basic version)
  • layoutparser - Document layout analysis

Note: PaddleOCR VL requires a separate virtual environment due to CUDA version conflicts. See PaddleOCR_VL_SETUP.md for setup instructions.

See requirements.txt for complete list.

πŸ“ Project Structure

PDF_tools/
├── nodes/              # ComfyUI node implementations
│   ├── pdf_extractor_v08.py      # Advanced PDF extraction
│   ├── surya_ocr_layout_node.py  # Surya OCR
│   ├── eric-florence2-cropper-node.py  # Florence-2 vision
│   └── enhanced_layout_parser_v06.py   # Layout analysis
├── florence2_scripts/  # Florence-2 AI vision models
├── sam2_scripts/       # SAM2 segmentation models
├── tools/              # Utility scripts
├── Docs/               # Comprehensive documentation
└── __init__.py         # Node registration

πŸ› Troubleshooting

"Module not found" errors

Run the check script: .\check_install.ps1
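
If PowerShell is not available (for example on Linux), a rough cross-platform check can be sketched in Python. This is not what `check_install.ps1` does internally; the `REQUIRED` mapping below is an assumption covering a subset of the core dependencies listed above, pairing each pip package with the module name it is imported as.

```python
from importlib.util import find_spec

# Assumed mapping of pip package name -> top-level import name.
REQUIRED = {
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "transformers": "transformers",
    "torch": "torch",
}


def missing_packages(modules=REQUIRED):
    """Return the package names whose import module cannot be found."""
    return sorted(pkg for pkg, mod in modules.items() if find_spec(mod) is None)


if __name__ == "__main__":
    missing = missing_packages()
    print("All core modules found." if not missing else f"Missing: {', '.join(missing)}")
```

Run it with the same Python interpreter that ComfyUI uses; otherwise it will report on the wrong environment.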

"CUDA out of memory"

  • Close other GPU applications
  • Process fewer pages at once
  • Use CPU mode (slower but works)
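
"Process fewer pages at once" amounts to feeding the model fixed-size batches instead of the whole document. A minimal sketch, where `page_batches` is a hypothetical helper and the batch size is something you would tune to your GPU memory:

```python
from itertools import islice


def page_batches(pages, size):
    """Yield successive lists of at most `size` pages."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(pages)
    while chunk := list(islice(iterator, size)):
        yield chunk


# Example: seven page numbers processed three at a time.
batches = list(page_batches(range(1, 8), 3))
```

Halving the batch size after an out-of-memory error is a common way to find a value that fits.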

OCR accuracy issues

  • Ensure image is high resolution (300+ DPI)
  • Check language settings match document
  • Try different OCR nodes for comparison
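
To check whether a scan meets the 300 DPI guideline, divide its pixel dimensions by the physical page size in inches. The helper name `effective_dpi` is an assumption for illustration, and the page size is something you must supply, since it cannot be recovered from the pixels alone.

```python
def effective_dpi(width_px, height_px, width_in, height_in):
    """Return the lower of the horizontal and vertical resolutions in DPI."""
    return min(width_px / width_in, height_px / height_in)


# A US Letter page (8.5 x 11 inches) scanned at two different resolutions.
good_scan = effective_dpi(2550, 3300, 8.5, 11)
low_scan = effective_dpi(1275, 1650, 8.5, 11)
```

Here `good_scan` works out to 300 DPI and `low_scan` to 150 DPI, so the second image would be worth rescanning or upscaling before OCR.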

PDF extraction produces no images

  • Verify PDF contains raster images (not just text)
  • Check PDF isn't encrypted or password-protected
  • Try Simple PDF Extractor for troubleshooting

See INSTALLATION_GUIDE.md for more troubleshooting.

🎯 Best Practices

  1. High-Quality Inputs - Use 300+ DPI scans for best OCR results
  2. Enable Quality Assessment - Let the tool filter low-quality extractions
  3. Batch Process - Process multiple documents in one workflow
  4. Export Metadata - Save JSON outputs for downstream processing
  5. GPU Acceleration - Use CUDA for 10x faster inference with AI models

πŸ“ Version Info

Current versions:

  • PyMuPDF: 1.26.4+
  • Transformers: 4.55.0+
  • Torch: 2.7.1+cu128
  • Surya-OCR: Latest from GitHub
  • Florence-2: Microsoft Research

📄 License

Copyright (c) 2025 Eric Hiss. All rights reserved.

Dual-licensed:

Important: This project uses third-party libraries with various licenses (GPL, AGPL, MIT, Apache). See CREDITS.md for complete dependency licensing.

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for:

  • Code style guidelines
  • Testing requirements
  • Pull request process
  • Development setup

👥 Contact & Support

πŸ™ Acknowledgments

Special thanks to:

  • ComfyUI community for the amazing extensible platform
  • Microsoft Research for Florence-2 vision models
  • Vikp for Surya OCR
  • Meta AI for SAM2 segmentation models
  • Hugging Face for model hosting and transformers library
  • All open-source developers whose work makes this possible

See CREDITS.md for detailed acknowledgments.


Ready to process documents! Install dependencies, restart ComfyUI, and start extracting.
