Advanced PDF processing, OCR, and AI vision analysis nodes for ComfyUI.
The download functionality has moved to a separate package:
- PDF Tools (this package): PDF extraction, OCR, AI vision processing
- Download Tools (new package): gallery-dl and yt-dlp downloaders
If you need media download nodes, install the download-tools package separately:
cd ComfyUI/custom_nodes/download-tools
.\install.ps1
To install PDF Tools (this package):
cd ComfyUI/custom_nodes/PDF_tools
.\install.ps1
To verify the installation:
.\check_install.ps1
- Restart ComfyUI
- Look for nodes under categories: PDF, OCR, Vision, Layout
- Start processing documents!
- PDF Extractor v08/v09 - Advanced image extraction with quality assessment
  - Automatic spread detection for scanned books
  - Image quality scoring (sharpness, contrast, brightness)
  - Duplicate detection
  - Organize output by quality
  - JSON metadata export
- Simple PDF Extractor - Basic extraction without advanced features
- Surya OCR Layout Node - State-of-the-art multilingual OCR
  - 90+ languages supported
  - Layout-aware text extraction
  - High accuracy on complex documents
  - GPU-accelerated inference
- Surya Layout OCR Hybrid - Combined layout analysis + OCR
  - Single-step document processing
  - Preserves reading order
  - Handles multi-column layouts
- PaddleOCR VL Remote - Specialized for Chinese/CJK documents
  - Excellent for Asian language texts
  - Remote processing capabilities
  - Requires separate virtual environment (see PaddleOCR_VL_SETUP.md)
  - Runs as standalone service due to CUDA version conflicts
- Enhanced Layout Parser v06 - Advanced document understanding
  - Detects titles, paragraphs, tables, figures, lists
  - Hierarchical structure extraction
  - Reading order detection
  - Bounding box coordinates
- LayoutLMv3 Node - Microsoft's document AI model
  - Multi-modal document understanding
  - Form and receipt processing
  - Table structure recognition
- Florence2 Rectangle Detector - Microsoft Florence-2 vision model
  - Object detection with bounding boxes
  - Image captioning (simple & detailed)
  - Visual question answering
  - OCR and text detection
  - Region-specific descriptions
- Florence2 Cropper Node - Crop based on detections
  - Automatic image region extraction
  - Batch processing of detected objects
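Spread detection for scanned books can be pictured as an aspect-ratio test. The sketch below is an assumption about how such a check might work, not the extractor's actual code; `is_spread`, `split_spread`, and the 1.3 ratio threshold are hypothetical names and values chosen for illustration.

```python
def is_spread(width: int, height: int, min_ratio: float = 1.3) -> bool:
    """Guess whether an image is a two-page spread from its aspect ratio.

    Hypothetical heuristic: a single portrait page is taller than it is wide,
    so an image that is clearly wider than tall is treated as a spread.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width / height >= min_ratio


def split_spread(width: int, height: int) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes for the two page halves."""
    mid = width // 2
    return [(0, 0, mid, height), (mid, 0, width, height)]
```

The boxes use the (left, top, right, bottom) order that Pillow's `Image.crop` accepts, so each half could be cropped directly from the extracted image.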
- Smart PDF Extraction - Quality scoring, spread detection, duplicate removal
- Multilingual OCR - 90+ languages with Surya, Chinese/Japanese with PaddleOCR
- Layout Understanding - Detect document structure (titles, paragraphs, tables)
- AI Vision Models - Florence-2 for object detection and image analysis
- Batch Processing - Process multiple documents efficiently
- GPU Acceleration - Fast inference with CUDA support
- Quality Assessment - Automatic image quality evaluation
- JSON Export - Structured metadata for all extractions
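Duplicate removal is commonly done with a perceptual hash. The following is a minimal sketch of an average hash, assuming each image has already been reduced to a tiny grayscale thumbnail (a list of pixel rows). It is one plausible technique, not necessarily the one these nodes use, and `average_hash`, `hamming_distance`, and `is_duplicate` are hypothetical names.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack a small grayscale thumbnail into an integer bit hash.

    Each bit is 1 when the pixel is brighter than the thumbnail's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(a: list[list[int]], b: list[list[int]], max_distance: int = 1) -> bool:
    """Treat two thumbnails as duplicates when their hashes nearly match."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance
```

Because each bit compares a pixel with the thumbnail's own mean, a uniformly brighter copy of the same page still hashes to the same value.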
Node: PDF Extractor v08
├── Input PDF: "mybook.pdf"
├── Output Folder: "./extracted_images"
├── Options:
│   ├── ✓ quality_assessment (score each image)
│   ├── ✓ spread_detection (detect 2-page spreads)
│   ├── ✓ organize_by_quality (high/medium/low folders)
│   └── ✓ save_json_output (metadata file)
└── Result: Images sorted by quality with detailed metrics
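To make the quality options concrete, here is a rough sketch of how sharpness, contrast, and brightness could be scored and mapped to high/medium/low folders. It uses plain Python lists so it runs without extra packages. The metric definitions, the function names, and the thresholds are all assumptions for illustration, not the node's real scoring.

```python
from statistics import mean, pvariance


def quality_metrics(gray: list[list[int]]) -> dict[str, float]:
    """Compute brightness, contrast, and sharpness for a grayscale image.

    `gray` is a list of rows with values in 0..255 and must be at least 3x3.
    """
    flat = [value for row in gray for value in row]
    # 4-neighbour Laplacian on interior pixels; its variance grows with edge detail.
    laplacian = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, len(gray) - 1)
        for x in range(1, len(gray[0]) - 1)
    ]
    return {
        "brightness": mean(flat),
        "contrast": pvariance(flat) ** 0.5,
        "sharpness": pvariance(laplacian),
    }


def quality_bucket(metrics: dict[str, float], sharp_hi: float = 500.0, sharp_lo: float = 50.0) -> str:
    """Map sharpness to a folder name; both thresholds are hypothetical."""
    if metrics["sharpness"] >= sharp_hi:
        return "high"
    if metrics["sharpness"] >= sharp_lo:
        return "medium"
    return "low"
```

On real images the same Laplacian-variance idea is usually computed with OpenCV or numpy, which is far faster than nested lists.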
Node: Surya OCR Layout Node
├── Input: "scanned_page.png"
├── Languages: ["en"] or ["en", "es", "fr"]
└── Output:
    ├── Extracted text with 95%+ accuracy
    ├── Bounding boxes for each word/line
    └── Layout information (columns, paragraphs)
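Once an OCR step returns lines with bounding boxes, reading order on a multi-column page can be approximated by sorting on column first and vertical position second. The sketch below assumes equal-width columns and a hypothetical `TextLine` record. It does not use Surya's own result classes or its layout model.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    text: str
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def reading_order(lines: list[TextLine], page_width: float, columns: int = 2) -> list[TextLine]:
    """Sort lines column by column, top to bottom within each column.

    Assumes equal-width columns, which is a simplification.
    """
    col_width = page_width / columns

    def sort_key(line: TextLine) -> tuple[int, float, float]:
        x1, y1, x2, _ = line.bbox
        # Assign the line to a column by the horizontal centre of its box.
        column = min(int(((x1 + x2) / 2) // col_width), columns - 1)
        return (column, y1, x1)

    return sorted(lines, key=sort_key)
```

A real layout model handles uneven columns, headers that span the page, and figures; this only illustrates why bounding boxes matter for preserving reading order.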
Node: Florence2 Rectangle Detector
├── Input Image: "photo.jpg"
├── Task: <OD> (Object Detection)
└── Output:
    ├── Bounding boxes for detected objects
    ├── Labels (e.g., "person", "car", "dog")
    └── Confidence scores
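Detections like these are what a cropper consumes. The sketch below turns a detection result into padded crop boxes clamped to the image bounds. It assumes the result is a dictionary keyed by the task token with parallel `bboxes` and `labels` lists, which is the shape Florence-2's post-processing is generally described as returning for `<OD>`. The function name and the padding default are hypothetical, and only boxes and labels are used.

```python
def crop_boxes(
    result: dict,
    image_size: tuple[int, int],
    padding: int = 10,
    task: str = "<OD>",
) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Convert detections into (label, (left, top, right, bottom)) crop boxes.

    Boxes are padded by `padding` pixels and clamped to the image.
    """
    width, height = image_size
    detections = result[task]
    crops = []
    for (x1, y1, x2, y2), label in zip(detections["bboxes"], detections["labels"]):
        left = max(0, int(x1) - padding)
        top = max(0, int(y1) - padding)
        right = min(width, int(x2) + padding)
        bottom = min(height, int(y2) + padding)
        if right > left and bottom > top:  # skip degenerate boxes
            crops.append((label, (left, top, right, bottom)))
    return crops
```

Each returned box can be passed to an image library's crop call, one region per detected object.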
Node: Enhanced Layout Parser v06
├── Input: PDF page or image
└── Output:
    ├── Regions: title, text, table, figure, list
    ├── Bounding box coordinates
    ├── Hierarchical structure
    └── Reading order
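One way to picture the hierarchical output is to group regions under the title that precedes them in reading order. This sketch assumes each region is a dictionary with hypothetical `type` and `order` keys. The parser's real output schema may differ.

```python
def build_sections(regions: list[dict]) -> list[dict]:
    """Group regions into sections, each started by a "title" region.

    Regions that appear before the first title go into a section whose
    title is None.
    """
    sections = []
    current = {"title": None, "blocks": []}
    for region in sorted(regions, key=lambda r: r["order"]):
        if region["type"] == "title":
            # Close the previous section if it holds anything.
            if current["title"] is not None or current["blocks"]:
                sections.append(current)
            current = {"title": region, "blocks": []}
        else:
            current["blocks"].append(region)
    if current["title"] is not None or current["blocks"]:
        sections.append(current)
    return sections
```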
- OS: Windows 10/11 (primary), Linux compatible
- Python: 3.10+ (included with ComfyUI)
- GPU: NVIDIA with CUDA recommended (CPU works but is slower)
- RAM: 8GB minimum, 16GB+ recommended for AI models
- Storage: 5-10GB for packages + models
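To see whether the GPU path is available on a given machine, a short check like the one below can help. `pick_device` is a hypothetical helper, not part of this package; it falls back to CPU when PyTorch is missing or no CUDA device is visible.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Inference would run on: {pick_device()}")
```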
- INSTALLATION_GUIDE.md - Detailed setup instructions
- CODE_OVERVIEW.md - Understand the codebase structure
- LICENSE.md - Licensing terms and conditions
- CREDITS.md - Third-party libraries and acknowledgments
- SURYA_OCR_NODE_GUIDE.md - Surya OCR detailed guide
- PaddleOCR_VL_SETUP.md - PaddleOCR separate environment setup
- PDF_LAYER_DETECTION_GUIDE.md - PDF layer analysis
- BATCH_PROCESSING_GUIDE.md - Batch workflow tips
Auto-installed with install.ps1:
- PyMuPDF (fitz) - PDF processing and rendering
- Pillow - Image processing and manipulation
- numpy - Array operations and numerical computing
- opencv-python - Computer vision operations
- transformers - Hugging Face AI models
- torch - PyTorch for deep learning
- surya-ocr - Advanced OCR engine
- paddleocr - Chinese/multilingual OCR (basic version)
- layoutparser - Document layout analysis
Note: PaddleOCR VL requires a separate virtual environment due to CUDA version conflicts. See PaddleOCR_VL_SETUP.md for setup instructions.
See requirements.txt for the complete list.
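A quick way to confirm these packages are importable from ComfyUI's Python is to look up each import name. This sketch does not replace check_install.ps1; `missing_packages` is a hypothetical helper, and the mapping lists the import name each package normally exposes.

```python
from importlib.util import find_spec

# Package name (as installed) -> top-level module name (as imported).
REQUIRED_MODULES = {
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "transformers": "transformers",
    "torch": "torch",
    "surya-ocr": "surya",
    "paddleocr": "paddleocr",
    "layoutparser": "layoutparser",
}


def missing_packages(modules: dict[str, str] = REQUIRED_MODULES) -> list[str]:
    """Return the package names whose module cannot be found."""
    return [pkg for pkg, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All packages found" if not missing else f"Missing: {', '.join(missing)}")
```

Run it with the same Python interpreter that ComfyUI uses, otherwise the result reflects a different environment.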
PDF_tools/
├── nodes/                              # ComfyUI node implementations
│   ├── pdf_extractor_v08.py            # Advanced PDF extraction
│   ├── surya_ocr_layout_node.py        # Surya OCR
│   ├── eric-florence2-cropper-node.py  # Florence-2 vision
│   └── enhanced_layout_parser_v06.py   # Layout analysis
├── florence2_scripts/                  # Florence-2 AI vision models
├── sam2_scripts/                       # SAM2 segmentation models
├── tools/                              # Utility scripts
├── Docs/                               # Comprehensive documentation
└── __init__.py                         # Node registration
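For readers new to ComfyUI, node registration in `__init__.py` generally follows the pattern sketched below: a class declares its inputs, outputs, and category, and two module-level dictionaries expose it to ComfyUI. `ExamplePdfNode` is a hypothetical placeholder, not one of the classes shipped in nodes/.

```python
class ExamplePdfNode:
    """Hypothetical node used only to illustrate the registration pattern."""

    CATEGORY = "PDF"            # menu category shown in ComfyUI
    RETURN_TYPES = ("STRING",)  # one string output
    FUNCTION = "run"            # name of the method ComfyUI calls

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"pdf_path": ("STRING", {"default": "mybook.pdf"})}}

    def run(self, pdf_path):
        # ComfyUI expects a tuple whose items match RETURN_TYPES.
        return (f"Would process: {pdf_path}",)


# ComfyUI discovers nodes through these two module-level mappings.
NODE_CLASS_MAPPINGS = {"ExamplePdfNode": ExamplePdfNode}
NODE_DISPLAY_NAME_MAPPINGS = {"ExamplePdfNode": "Example PDF Node"}
```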
Run the check script: .\check_install.ps1
If the GPU runs out of memory:
- Close other GPU applications
- Process fewer pages at once
- Use CPU mode (slower but works)
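Processing fewer pages at once usually means splitting the page range into batches. A minimal sketch follows; `page_batches` and the default batch size of 10 are hypothetical and not a setting exposed by these nodes.

```python
from collections.abc import Iterator


def page_batches(page_count: int, batch_size: int = 10) -> Iterator[range]:
    """Yield consecutive ranges of zero-based page indices."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, page_count, batch_size):
        yield range(start, min(start + batch_size, page_count))


# Example: a 25-page document in batches of 10 pages.
for batch in page_batches(25, batch_size=10):
    print(f"pages {batch.start}..{batch.stop - 1}")
```

Smaller batches lower peak GPU memory at the cost of more passes over the document.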
If OCR results are poor:
- Ensure image is high resolution (300+ DPI)
- Check language settings match document
- Try different OCR nodes for comparison
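Whether a scan reaches 300 DPI can be estimated from its pixel size and the physical page size. The helper below is a hypothetical illustration of that arithmetic; the page size in inches has to come from you, since a bare image does not always carry it.

```python
def effective_dpi(
    pixel_width: int,
    pixel_height: int,
    page_width_in: float,
    page_height_in: float,
) -> float:
    """Return the lower of the horizontal and vertical dots per inch."""
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


# A US Letter page (8.5 x 11 in) scanned at 2550 x 3300 pixels.
print(effective_dpi(2550, 3300, 8.5, 11))
```

A Letter page at 1275 x 1650 pixels works out to 150 DPI, half the recommended resolution.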
If PDF extraction returns no images:
- Verify PDF contains raster images (not just text)
- Check PDF isn't encrypted or password-protected
- Try Simple PDF Extractor for troubleshooting
See INSTALLATION_GUIDE.md for more troubleshooting.
- High-Quality Inputs - Use 300+ DPI scans for best OCR results
- Enable Quality Assessment - Let the tool filter low-quality extractions
- Batch Process - Process multiple documents in one workflow
- Export Metadata - Save JSON outputs for downstream processing
- GPU Acceleration - Use CUDA for 10x faster inference with AI models
Current versions:
- PyMuPDF: 1.26.4+
- Transformers: 4.55.0+
- Torch: 2.7.1+cu128
- Surya-OCR: Latest from GitHub
- Florence-2: Microsoft Research
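Comparing an installed version against these minimums needs a little care because of build tags such as `+cu128`. The sketch below is a rough comparison using only the standard library. `version_tuple` and `meets_minimum` are hypothetical helpers, and the approach ignores pre-release markers, which the third-party `packaging` library handles properly.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn "2.7.1+cu128" into (2, 7, 1), dropping the local build tag."""
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", public))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)
```

The installed version string can be read with `importlib.metadata.version("torch")` and passed in as `installed`.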
Copyright (c) 2025 Eric Hiss. All rights reserved.
Dual-licensed:
- Non-Commercial Use: Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
- Commercial Use: Requires separate license - contact eric@rollei.us
Important: This project uses third-party libraries with various licenses (GPL, AGPL, MIT, Apache). See CREDITS.md for complete dependency licensing.
Contributions welcome! See CONTRIBUTING.md for:
- Code style guidelines
- Testing requirements
- Pull request process
- Development setup
- Author: Eric Hiss
- GitHub: EricRollei
- Email: eric@historic.camera, eric@rollei.us
- Issues: Open an issue on GitHub for bugs or feature requests
Special thanks to:
- ComfyUI community for the amazing extensible platform
- Microsoft Research for Florence-2 vision models
- Vikp for Surya OCR
- Meta AI for SAM2 segmentation models
- Hugging Face for model hosting and transformers library
- All open-source developers whose work makes this possible
See CREDITS.md for detailed acknowledgments.
Ready to process documents! Install dependencies, restart ComfyUI, and start extracting.