@lpffernando

What this PR does

Before this PR:
The knowledge base only supported PDF document preprocessing via MinerU. Image files were not handled in the preprocessing pipeline, limiting the ability to ingest visual content directly.

After this PR:

  • Added support for image file preprocessing using MinerU by converting images to PDF format before uploading to the MinerU service.
  • Updated the knowledge queue and service logic to route image files to MinerU/Open MinerU preprocess providers when selected.
  • Enhanced the UI to allow image uploads with appropriate hints and validation.
  • Maintained backward compatibility with existing PDF processing workflows.
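
The routing rule described above can be sketched as a small pure function. This is only an illustration: the provider ids, the extension list, and the names `choosePreprocessRoute` and `PreprocessRoute` are assumptions made for the example and are not taken from the PR's code.

```typescript
// Hypothetical identifiers throughout; the PR's real names may differ.
type PreprocessProviderId = "mineru" | "open-mineru" | "doc2x" | "mistral";
type PreprocessRoute = "pdf" | "image-to-pdf" | "skip";

// Assumed extension list; the PR text only says "JPG, PNG, etc.".
const IMAGE_EXTENSIONS = new Set<string>([".jpg", ".jpeg", ".png"]);

// Only the MinerU-family providers accept images (after conversion to PDF).
const IMAGE_CAPABLE_PROVIDERS = new Set<PreprocessProviderId>(["mineru", "open-mineru"]);

// Returns the lower-cased extension including the dot, or "" if there is none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot <= 0 ? "" : fileName.slice(dot).toLowerCase();
}

// Decides how a file enters the preprocessing pipeline for the selected provider.
function choosePreprocessRoute(
  fileName: string,
  provider: PreprocessProviderId | undefined
): PreprocessRoute {
  if (!provider) return "skip"; // no preprocess provider selected
  const ext = extensionOf(fileName);
  if (ext === ".pdf") return "pdf"; // existing PDF workflow, unchanged
  if (IMAGE_EXTENSIONS.has(ext) && IMAGE_CAPABLE_PROVIDERS.has(provider)) {
    return "image-to-pdf"; // convert first, then preprocess as a PDF
  }
  return "skip"; // e.g. an image with Doc2x or Mistral selected
}
```

Keeping the decision in one function means other providers never see image files, which is how the change stays additive.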

Fixes # (if applicable, e.g., related to image ingestion feature requests)

Why we need it and why it was done in this way

The following tradeoffs were made:

  • Chose to convert images to PDF using sharp and pdf-lib instead of direct image upload, as MinerU's API expects PDF input. This adds a small preprocessing step but ensures compatibility without modifying the external service.
  • Limited the change to MinerU/Open MinerU providers only, avoiding disruption to other preprocess providers like Doc2x or Mistral.
  • Used temporary file cleanup to manage disk space, with error handling to prevent accumulation of orphaned files.
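
A minimal sketch of the temporary-file handling mentioned in the last tradeoff, assuming the converted PDF bytes (produced by sharp and pdf-lib in this PR) are already available. The helper name `withTempPdf`, the directory prefix, and the file name are hypothetical and are not the PR's actual code.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: writes the converted PDF to a fresh temp directory,
// hands its path to `work` (for example, the upload to MinerU), and removes
// the directory afterwards whether `work` succeeded or threw.
async function withTempPdf<T>(
  pdfBytes: Uint8Array,
  work: (pdfPath: string) => Promise<T>
): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "mineru-image-"));
  try {
    const pdfPath = join(dir, "converted.pdf");
    await writeFile(pdfPath, pdfBytes);
    // `return await` keeps the file alive until `work` has finished.
    return await work(pdfPath);
  } finally {
    // Swallow cleanup errors so they never mask an upload error;
    // real code would probably log them instead.
    await rm(dir, { recursive: true, force: true }).catch(() => undefined);
  }
}
```

Putting the cleanup in `finally` is what prevents orphaned files from accumulating when the upload fails partway through.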

The following alternatives were considered:

  • Direct image upload to MinerU (not supported by their API).
  • Adding a separate image-specific provider (would increase complexity and maintenance).
  • Client-side conversion (would require more dependencies and browser compatibility checks).

Links to places where the discussion took place: Internal development discussions on knowledge base enhancements.

Breaking changes

None. This PR adds new functionality without changing existing APIs or behaviors. Existing PDF processing remains unchanged.

Special notes for your reviewer

  • Verified that yarn typecheck and yarn test:main both pass.
  • Image-to-PDF conversion uses sharp for format handling and pdf-lib for PDF creation.
  • The MinerU API key can be set via the MAIN_VITE_MINERU_API_KEY environment variable for free-tier usage.
  • Note: Runtime TLS/connection issues with MinerU may occur in restricted networks; consider adding retry logic in future iterations.
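
The retry logic suggested in the last note is not part of this PR. One possible shape for it, as a sketch under assumptions (the name `withRetry`, its options, and the exponential backoff policy are all hypothetical):

```typescript
interface RetryOptions {
  retries: number; // extra attempts after the first one
  baseDelayMs: number; // delay before the first retry; doubled on each retry
  isRetryable?: (error: unknown) => boolean; // defaults to retrying everything
}

// Runs `operation`, retrying with exponential backoff on failure.
// Rethrows the last error once attempts are exhausted or the error
// is classified as not retryable.
async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const { retries, baseDelayMs, isRetryable = () => true } = options;
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === retries || !isRetryable(error)) break;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

A caller would wrap only the network step (the upload or polling request to MinerU) and could pass an `isRetryable` that matches transient TLS or connection errors, so that validation failures are not retried.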

Checklist

This checklist is not enforced, but it is a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

  • PR: The PR description is expressive enough and will help future contributors
  • Code: Code is readable and follows existing patterns (e.g., error handling, logging)
  • Refactor: Removed unused OCR-related code and cleaned up imports
  • Upgrade: No impact on upgrade flows; new feature is additive
  • Documentation: User-facing feature; consider updating knowledge base docs if merged

Release note

feat(knowledge): Add MinerU image preprocessing support

- Knowledge base now supports uploading and processing image files (JPG, PNG, etc.) via MinerU/Open MinerU providers  
- Images are automatically converted to PDF format before preprocessing  
- Enhances document ingestion capabilities for visual content  