feat(knowledge): support MinerU image preprocessing pipeline #11083

lpffernando · 2025-10-31T10:33:32Z

What this PR does

Before this PR:
The knowledge base only supported PDF document preprocessing via MinerU. Image files were not handled in the preprocessing pipeline, limiting the ability to ingest visual content directly.

After this PR:

- Added support for image file preprocessing using MinerU by converting images to PDF format before uploading to the MinerU service.
- Updated the knowledge queue and service logic to route image files to MinerU/Open MinerU preprocess providers when selected.
- Enhanced the UI to allow image uploads with appropriate hints and validation.
- Maintained backward compatibility with existing PDF processing workflows.
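
For reviewers, the routing rule described above can be sketched roughly as follows. This is an illustration only: the provider ids, the extension list, and the names `resolvePreprocessRoute` and `PreprocessRoute` are assumptions made for this example and are not taken from the PR's code.

```typescript
// Hypothetical provider ids; the real identifiers in the codebase may differ.
type PreprocessProviderId = "mineru" | "open-mineru" | "doc2x" | "mistral";

// Assumed extension list; the PR description only says "JPG, PNG, etc."
const IMAGE_EXTENSIONS = new Set([".jpg", ".jpeg", ".png"]);

// Only these providers receive images (after conversion to PDF).
const IMAGE_CAPABLE_PROVIDERS = new Set<PreprocessProviderId>(["mineru", "open-mineru"]);

type PreprocessRoute = "pdf" | "image-to-pdf" | "skip";

// Returns the lowercased extension including the dot, or "" if there is none.
function getExtension(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot <= 0 ? "" : fileName.slice(dot).toLowerCase();
}

// PDFs keep the existing path; images are converted first, but only for
// MinerU/Open MinerU; everything else skips preprocessing in this sketch.
function resolvePreprocessRoute(fileName: string, provider: PreprocessProviderId): PreprocessRoute {
  const ext = getExtension(fileName);
  if (ext === ".pdf") return "pdf";
  if (IMAGE_EXTENSIONS.has(ext) && IMAGE_CAPABLE_PROVIDERS.has(provider)) return "image-to-pdf";
  return "skip";
}
```

Keeping the decision in one pure function makes the backward-compatibility claim easy to check: PDF inputs resolve to the same route regardless of provider.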

Fixes # (if applicable, e.g., related to image ingestion feature requests)

Why we need it and why it was done in this way

The following tradeoffs were made:

- Chose to convert images to PDF using sharp and pdf-lib instead of direct image upload, as MinerU's API expects PDF input. This adds a small preprocessing step but ensures compatibility without modifying the external service.
- Limited the change to MinerU/Open MinerU providers only, avoiding disruption to other preprocess providers like Doc2x or Mistral.
- Used temporary file cleanup to manage disk space, with error handling to prevent accumulation of orphaned files.
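
The temporary-file handling mentioned above follows a familiar try/finally pattern. A minimal sketch, assuming Node's `fs/promises` API; the helper name `withTempPdf`, the directory prefix, and the file name are hypothetical and not taken from the PR:

```typescript
import * as fs from "node:fs/promises";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: gives `work` a path inside a fresh temp directory and
// always removes that directory afterwards, even when `work` throws.
async function withTempPdf<T>(work: (pdfPath: string) => Promise<T>): Promise<T> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "mineru-image-"));
  const pdfPath = path.join(dir, "converted.pdf");
  try {
    return await work(pdfPath);
  } finally {
    // force: true means a missing directory is not an error; any other
    // cleanup failure is swallowed so it cannot mask the original error.
    await fs.rm(dir, { recursive: true, force: true }).catch(() => undefined);
  }
}
```

In this arrangement the conversion and upload would both run inside `work`, so the converted PDF never outlives the request, whether it succeeds or fails.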

The following alternatives were considered:

- Direct image upload to MinerU (not supported by their API).
- Adding a separate image-specific provider (would increase complexity and maintenance).
- Client-side conversion (would require more dependencies and browser compatibility checks).

Links to places where the discussion took place: Internal development discussions on knowledge base enhancements.

Breaking changes

None. This PR adds new functionality without changing existing APIs or behaviors. Existing PDF processing remains unchanged.

Special notes for your reviewer

- Tested with `yarn typecheck` and `yarn test:main` passing.
- Image-to-PDF conversion uses sharp for format handling and pdf-lib for PDF creation.
- The MinerU API key can be set via environment variable `MAIN_VITE_MINERU_API_KEY` for free tier usage.
- Note: Runtime TLS/connection issues with MinerU may occur in restricted networks; consider adding retry logic in future iterations.
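
As a starting point for the retry follow-up, one possible shape is a small wrapper with exponential backoff. This is not part of this PR; `withRetry` and its default values are assumptions chosen for illustration.

```typescript
// Hypothetical helper, not included in this PR: retries an async operation
// with exponential backoff and rethrows the last error if every attempt fails.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  const attempts = Math.max(1, maxAttempts); // always try at least once
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Wait baseDelayMs, then 2x, then 4x, ... between attempts.
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

A real version would probably retry only on connection-level errors and leave authentication or validation failures to fail fast.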

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

- PR: The PR description is expressive enough and will help future contributors
- Code: Code is readable and follows existing patterns (e.g., error handling, logging)
- Refactor: Removed unused OCR-related code and cleaned up imports
- Upgrade: No impact on upgrade flows; new feature is additive
- Documentation: User-facing feature; consider updating knowledge base docs if merged

Release note

feat(knowledge): Add MinerU image preprocessing support

- Knowledge base now supports uploading and processing image files (JPG, PNG, etc.) via MinerU/Open MinerU providers  
- Images are automatically converted to PDF format before preprocessing  
- Enhances document ingestion capabilities for visual content

feat(knowledge): support MinerU image preprocessing pipeline

1a4d0eb
