feat(knowledge): support MinerU image preprocessing pipeline #11083
What this PR does
Before this PR:
The knowledge base only supported PDF document preprocessing via MinerU. Image files were not handled in the preprocessing pipeline, limiting the ability to ingest visual content directly.
After this PR:
Fixes # (if applicable, e.g., related to image ingestion feature requests)
Why we need it and why it was done in this way
The following tradeoffs were made:
Images are converted to PDF with sharp and pdf-lib instead of being uploaded directly, as MinerU's API expects PDF input. This adds a small preprocessing step but ensures compatibility without modifying the external service.
The following alternatives were considered:
Links to places where the discussion took place: Internal development discussions on knowledge base enhancements.
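The preprocessing decision described above (PDFs go to MinerU unchanged, images are converted to PDF first) can be sketched as a small routing step. This is only an illustration under stated assumptions: the names `choosePreprocessRoute`, `toPdfFileName`, and `IMAGE_EXTENSIONS` are hypothetical and do not come from this PR, and the list of accepted image formats is a guess. The conversion itself, which this PR performs with sharp and pdf-lib, is not shown.

```typescript
// Hypothetical sketch: decide how a file enters the MinerU preprocessing pipeline.
// None of these names are taken from the PR; they exist only for illustration.
type PreprocessRoute = 'direct-pdf' | 'convert-image-to-pdf' | 'unsupported'

// Assumed image formats. The PR description does not list the formats it accepts.
const IMAGE_EXTENSIONS: string[] = ['png', 'jpg', 'jpeg', 'webp']

// Returns the lower-cased extension without the dot, or '' when there is none.
function getExtension(fileName: string): string {
  const dot = fileName.lastIndexOf('.')
  // dot <= 0 covers both "no dot" and dotfiles such as ".gitignore".
  return dot > 0 ? fileName.slice(dot + 1).toLowerCase() : ''
}

// PDFs can be sent as-is; images need a PDF wrapper because MinerU expects PDF input.
function choosePreprocessRoute(fileName: string): PreprocessRoute {
  const ext = getExtension(fileName)
  if (ext === 'pdf') return 'direct-pdf'
  if (IMAGE_EXTENSIONS.indexOf(ext) !== -1) return 'convert-image-to-pdf'
  return 'unsupported'
}

// The converted file keeps its base name and gains a .pdf extension.
function toPdfFileName(fileName: string): string {
  const dot = fileName.lastIndexOf('.')
  const base = dot > 0 ? fileName.slice(0, dot) : fileName
  return base + '.pdf'
}
```

Keeping the routing separate from the conversion means the existing PDF path is not touched when an image arrives, which matches the "no breaking changes" claim in this description.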
Breaking changes
None. This PR adds new functionality without changing existing APIs or behaviors. Existing PDF processing remains unchanged.
Special notes for your reviewer
yarn typecheck and yarn test:main passing.
sharp for format handling and pdf-lib for PDF creation.
MAIN_VITE_MINERU_API_KEY for free tier usage.
Checklist
This checklist is not enforced, but it is a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note