Feat: Add image (JPG/PNG) OCR support to document parsing and extraction #1110

Description

@RohitR311

The problem

Maxun has a feature where users can upload a document and get back its content as text, HTML, or a list of links (a summary and AI-based structured extraction are coming soon). Right now this only works for PDFs. If you try to upload an image, like a scanned receipt as a JPG or PNG, it gets rejected outright because the system only lets PDFs through.

A lot of the documents people actually want text from are just photos or scans, not PDFs.

What needs to change

Add JPG and PNG as supported upload types and run OCR on them to pull out their text, the same way scanned PDFs are already handled. Maxun already has OCR built in for scanned PDFs (tesseract.js and a PaddleOCR-based tool), so this is mostly about reusing that pipeline for images directly.
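
As a rough sketch of the upload-side change, the check could identify the file by its leading signature bytes and route images straight to OCR. All names here (`UploadKind`, `detectUploadKind`, `routeUpload`, and the route labels) are hypothetical and not taken from the Maxun codebase; only the PDF, PNG, and JPEG signatures are standard.

```typescript
// Hypothetical names throughout; this is not Maxun's actual upload code.
type UploadKind = "pdf" | "png" | "jpeg" | "unsupported";
type Route = "pdf-pipeline" | "image-ocr" | "reject";

// Standard file signatures (magic bytes) for each format.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
const PNG_MAGIC = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_MAGIC = [0xff, 0xd8, 0xff];

// True when the buffer begins with the given signature.
function startsWith(bytes: Uint8Array, magic: number[]): boolean {
  if (bytes.length < magic.length) return false;
  return magic.every((b, i) => bytes[i] === b);
}

// Classify an upload by content rather than by file extension.
function detectUploadKind(bytes: Uint8Array): UploadKind {
  if (startsWith(bytes, PDF_MAGIC)) return "pdf";
  if (startsWith(bytes, PNG_MAGIC)) return "png";
  if (startsWith(bytes, JPEG_MAGIC)) return "jpeg";
  return "unsupported";
}

// PDFs keep their existing path; images skip page rendering and go to OCR.
function routeUpload(bytes: Uint8Array): Route {
  switch (detectUploadKind(bytes)) {
    case "pdf":
      return "pdf-pipeline";
    case "png":
    case "jpeg":
      return "image-ocr";
    default:
      return "reject";
  }
}
```

Checking the bytes instead of the extension means a renamed file still lands in the right branch, and anything that is not a PDF, PNG, or JPEG is rejected as before.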

What "done" looks like

A user uploads a .jpg or .png file instead of a PDF.

The system runs OCR on it and returns the extracted text in the same output formats already supported for PDFs (markdown/html/links, plus extraction).
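
To illustrate the second point, here is a minimal sketch of turning raw OCR text into the markdown/html/links outputs. It assumes the OCR step hands back a plain string; `DocumentOutputs`, `escapeHtml`, and `buildOutputs` are hypothetical names, and the real PDF pipeline may format its output differently.

```typescript
// Hypothetical output shape; the real pipeline's types may differ.
interface DocumentOutputs {
  markdown: string;
  html: string;
  links: string[];
}

// Escape characters that would otherwise be read as HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the three outputs from OCR text, treating blank lines as paragraph breaks.
function buildOutputs(ocrText: string): DocumentOutputs {
  const paragraphs = ocrText
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const markdown = paragraphs.join("\n\n");
  const html = paragraphs.map((p) => `<p>${escapeHtml(p)}</p>`).join("\n");

  // Collect http(s) URLs found in the text, dropping duplicates.
  const found: string[] = ocrText.match(/https?:\/\/[^\s<>"')]+/g) ?? [];
  const links = found.filter((url, i, all) => all.indexOf(url) === i);

  return { markdown, html, links };
}
```

OCR output on photos tends to be noisy, so URLs recovered this way may be broken or misread. Whether the links output should include them is a design choice for whoever picks this up.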
