High-accuracy PDF-to-Markdown OCR API using LLMs with vision capabilities. Features parallel processing, batching, and auto-retry logic for scalable extraction.
Updated Nov 29, 2025 - Python
Open-source document management platform leveraging AWS managed services. RESTful API for document storage, processing, full-text search, and metadata management. Multi-tenant serverless architecture with auto-scaling... deployed entirely in your AWS account.
A Python framework for multi-modal document understanding with Amazon Bedrock
Open-source toolkit to extract structured knowledge graphs from documents and tables — power analytics, digital twins, and AI-driven assistants.
This repository contains examples to help customers get started with Amazon Bedrock Data Automation. The samples focus mainly on document processing use cases.
NeuroDoc is a powerful AI-based offline document summarization tool that leverages OCR and NLP to intelligently analyze PDFs and generate structured summaries. Built using Flask, this tool is designed to run completely offline and supports both text-based and scanned/image-based documents.
This open-source project provides a serverless solution for automated identity document processing (IDP) using Amazon Bedrock's Claude-3 model. The solution creates an end-to-end pipeline that processes identity documents, particularly optimized for birth certificates, by automatically extracting relevant information.
Boilerplate CSS that can be used with any Grooper DataModel
AI-powered government ID classifier using OCR, BERT, ResNet, and LayoutLMv3 for Aadhaar, PAN, passport, and other scanned IDs.
An end-to-end serverless pipeline for Intelligent Document Processing (IDP) using Amazon Bedrock and Anthropic Claude 3 Sonnet. This project extracts structured data from scanned documents (e.g., PDFs, forms, invoices) using GenAI models, and stores results in a scalable cloud-native architecture.
Leveraging the Robocorp integration to analyse customer feedback
Project Repository for Problem 1(b) of Adobe Hackathon 2025
SealSure is an AI-powered tool for real-time document validation and forgery detection. Built with MERN, FastAPI, and OCR/NLP models, it helps extract, analyze, and verify data from scanned or image-based documents efficiently.
End-to-end Intelligent Document Processing (IDP) pipeline using Amazon Bedrock, OpenSearch, Lambda, LangGraph Agents, and Step Functions. Supports multimodal document analysis for PDFs, images, videos, and audio.
Cognivia AI is a powerful AI-powered PDF search and question-answering system built with LangChain, Pinecone Vector Store, OpenAI, and Supabase. Upload PDFs, ask questions, and get intelligent answers with persistent conversation memory.
Intelligent document processing (IDP) with AWS generative AI services to automate information extraction from documents of different types and formats, without the need for machine learning skills.
Amazon Bedrock Data Automation use cases
AI-powered Intelligent Document Processing (IDP) system with RAG, anomaly detection, and natural language insights. Local, zero-cost alternative to AWS Textract + Bedrock.