Topic: "intelligent-document-processing"
yigitkonur/llm-based-ocr
High-accuracy PDF-to-Markdown OCR API using LLMs with vision capabilities. Features parallel processing, batching, and auto-retry logic for scalable extraction.
Language: Python - Size: 73.2 KB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 875 - Forks: 62
formkiq/formkiq-core
Open-source document management platform leveraging AWS managed services. RESTful API for document storage, processing, full-text search, and metadata management. Multi-tenant serverless architecture with auto-scaling... deployed entirely in your AWS account.
Language: Java - Size: 25 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 148 - Forks: 25
awslabs/rhubarb
A Python framework for multi-modal document understanding with Amazon Bedrock.
Language: Python - Size: 32.4 MB - Last synced at: 15 days ago - Pushed at: 24 days ago - Stars: 98 - Forks: 14
aws-samples/sample-document-processing-with-amazon-bedrock-data-automation
This repository contains examples to help customers get started with Amazon Bedrock Data Automation. The samples focus mainly on document processing use cases.
Language: Jupyter Notebook - Size: 9.46 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 12 - Forks: 10
aws-samples/sample-aws-idp-pipeline
End-to-end Intelligent Document Processing (IDP) pipeline using Amazon Bedrock, OpenSearch, Lambda, LangGraph Agents, and Step Functions. Supports multimodal document analysis for PDFs, images, videos, and audio.
Language: Python - Size: 23.1 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 6 - Forks: 2
Addepto/graph_builder
Open-source toolkit to extract structured knowledge graphs from documents and tables — power analytics, digital twins, and AI-driven assistants.
Language: Python - Size: 163 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0
BABIN-JOE/NeuroDoc
NeuroDoc is a powerful AI-based offline document summarization tool that leverages OCR and NLP to intelligently analyze PDFs and generate structured summaries. Built using Flask, this tool is designed to run completely offline and supports both text-based and scanned/image-based documents.
Language: Python - Size: 13.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0
GrooperGuru/GrooperCSS
Boilerplate CSS that can be used with any Grooper DataModel
Language: CSS - Size: 86.9 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0
aws-samples/sample-serverless-bedrock-idp
This open-source project provides a serverless solution for automated identity document processing (IDP) using the Claude 3 model on Amazon Bedrock. It builds an end-to-end pipeline that processes identity documents and automatically extracts relevant information, and it is optimized for birth certificates.
Language: HCL - Size: 416 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0
paulsamuel-w-e/Multi-Modal-Government-ID-Classification
AI-powered government ID classifier using OCR, BERT, ResNet, and LayoutLMv3 for Aadhaar, PAN, passport, and other scanned IDs.
Language: Python - Size: 53.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 1
krishachikka/Intelligent_Document_Processing (fork of anjali76Codes/Intelligent_Document_Processing)
SealSure is an AI-powered tool for real-time document validation and forgery detection. Built with MERN, FastAPI, and OCR/NLP models, it helps extract, analyze, and verify data from scanned or image-based documents efficiently.
Language: JavaScript - Size: 25.2 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0
SimplePDF/pdf-ai-analyzer-with-robocorp
Leveraging the Robocorp integration to analyse customer feedback
Language: Python - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0
harshman7/insight-agent-idp
AI-powered Intelligent Document Processing (IDP) system with RAG, anomaly detection, and natural language insights. Local, zero-cost alternative to AWS Textract + Bedrock.
Language: Python - Size: 9.78 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
rahuldongre-us/idp-bedrock
An end-to-end serverless pipeline for Intelligent Document Processing (IDP) using Amazon Bedrock and Anthropic Claude 3 Sonnet. This project extracts structured data from scanned documents (e.g., PDFs, forms, invoices) using GenAI models, and stores results in a scalable cloud-native architecture.
Language: Python - Size: 916 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
HK-Transfield/terraform-aws-gen-ai-idp
Intelligent document processing (IDP) with AWS generative AI services to automate information extraction from documents of different types and formats, without the need for machine learning skills.
Language: HCL - Size: 20.9 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
digiparser/digiparser-website
DigiParser | Extract data from documents and emails
Language: TypeScript - Size: 88.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0