An open API service providing repository metadata for many open source software ecosystems.

Topic: "intelligent-document-processing"

yigitkonur/llm-based-ocr

High-accuracy PDF-to-Markdown OCR API using LLMs with vision capabilities. Features parallel processing, batching, and auto-retry logic for scalable extraction.

Language: Python - Size: 73.2 KB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 875 - Forks: 62

formkiq/formkiq-core

Open-source document management platform leveraging AWS managed services. RESTful API for document storage, processing, full-text search, and metadata management. Multi-tenant serverless architecture with auto-scaling... deployed entirely in your AWS account.

Language: Java - Size: 25 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 148 - Forks: 25

awslabs/rhubarb

A Python framework for multi-modal document understanding with Amazon Bedrock

Language: Python - Size: 32.4 MB - Last synced at: 15 days ago - Pushed at: 24 days ago - Stars: 98 - Forks: 14

aws-samples/sample-document-processing-with-amazon-bedrock-data-automation

This repository contains examples to help customers get started with Amazon Bedrock Data Automation. The samples focus mainly on document processing use cases.

Language: Jupyter Notebook - Size: 9.46 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 12 - Forks: 10

aws-samples/sample-aws-idp-pipeline

End-to-end Intelligent Document Processing (IDP) pipeline using Amazon Bedrock, OpenSearch, Lambda, LangGraph Agents, and Step Functions. Supports multimodal document analysis for PDFs, images, videos, and audio.

Language: Python - Size: 23.1 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 6 - Forks: 2

Addepto/graph_builder

Open-source toolkit to extract structured knowledge graphs from documents and tables, for powering analytics, digital twins, and AI-driven assistants.

Language: Python - Size: 163 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

BABIN-JOE/NeuroDoc

NeuroDoc is a powerful AI-based offline document summarization tool that leverages OCR and NLP to intelligently analyze PDFs and generate structured summaries. Built using Flask, this tool is designed to run completely offline and supports both text-based and scanned/image-based documents.

Language: Python - Size: 13.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

GrooperGuru/GrooperCSS

Boilerplate CSS that can be used with any Grooper DataModel

Language: CSS - Size: 86.9 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

aws-samples/sample-serverless-bedrock-idp

This open-source project provides a serverless solution for automated identity document processing (IDP) using a Claude 3 model on Amazon Bedrock. It builds an end-to-end pipeline that automatically extracts relevant information from identity documents and is optimized for birth certificates.

Language: HCL - Size: 416 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

paulsamuel-w-e/Multi-Modal-Government-ID-Classification

AI-powered government ID classifier using OCR, BERT, ResNet, and LayoutLMv3 for Aadhaar, PAN, passport, and other scanned IDs.

Language: Python - Size: 53.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 1

krishachikka/Intelligent_Document_Processing (fork of anjali76Codes/Intelligent_Document_Processing)

SealSure is an AI-powered tool for real-time document validation and forgery detection. Built with MERN, FastAPI, and OCR/NLP models, it helps extract, analyze, and verify data from scanned or image-based documents efficiently.

Language: JavaScript - Size: 25.2 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

SimplePDF/pdf-ai-analyzer-with-robocorp

Leveraging the Robocorp integration to analyse customer feedback

Language: Python - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

harshman7/insight-agent-idp

AI-powered Intelligent Document Processing (IDP) system with RAG, anomaly detection, and natural language insights. Local, zero-cost alternative to AWS Textract + Bedrock.

Language: Python - Size: 9.78 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

rahuldongre-us/idp-bedrock

An end-to-end serverless pipeline for Intelligent Document Processing (IDP) using Amazon Bedrock and Anthropic Claude 3 Sonnet. This project extracts structured data from scanned documents (e.g., PDFs, forms, invoices) using GenAI models, and stores results in a scalable cloud-native architecture.

Language: Python - Size: 916 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

HK-Transfield/terraform-aws-gen-ai-idp

Intelligent document processing (IDP) with AWS generative AI services to automate information extraction from documents of different types and formats, without the need for machine learning skills.

Language: HCL - Size: 20.9 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

digiparser/digiparser-website

DigiParser | Extract data from documents and emails

Language: TypeScript - Size: 88.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Related Topics
ocr (6), document-processing (6), ai (4), amazon-bedrock (3), python (3), fastapi (3), idp (3), generative-ai (3), bedrock (3), aws (3), streamlit (2), terraform (2), artificial-intelligence (2), intelligent-document-recognition (2), llm (2), bda (2), deep-learning (2), document-classification (2), nlp (2), automation (2), paddleocr (2), document-validation (1), windows-executable (1), flask-api (1), forgery-detection (1), invoice-processing (1), resnet (1), pytorch (1), multimodal (1), layoutlmv3 (1), id-recognition (1), computer-vision (1), bert (1), pdf-editor (1), pdf-document-processor (1), css (1), text-summarization (1), document-workflow (1), accounting-automation-tool (1), serverless-framework (1), python3 (1), prompt-engineering (1), macine-learning (1), cloud-architecture (1), claude-3 (1), aws-lambda (1), rag (1), postgresql (1), ollama (1), faiss (1), expense-tracking (1), document-analytics (1), anomaly-detection (1), ai-agent (1), machine-learning (1), sample-code (1), sam (1), lambda (1), aws-bedrock (1), amazon (1), sealsure (1), mern (1), tesseract (1), multi-modal (1), serverless (1), optical-character-recognition (1), headless (1), document-management-system (1), document-management (1), document-layer (1), document-database (1), document-apis (1), document-api (1), dms (1), cloud-storage (1), amazon-web-services (1), vision-ocr (1), text-digitization (1), table-extraction (1), rag-pipeline (1), pymupdf (1), gpt4-vision (1), document-extraction (1), complex-layout-analysis (1), batch-ocr (1), azure-openai (1), pdf-summarization (1), ocr-tool (1), huggingface (1), flask (1), easyocr (1), bart-model (1), semantic-search (1), rag-chatbot (1), pdf-table-extraction (1), knowledge-graphs (1), knowledge-graph-construction (1), intelligent-document-processor (1), graph-visualization (1), graph-extraction (1)