GitHub topics: document-processing
tommcd/doctk
A composable toolkit for structured document manipulation
Language: Python - Size: 221 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
Goblanch/Expediente-Index
Small desktop application (Tkinter + ttkbootstrap) that generates a document index from all the PDFs in a folder, exporting to Word (.docx) and/or PDF (.pdf).
Language: Python - Size: 43.9 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0
Poolchaos/artemis-insight
AI-powered document intelligence platform for automated PDF summarization with customizable templates
Language: Python - Size: 12.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
Delshi/image_walker
Smart file organization with metadata intelligence. Automatically sort and categorize your files using advanced filtering and plugin system.
Language: Python - Size: 5.29 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
arsalanafzal010/SmartRAG
📄 Enable smart conversations with documents, images, and audio files using this advanced Retrieval-Augmented Generation system.
Language: Python - Size: 1.39 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
zoepranataksm/Mind_Vault_AI
📚 Automate knowledge transfer with Mind Vault, an AI-driven system that converts unstructured data into searchable insights, enhancing onboarding and team efficiency.
Language: JavaScript - Size: 1.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
Valido-App/valido-app.github.io
Official website and download page for Valido - Professional PDF validation and data extraction tool for Windows
Language: HTML - Size: 367 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
MantisAI/sieves
Plug-and-play, zero-shot document processing pipelines.
Language: Python - Size: 2.86 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 113 - Forks: 8
ucbepic/docetl
A system for agentic LLM-powered data processing and ETL
Language: Python - Size: 61.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,056 - Forks: 320
SenseiOguz/Dual-AI-Chat
🤖 Leverage dual AI models to generate precise and thoughtful responses using a flexible backend, enhancing interaction quality and reliability.
Language: TypeScript - Size: 199 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0
NoliNobdon/TriStage-RAG
🎯 Optimize retrieval with TriStage-RAG, a 3-stage pipeline that enhances document discovery while overcoming the limits of single-vector embeddings.
Language: Python - Size: 1.45 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0
anisderoual/Document_Archiver_Korean-NLP_BERTClustering
📂 Extract, embed, cluster, and securely store Korean text from documents using BERT, enhancing research efficiency and organization.
Language: Python - Size: 11.7 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0
Ai4GenXers/pdf-sentinel
Event-driven PDF to Markdown conversion for LLM workflows - 60x faster, zero idle resources
Language: Python - Size: 34.2 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0
B-A-M-N/FlockParser
Distributed document RAG system with intelligent GPU/CPU orchestration. Auto-discovers heterogeneous nodes, routes workloads adaptively, and achieves 60x+ speedups through VRAM-aware load balancing. Privacy-first architecture with 4 interfaces (CLI, API, MCP, Web UI). Real distributed systems engineering, not just an API wrapper.
Language: Python - Size: 95.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4 - Forks: 2
eclaire-labs/eclaire
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-hosted, and extensible via APIs.
Language: TypeScript - Size: 2.89 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 501 - Forks: 51
katrina-09/pdf-scraper
PDF scraper to extract text
Language: Python - Size: 7.81 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0
SylphxAI/pdf-reader-mcp
📄 Production-ready MCP server for PDF processing - 5-10x faster with parallel processing and 94%+ test coverage
Language: TypeScript - Size: 1.22 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 309 - Forks: 40
belumume/claude-skills
Personal collection of Claude skills - growing as I discover patterns and solve real-world problems
Language: Python - Size: 83 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0
deadhand777/doc-redaction
Document Redaction Automation Service
Language: Python - Size: 2.08 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0
Rayyan9477/ocr-app
State-of-the-art Optical Character Recognition (OCR) with Vision Language Model (VLM) integration for enhanced accuracy and optimal document processing.
Language: TypeScript - Size: 23 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 3 - Forks: 0
jwill9999/Vector-DB-Service
A microservice that lets you upload documents from Google services and then embeds them into a vector database.
Language: TypeScript - Size: 205 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0
martin-papy/qdrant-loader
Enterprise-ready vector database toolkit for building searchable knowledge bases from multiple data sources. Supports multi-project management, automatic ingestion from Confluence/JIRA/Git, intelligent file conversion (PDF/Office/images), and semantic search. Includes MCP server for seamless AI assistant integration.
Language: Python - Size: 26.8 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 15 - Forks: 9
KikuAI-Lab/DocStripper
🧹 DocStripper is a lightweight CLI utility that automatically cleans text documents
Language: Python - Size: 1.27 MB - Last synced at: 5 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 1
asbah-ramzan/HackRx-6.0-Intelligent-Query-Retrieval
🧠 Elevate document intelligence with HackRx 6.0, a powerful RAG system for extracting insights from complex files like PDFs and DOCX.
Language: Python - Size: 1.3 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0
abhaydixit07/ayurguru-frontend
AyurGuru - Revolutionizing Wellness with Ayurveda and AI. AyurGuru is an AI-powered platform delivering Ayurvedic health solutions in real time. Users can consult a smart chatbot, upload medical reports for tailored insights, and explore comprehensive Ayurvedic blogs. Built with modern web technologies for a secure and seamless user experience.
Language: JavaScript - Size: 13.7 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 2
byerlikaya/SmartRAG
⚡ Production-ready .NET Standard 2.1 RAG library with 🤖 multi-AI provider support, 🏢 enterprise vector storage, 📄 intelligent document processing, and 🗄️ multi-database query coordination. 🌍 Cross-platform compatible.
Language: C# - Size: 11.7 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 6 - Forks: 2
watat83/document-chat-system
Open-source document chat platform with semantic search, RAG (Retrieval Augmented Generation), and multi-provider AI support (OpenRouter, OpenAI, ImageRouter).
Language: TypeScript - Size: 71.3 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 24 - Forks: 10
H0NEYP0T-466/Pen2PDF
⚡ Pen2PDF Suite – an all-in-one 🚀 productivity platform ✨ with 🤖 AI-powered text extraction (PDF/Images → Markdown 📝), 📅 smart timetable management (CSV/Excel import 📊), ✅ todo lists with subtasks 📈, 🧠 AI-generated notes library 📚 and 💬 Isabella AI assistant (OpenAI/Microsoft/llama/Mistral/LongCat/Gemini models 🔄) for context-aware help 🧩.
Language: JavaScript - Size: 1.71 MB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 6 - Forks: 1
formkiq/formkiq-core
Open-source document management platform leveraging AWS managed services. RESTful API for document storage, processing, full-text search, and metadata management. Multi-tenant serverless architecture with auto-scaling... deployed entirely in your AWS account.
Language: Java - Size: 24.4 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 144 - Forks: 24
Keremunce/nodejs-pdf-extractor
Node.js + Express app that extracts plain text from uploaded PDFs, with a browser UI for manual tests and pdf-parse driving the extraction pipeline.
Language: HTML - Size: 13.7 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0
jmanhype/DSPy-Multi-Document-Agents
An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.
Language: Python - Size: 143 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 49 - Forks: 5
nezchan0/SecureCompress
Privacy-first image compressor. Resize, convert & compress images offline with DPI control and cm↔px conversion. No uploads, no tracking. 🔒
Language: HTML - Size: 21.5 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0
aget-framework/template-document-processor-AGET
Production-ready template for creating document processing agents with LLM pipelines, security protocols, and multi-provider support
Language: Python - Size: 296 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0
awslabs/rhubarb
A Python framework for multi-modal document understanding with Amazon Bedrock
Language: Python - Size: 32 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 96 - Forks: 14
jvahedi/doc-sqz
Converts PDF and DOCX files to text for use in a pipeline.
Language: Python - Size: 52.7 KB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0
syw2014/langparse
LangParse is a universal document parsing and text chunking engine for LLM or Agent applications — Documents In, Knowledge Out.
Size: 13.7 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0
Cerno-AI/Cerno-Insight
High-performance RAG system for intelligent document Q&A with hybrid retrieval, GPU acceleration, and citation-backed answers. Upload docs, ask questions, get precise responses.
Language: Python - Size: 32.2 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 1
OlegCheban/WaterMarkIt
A lightweight, framework-agnostic Java library for adding watermarks to various file types, including PDFs and videos
Language: Java - Size: 3.22 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 21 - Forks: 21
B-A-M-N/FlockParser-legacy
Legacy version of FlockParser PDF processing system
Language: Python - Size: 3.02 MB - Last synced at: 6 days ago - Pushed at: 25 days ago - Stars: 1 - Forks: 0
Unsiloed-AI/Unsiloed-Parser
Language: Python - Size: 114 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 148 - Forks: 40
RafiBG/AIChatDiscordBotWeb
Local AI chatbot for Discord with a web interface for startup and configuration
Language: C# - Size: 1.07 MB - Last synced at: 26 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0
xsukax/xsukax-CamScanner-PDF-Watermark-Remover
A robust, privacy-focused command-line utility that intelligently removes CamScanner watermarks from PDF documents and exports clean results to multiple formats including PDF, PNG, and multi-page TIFF.
Language: Python - Size: 26.4 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0
RanitDERIA/tata-rfp
AI-powered RFP response platform that automates proposal generation using LlamaIndex and GPT-4. Extract questions from documents and generate contextual responses 80% faster with intelligent document indexing and team collaboration.
Language: TypeScript - Size: 2.92 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0
syncfusion/document-sdk-uwp-demos
Explore the Syncfusion Universal Windows Platform demos featuring our advanced PDF, Word, Excel, and PowerPoint document processing libraries.
Language: C# - Size: 36.3 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0
syncfusion/document-sdk-wpf-demos
This repository contains WPF demos for creating, reading, editing, and converting Excel, Word, PDF, and Presentation documents programmatically using Syncfusion .NET Document Processing libraries.
Language: C# - Size: 97.8 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 3
syncfusion/document-sdk-winforms-demos
This repository contains Windows Forms demos for creating, reading, editing, and converting Excel, Word, PDF, and Presentation documents programmatically using Syncfusion .NET Document Processing libraries.
Language: C# - Size: 55.8 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0
syncfusion/document-sdk-asp-net-mvc-demos
Explore the Syncfusion ASP.NET MVC demos featuring our advanced PDF, Word, Excel, and PowerPoint document processing libraries
Language: C# - Size: 96.6 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0
syncfusion/document-sdk-blazor-demos
Explore the Syncfusion Blazor demos featuring our advanced PDF, Word, Excel, and PowerPoint document processing libraries.
Language: CSS - Size: 46.3 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0
eklem/stopword-trainer
A module for creating stopword lists for any language, based on a set of documents.
Language: JavaScript - Size: 5.01 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 15 - Forks: 0
ResetNetwork/n8n-nodes
A collection of custom n8n nodes for enhanced document processing, text splitting, and embeddings generation
Language: TypeScript - Size: 766 KB - Last synced at: 23 days ago - Pushed at: 29 days ago - Stars: 8 - Forks: 3
awslabs/project-lakechain
⚡ Cloud-native, AI-powered, document processing pipelines on AWS.
Language: TypeScript - Size: 177 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 185 - Forks: 26
hammad-haque/felice-legal-ai
Felice Legal AI - AI-powered personal injury document processing platform (eve.legal replacement)
Size: 0 Bytes - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
ma3u/neo4j-agentframework
🚀 Hybrid RAG: Local Neo4j + BitNet.cpp RAG System and Azure SaaS deployment. Fast vector search, instant Docker deployment via GitHub Container Registry. Complete RAG pipeline with ultra-efficient LLMs for enterprise knowledge management.
Language: Python - Size: 35.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
Asamaurdhava/Claria
Claria instantly transforms any complex document—legal contracts, medical reports, technical specs—into crystal-clear language anyone can understand, powered by Chrome's revolutionary built-in AI that runs entirely on-device for complete privacy.
Language: JavaScript - Size: 266 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
Alijanloo/Pdf2Table
A Python library for extracting tables from PDF documents using computer vision and image processing techniques. It converts PDF pages to images, detects tables, recognizes their structure, and outputs clean data in JSON format.
Language: Python - Size: 2.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
mehmetaltinbas/ExtralyzUI
An AI-powered study platform where users upload documents (PDF, DOCX, TXT, ...) to get more understandable abstractive summaries of chosen length and auto-generated practice exercises (open-ended, multiple-choice, true-false, ...).
Language: TypeScript - Size: 637 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
manasbhansali27/chat-with-files
A lightweight local AI assistant that lets you chat with your files — PDFs, documents, images, videos, and code — using semantic search, embeddings, OCR, and multimodal LLMs. Optimized to run on modest GPUs (e.g., RTX 3050 4GB) without requiring heavy VRAM like ChatRTX.
Language: Python - Size: 31.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
parsee-ai/parsee-core
Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.
Language: Python - Size: 1.23 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 75 - Forks: 1
indu-explores-data/Automated-Resume-Data-Extraction
Automated resume information extraction using NLP. The project extracts Name, Email, and Phone from TXT, DOCX, and PDF files using spaCy and regex. It converts unstructured data into structured formats, improving recruitment efficiency and enabling scalable candidate profiling.
Language: Jupyter Notebook - Size: 71.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
MyGovHub-Goodbye-World/backend-agent-mcp
AI-Powered Government Services Assistant - Serverless AWS Lambda function built for MyGovHub that intelligently handles Malaysian driving license renewals and TNB electricity bill payments through document OCR, AI chat responses, and secure payment processing.
Language: Python - Size: 484 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
xsukax/xsukax-Word-Document-Comparison-Tool
A powerful, privacy-focused web application for side-by-side comparison of Word documents with intelligent diff highlighting, comprehensive analytics, and multilingual support including Arabic and RTL languages.
Language: Python - Size: 27.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
KaramelBytes/docloom-cli
AI‑augmented document analysis and lightweight retrieval (Go) with OpenRouter and Ollama. Cross‑platform binaries, cost guardrails, and streaming.
Language: Go - Size: 128 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
akoutsop1909/pdf-to-txt-converter
A simple Java CLI tool for batch-converting PDF files to TXT format. Supports file filtering by filename wildcards and last modified date.
Language: Java - Size: 12.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
airiseworks/doc2md-api
📄 Convert DOCX, PDF, PPTX, and images to Markdown effortlessly with this secure API built in Python, featuring API key protection and Docker support.
Language: Python - Size: 1.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
nvisycom/run
Multimodal extraction runtime for the platform. Processes images, PDFs, and scanned documents to enable automated detection and removal of sensitive information.
Size: 35.2 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
ucbepic/TWIX
TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents
Language: Python - Size: 177 MB - Last synced at: 13 days ago - Pushed at: 6 months ago - Stars: 207 - Forks: 16
PSPDFKit/nutrient-document-engine-mcp-server
A Model Context Protocol (MCP) server implementation that exposes document processing capabilities through natural language, supporting both direct human interaction and AI agent tool calling.
Language: TypeScript - Size: 25 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 56 - Forks: 1
jayll1303/table2html
A Python package that converts table images into HTML format using an object detection model and OCR.
Language: Python - Size: 381 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0
felixdittrich92/docling-OCR-OnnxTR
OnnxTR OCR plugin for Docling
Language: Python - Size: 1.5 MB - Last synced at: 25 days ago - Pushed at: 2 months ago - Stars: 12 - Forks: 0
MyGovHub-Goodbye-World/document-ingestion-and-text-extraction
AI-powered document analysis service combining AWS Textract, Bedrock, and intelligent blur detection. Supports CLI and serverless Lambda API for Malaysian documents (licenses, receipts, ID cards, utility bills).
Language: Python - Size: 5.35 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
MrSpecks/365-QnA-Chatbot
General question-and-answer chatbot using LangChain
Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
danielsobrado/OxideRAG
A Rust-first RAG toolkit that blends page indexing, mindmaps, and knowledge graphs to retrieve and reason over structured data, chats, emails, and PDFs.
Language: Rust - Size: 20.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
pandaxbacon/AutoChunker
🪓 Lumberjack - AI-powered document parser with interactive tree editor. Transform PDFs, DOCX, PPTX into perfectly structured chunks for vector databases. 5 parsers, Firebase integration, live demo available.
Language: TypeScript - Size: 8.71 MB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0
easytocloud/Mac-letterhead
A macOS utility for merging letterhead templates with PDF and Markdown documents using a drag-and-drop interface
Language: Python - Size: 4.83 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
Qleric-labs/Contract-extraction-assistant
Turn contract PDFs into structured data in seconds. Local-first extraction
Language: TypeScript - Size: 2.41 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
Magnet-AI/Quanta
Advanced PDF layout analysis engine for extracting figures, tables, and structured content from complex engineering documents using computer vision and machine learning.
Language: Python - Size: 85.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
iamarunbrahma/pdf-to-markdown
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
Language: Python - Size: 69.3 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 95 - Forks: 8
kili-technology/awesome-datasets
A comprehensive list of annotated training datasets classified by use case.
Size: 24.9 MB - Last synced at: about 10 hours ago - Pushed at: over 3 years ago - Stars: 35 - Forks: 6
MBAigner/PDFSegmenter
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.
Language: Python - Size: 399 KB - Last synced at: 23 days ago - Pushed at: about 5 years ago - Stars: 22 - Forks: 3
jmragsdale/azure-blob-ai-doc-summarizer
Serverless AI document summarization using Azure Functions, Blob Storage, and Azure OpenAI. Automatically extract and summarize PDFs, DOCX, TXT, and Markdown files.
Language: Python - Size: 23.4 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
abdullahshafiq-20/ResumeTex
ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaTeX syntax.
Language: JavaScript - Size: 163 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 37 - Forks: 5
aws-solutions/enhanced-document-understanding-on-aws
Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.
Language: JavaScript - Size: 62.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 40 - Forks: 19
smart-models/Sentences-Chunker
Cutting-edge tool designed to intelligently segment text documents into optimally-sized chunks
Language: Python - Size: 1.98 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0
saksham-1304/AskMyPDF
🤖 AI-Powered PDF Chat App | Dual AI Engine (Alchemyst + Gemini) | RAG Pipeline | Vector Search | MERN + TypeScript
Language: TypeScript - Size: 615 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 0
jasoncobra3/Floorplan-Dimractor
A sophisticated Python pipeline for automatically extracting dimensions and cabinet codes from architectural floorplan PDFs. This tool converts various dimension formats into standardized measurements and provides structured output with visualization capabilities.
Language: Python - Size: 2.16 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
delontejones-cpu/InTakeOff
HIPAA-compliant ABA therapy intake management platform streamlining patient onboarding, insurance verification, and document processing for autism therapy providers.
Language: TypeScript - Size: 387 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
theogyeezy/rag-multi-agent-template
RAG-enabled multi-agent template using CrewAI and WatsonxAI. Supports ChromaDB, FAISS, Pinecone with document processing for PDF/DOCX/TXT. Includes legal, technical, and customer support examples.
Language: Python - Size: 67.4 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
quarkiverse/quarkus-docling
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem
Language: Java - Size: 148 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 4
autollama/autollama
Anthropic's Contextual Retrieval implementation with visual chunk comparison. Preview context enrichment before/after embedding.
Language: HTML - Size: 21 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 24 - Forks: 0
mehmetaltinbas/ExtralyzAPI
An AI-powered study platform where users upload documents (PDF, DOCX, TXT, ...) to get more understandable abstractive summaries of chosen length and auto-generated practice exercises (open-ended, multiple-choice, true-false, ...).
Language: TypeScript - Size: 1.29 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
trsdn/MistralDocAI-mcp
MCP (Model Context Protocol) server for document-to-Markdown conversion using Mistral AI OCR. Compatible with Claude Desktop and other MCP clients.
Language: Python - Size: 78.1 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0
ucbepic/BARGAIN
Low-Cost LLM-Powered Data Processing with Theoretical Guarantees
Language: Python - Size: 18.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 28 - Forks: 3
mattweg/pdf-form-filler
Automated PDF form filler for insurance claims and other documents
Language: Python - Size: 62.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
Tele-AI/doc-ops-mcp
MCP server for seamless document format conversion and processing
Language: TypeScript - Size: 631 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 129 - Forks: 2
SuperDappAI/ai-agent-backend
AI agent backend with long-term memory, document processing, semantic search, and dynamic function orchestration.
Language: Python - Size: 662 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
gabriel-alves051294/python-document-merger
Python tool for merging and converting Word files (.doc, .docx). Ideal for automation, data cleaning, and preparing data for AI.
Language: Python - Size: 58.6 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0
smart-models/Normalized-Semantic-Chunker
Cutting-edge tool that unlocks the full potential of semantic chunking
Language: Python - Size: 3.82 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 17 - Forks: 4
dvy9/tldrify
AI-powered summarization tool that converts text, URLs, and documents into concise recaps with customizable tone and length.
Language: TypeScript - Size: 163 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0
NamanSingh24/NeuCom
🧠 AI-powered Standard Operating Procedure (SOP) interpreter with voice-first interaction, Knowledge Graph intelligence, and RAG technology. Transform complex procedures into conversational AI assistance using React, FastAPI, Neo4j, and Groq LLM.
Language: JavaScript - Size: 408 KB - Last synced at: 23 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0
THJLI/doc2md-api
API built with Python (FastAPI + Microsoft MarkItDown) to convert common document formats (DOCX, PDF, PPTX, images, etc.) into Markdown. Secured with API Key (header X-API-Key) and packaged for Docker/Coolify deployments, including a healthcheck endpoint.
Language: Python - Size: 9.77 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0