An open API service providing repository metadata for many open source software ecosystems.

Topic: "document-processing"

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

Language: Python - Size: 54.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2,366 - Forks: 237

enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

Language: Python - Size: 20.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1,274 - Forks: 129

dhlab-epfl/dhSegment

Generic framework for historical document processing

Language: Python - Size: 5.89 MB - Last synced at: 24 days ago - Pushed at: about 4 years ago - Stars: 378 - Forks: 115

ucbepic/TWIX

TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents

Language: Python - Size: 177 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 185 - Forks: 8

awslabs/project-lakechain

⚡ Cloud-native, AI-powered, document processing pipelines on AWS.

Language: TypeScript - Size: 177 MB - Last synced at: 24 days ago - Pushed at: 4 months ago - Stars: 180 - Forks: 26

formkiq/formkiq-core

A full-featured Document Management Platform / Document Layer for your application, providing storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. Please 🌟 star to support our work!

Language: Java - Size: 20.7 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 131 - Forks: 20

awslabs/rhubarb

A Python framework for multi-modal document understanding with Amazon Bedrock

Language: Python - Size: 32 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 93 - Forks: 11

iamarunbrahma/pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

Language: Python - Size: 69.3 KB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 86 - Forks: 8

parsee-ai/parsee-core

Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.

Language: Python - Size: 1.24 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 71 - Forks: 1

steindani/pandoc-include

An include filter for Pandoc

Language: Haskell - Size: 9.77 KB - Last synced at: 19 days ago - Pushed at: over 4 years ago - Stars: 62 - Forks: 20

PSPDFKit/nutrient-document-engine-mcp-server

A Model Context Protocol (MCP) server implementation that exposes document processing capabilities through natural language, supporting both direct human interaction and AI agent tool calling.

Language: TypeScript - Size: 25 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 54 - Forks: 0

aws-solutions/enhanced-document-understanding-on-aws

Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.

Language: JavaScript - Size: 61.3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 38 - Forks: 16

abdullahshafiq-20/ResumeTex

ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaTeX syntax.

Language: JavaScript - Size: 689 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 37 - Forks: 4

cburschka/lyx

Unofficial mirror of git://git.lyx.org/lyx.git (updates daily; not affiliated with lyx.org).

Language: C++ - Size: 616 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 7

kili-technology/awesome-datasets

A comprehensive list of annotated training datasets classified by use case.

Size: 24.9 MB - Last synced at: about 7 hours ago - Pushed at: about 3 years ago - Stars: 35 - Forks: 6

jmanhype/DSPy-Multi-Document-Agents

An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.

Language: Python - Size: 135 KB - Last synced at: about 8 hours ago - Pushed at: 11 months ago - Stars: 33 - Forks: 3

afrozas/proceedings

Semantic extraction from conference proceedings.

Language: Python - Size: 1.06 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 31 - Forks: 1

MBAigner/PDFSegmenter

This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified and returned. Tables are retrieved formatted as a CSV.

Language: Python - Size: 399 KB - Last synced at: 29 days ago - Pushed at: almost 5 years ago - Stars: 23 - Forks: 3

greed2411/tokyo

tokyo, a REST API, when given any type of document 📄, Identifies mime-type 🧐. Suggests extension 🦔. Alas Extracts text 💪.

Language: Clojure - Size: 19.5 KB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 18 - Forks: 0

eklem/stopword-trainer

A module for creating stopword lists for any language, based on a set of documents.

Language: JavaScript - Size: 6.16 MB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 15 - Forks: 0

smart-models/Normalized-Semantic-Chunker

Cutting-edge tool that unlocks the full potential of semantic chunking

Language: Python - Size: 3.76 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 4

aws-samples/sample-document-processing-with-amazon-bedrock-data-automation

This repository contains examples for customers to get started using Amazon Bedrock Data Automation. The samples focus mainly on document processing use cases

Language: Jupyter Notebook - Size: 9.46 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 12 - Forks: 10

vakharwalad23/mark-minion

The Ultimate Web Content Extraction & Conversion Tool for AI/LLM Applications. Convert almost any web content into clean Markdown with intelligent AI processing.

Language: TypeScript - Size: 868 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 9 - Forks: 1

aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai

This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.

Language: Jupyter Notebook - Size: 3.22 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 7 - Forks: 1

felixdittrich92/docling-OCR-OnnxTR

OnnxTR OCR plugin for Docling

Language: Python - Size: 1.49 MB - Last synced at: 8 days ago - Pushed at: 27 days ago - Stars: 7 - Forks: 0

aws-samples/idp-invoice-automation-using-bedrock-data-automation-cdk

Serverless Intelligent Document Processing (IDP) solution for invoice automation using Amazon Bedrock Data Automation. Features automated data extraction, annotation, and processing pipeline built with AWS services and CDK.

Language: Python - Size: 204 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 1

drgsn/filefusion

FileFusion is a powerful file concatenation tool designed specifically for Large Language Model (LLM)

Language: Go - Size: 173 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 6 - Forks: 0

abdur75648/urdu-text-detection

Text line detection for Urdu OCR (UTRNet)

Language: Python - Size: 48.5 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 6 - Forks: 1

jeanbaptisteb/doccleaner

A Python command-line utility intended for automating some copyediting tasks in documents. It allows editing zipped, XML-based files (e.g. docx, odt, or epub), through XSLT stylesheets. Can be rather easily extended with your own custom xsl stylesheets.

Language: XSLT - Size: 81.1 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 6 - Forks: 2

martin-papy/qdrant-loader

Enterprise-ready vector database toolkit for building searchable knowledge bases from multiple data sources. Supports multi-project management, automatic ingestion from Confluence/JIRA/Git, intelligent file conversion (PDF/Office/images), and semantic search. Includes MCP server for seamless AI assistant integration in development environments.

Language: Python - Size: 4.94 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 5 - Forks: 1

quarkiverse/quarkus-docling

Docling simplifies document processing, parsing diverse formats, including advanced PDF understanding, and providing seamless integrations with the gen AI ecosystem

Language: Java - Size: 123 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5 - Forks: 3

m4nd0mb3/document-templater

Document Templater is a powerful tool for automated document generation. Streamline the process of creating standard documents, such as contracts, reports, and forms, using predefined templates. This repository contains the source code for Document Templater, allowing you to easily integrate this functionality into your projects and automate docs.

Language: JavaScript - Size: 579 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

CentralFloridaAttorney/zmongo_retriever

Use data from MongoDB in LangChain, Llama and OpenAI

Language: Python - Size: 27.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 1

Swiftgum/swiftgum

The user data connection layer for AI applications. Transform any source into LLM-ready markdown. Focus on your AI, not integrations.

Language: TypeScript - Size: 3.05 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

baughmann/tikara

The metadata and text content extractor for almost every file type.

Language: Python - Size: 161 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

AmadeusITGroup/docs2vecs

CLI that helps with docs splitting, embedding and exposing them in a seamless manner

Language: Python - Size: 1.51 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 5

johnsirmon/clearcouncil

ClearCouncil: Automated tools for collecting, organizing, and embedding publicly available local state county council documents (minutes, agendas) into LLMs. Python, JS, and wget scripts included for easy data retrieval and integration.

Language: Python - Size: 134 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 2

Ftjjgfgh/scientific-pdf-translator

This project offers a high-quality translation system for scanned scientific PDFs, converting documents from English to French while preserving formatting and mathematical expressions. With features like OCR integration and LaTeX output, it ensures accurate and professional results for academic use. 🛠️📄

Language: Shell - Size: 30.3 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 2 - Forks: 0

ResetNetwork/n8n-nodes

A collection of custom n8n nodes for enhanced document processing, text splitting, and embeddings generation

Language: TypeScript - Size: 1.23 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 2 - Forks: 1

Jayanth-MKV/advanced-rag-cookbooks

Advanced RAG Techniques and Projects

Language: Jupyter Notebook - Size: 4.25 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

mancrurod/Resume-Optimization

Personal project that automates resume adaptation using LLMs. Converts .docx resumes to Markdown, tailors them to job descriptions with GPT-4o-mini or Gemini, and exports clean HTML and PDF resumes, with built-in editing and logging features.

Language: Python - Size: 71.3 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

Danitilahun/Document-processing-Pdf-Structured-Data-Extractor

This project demonstrates how to extract structured information from PDF documents using a combination of Langchain, OpenAI models, and the DocLing library. It provides a framework for parsing PDFs and leveraging LLMs to identify and format key data points.

Language: Jupyter Notebook - Size: 64.5 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 1

jayllfpt/table2html

A Python package that converts table images into HTML format using Object Detection model and OCR.

Language: Python - Size: 365 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

caltechlibrary/popstar

Phone-Oriented Processing SofTware for ARchives

Language: Makefile - Size: 49.2 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

Oneirocom/generative-intent-detection

Generative intent detection with Magick

Language: TypeScript - Size: 42 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

RPetitpierre/Generic_Semantic_Segmentation_of_Historical_Maps

Language: Jupyter Notebook - Size: 94.4 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

AhmedZeyadTareq/Smart-markdown-Extractor

AI-powered document processing tool with smart extraction, OCR, and intelligent content analysis

Language: Python - Size: 152 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

Rayyan9477/ocr-app

State-of-the-art Optical Character Recognition (OCR) with Vision Language Model (VLM) integration for enhanced accuracy and optimal document processing.

Language: TypeScript - Size: 22.9 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

samay-jain/Voice_Assistant_RAG_System_using_LangChain_and_Streamlit

Voice Assistant RAG System using LangChain, Whisper, and Streamlit - A voice-enabled assistant that lets you ask questions by speaking, processes your custom documents, and responds with natural speech. Built with LangChain, Ollama, Whisper, ElevenLabs, and Streamlit.

Language: Python - Size: 367 KB - Last synced at: 4 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

JDM-Github/debahra-efficio

DEHBARA (Efficio) is a React and Express-based web application designed to streamline service requests for DTI, SSS, and other document processing needs. It simplifies the process of requesting official papers and services, integrating cloud storage for efficient data management.

Language: TypeScript - Size: 14.7 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

acsenrafilho/cucaracha

A bureaucratic cockroach (cucaracha) assistant to help with document processing and analysis

Language: Python - Size: 5.93 MB - Last synced at: 8 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 1

adamrangwala/DirCity_Directory_Crop-out-with-Key-Lines

Turn Old City Directory scans into searchable data. Automated pipeline handles column detection, OCR processing, and accuracy evaluation for historical document digitization.

Language: Python - Size: 40.4 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

Daniel-codi/Concept_Curve_Embeddings_Indexation

Code that gives any AI persistent memory with unlimited context. The example lets any AI read the Uniform Commercial Code of Michigan, a document of 220,000 tokens.

Language: JavaScript - Size: 22.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

natgluons/AI-docs-analyzer-API

Automate invoice analysis and identity verification, built with an open-source multimodal LLM and OCR (DocTR/TrOCR), using FastAPI, Supabase, PgVector, and Neo4j.

Language: Python - Size: 8.79 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

reinelt88/rag-chatbot-documents

This project implements a RAG (Retrieval-Augmented Generation) based chatbot that allows you to upload PDF documents, index them with embeddings, and ask questions about their content. It supports both OpenAI and Hugging Face models via the Inference API.

Language: Python - Size: 215 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

gs-ai/PDFProfessor

PDF Professor 2.0 extracts and processes PDF text, analyzed by Ollama for summarization, data extraction, and insights. More coming soon!

Language: Python - Size: 1.95 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

jromero132/pdf-splitter

PDF Splitter is a Python tool that takes a multi-page PDF file and splits it into individual PDF files, one for each page of the original document.

Language: Python - Size: 2.93 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

jromero132/pdf-merger

A Python utility for merging multiple PDFs and images into a single PDF file. This tool maintains aspect ratios, centers content on custom-sized pages (default A4), and supports recursive directory processing. Perfect for organizing documents and creating cohesive PDF compilations.

Language: Python - Size: 2.93 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

FayazK/Document-Metadata-Extractor

A Python tool that uses Google's Gemini AI to automatically extract structured metadata from PDF and DOCX documents, saving results to Excel for easy analysis and organizing raw responses as JSON files.

Language: Python - Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

souvik03-136/TenderBot

Task

Language: Python - Size: 127 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

jcaperella29/Document_cleaning_CLI

A deep learning-based pipeline for cleaning scanned document images. Automatically removes noise, enhances text clarity, and optimizes images for OCR. 🚀

Language: MATLAB - Size: 94.5 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Md-Emon-Hasan/LangChain

Powerful framework for building applications with Large Language Models (LLMs), enabling seamless integration with memory, agents, and external data sources.

Language: Jupyter Notebook - Size: 737 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

kallebysantos/ocrlot

A distributed OCR engine 🏆

Language: Elixir - Size: 291 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

Huang-lab/figure-extractor

Flask-based service using PDFFigures 2.0 to extract figures and tables from scholarly PDFs. Features REST API, CLI, Docker support, and JSON metadata output (~1.5s/page processing). Designed for document processing and RAG pipelines.

Language: Python - Size: 16.8 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

BjornMelin/pdfusion

A lightweight Python utility for effortlessly merging multiple PDF files into a single document.

Language: Python - Size: 40 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Shahrom-S/BarsAI

AI assistant

Language: Python - Size: 11.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

dayang4321/MSc-Team-Project-CMPU9010-2023-24-Group-3

TU Dublin Computer Science MSc. Final Project Group 3 - Accessibilator

Language: Jupyter Notebook - Size: 100 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

x1ao4/doc-merger

Merge two relatively incomplete documents into one complete document via a Python script

Language: Python - Size: 22.5 KB - Last synced at: 29 days ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

joseferrerh/invoices-leanautomation

This set of robots provides support for automatically obtaining information from invoices using the docDigitizer API and keeping track of the processed invoices in an Airtable repository

Language: RobotFramework - Size: 403 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

thoth2357/Watermark-removal

A program that helps remove watermarks from a PDF document

Language: Python - Size: 3.91 KB - Last synced at: 26 days ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

anne27/Information-Retrieval

An implementation of basic IR techniques from scratch.

Language: Python - Size: 27.8 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

Jackojc/old-wotpp 📦

A document preprocessor that works in conjunction with tools like groff/troff & refer.

Language: C++ - Size: 60.5 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

GuglielmoCerri/test-assets

A version-controlled collection of stable assets (documents, images, etc.) for integration testing

Size: 139 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

Horneychan/rag-contract-analyzer

Analyze rental and purchase contracts with RAG technology. Identify risks, ensure compliance, and extract key insights effortlessly. 🛠️📄

Language: Python - Size: 47.9 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

easytocloud/Mac-letterhead

A macOS utility for merging letterhead templates with PDF and Markdown documents using a drag-and-drop interface

Language: Python - Size: 3.36 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

deeksha006/PII-DETECTION

A Streamlit web application for detecting and redacting Personally Identifiable Information (PII) in documents including PDFs, images, and text files. Supports Aadhaar, PAN, Driving License, and Voter ID detection with automated redaction capabilities.

Language: Python - Size: 5.86 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

DaandinhoPy94/rag-contract-analyzer

🤖 RAG-powered contract analyzer using Gemini API, LangChain & ChromaDB for intelligent legal document analysis

Language: Python - Size: 31.3 KB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

divyasree-dolly/Smart-Resume-Cover-Letter-Generator

🚀 AI-powered web app that generates tailored cover letters and enhances resume bullet points using OpenAI GPT. Built with Streamlit for easy job application optimization.

Language: Python - Size: 22.5 KB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

klapom/generic-kg-pipeline

A flexible, plugin-based pipeline system for extracting knowledge graphs from documents

Language: Python - Size: 6.21 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

hasnaintypes/lawbotics-v2

LawBotics v2 is an AI-powered legal contract analysis platform that combines machine learning with modern web technologies to automate legal document review and clause extraction.

Language: TypeScript - Size: 83.6 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

kevv1m/tikara

The metadata and text content extractor for almost every file type.

Size: 1000 Bytes - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

lifenture/flash-mail-merge

Streamline Document Automation with Serverless Mail Merge and DOCX Processing.

Language: Go - Size: 15.2 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

akshitharsola/overleaf-automation

🚀 Intelligent document analysis and LaTeX conversion automation tool. Converts Word documents (.docx) to LaTeX with automatic table detection, equation recognition, and multi-format support (ACM, IEEE, Springer). Built with React & TypeScript.

Language: TypeScript - Size: 692 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

ibarani/boxsavant

AI-powered document organization system for Box.com with full account reorganization capabilities

Language: Python - Size: 6.58 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

evaibhav/Ai-File-Analyzer

🔍 AI-powered document analysis webapp - Upload files (PDF, DOCX, TXT, CSV, XLSX) and get intelligent analysis using local Ollama AI. Built with Flask and Python. Privacy-first with local processing.

Language: Python - Size: 219 KB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

TimInTech/pdf-text-duplicate-checker

PDF Duplicate Detector & Mover (Text + Image Hashing)

Language: Python - Size: 98.4 MB - Last synced at: 6 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

Hassan-Memon/ai-book-translator

An AI-powered Urdu to Arabic book translator that intelligently processes documents (PDF, Word, Excel, or images), chunks content based on structure, and uses multi-stage LLM agents to ensure accurate, context-aware, and faithful translation without omissions or additions.

Language: Python - Size: 5.86 KB - Last synced at: 11 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

jonathanfavorite/RAGamuffin

A lightweight, cross-platform .NET library for building RAG (Retrieval-Augmented Generation) pipelines with local embedding models and SQLite vector storage. Perfect for developers who need privacy-focused, offline-capable document search and AI-powered question answering without external API dependencies.

Language: C# - Size: 6.75 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

bneweling/neuronode

🧠 Neuronode - Enterprise-grade Knowledge Management System with LiteLLM, Neo4j, and Vector Search. AI-powered document processing, intelligent relationship discovery, and advanced query orchestration.

Language: Python - Size: 4.27 MB - Last synced at: 13 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

mhashas/financeQA

financeQA is a modular Retrieval-Augmented Generation (RAG) system for finance question answering. It features document preprocessing, image and table extraction, vector database indexing, and OpenAI-powered chat interfaces, designed for robust financial data analysis and evaluation.

Language: Python - Size: 150 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

GovindKurapati/dev_docs_chat

RAG-powered document/URL Q&A system with ChromaDB + Groq LLM. Upload docs, ingest URLs, get AI answers.

Language: Python - Size: 801 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

rijwan10/rf001-gas-report-formatter-pro

This repository contains a report generation system that leverages the Claude API for natural language processing, allowing users to create professional reports efficiently. Explore the project to see how it streamlines report creation while maintaining high-quality standards. 🛠️📊

Language: JavaScript - Size: 33.2 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

syedaliwaqar12/Resume-Parser

🚀 A beautiful, production-ready web app that extracts structured data from PDF resumes using AI and NLP. Built with React + TypeScript + FastAPI.

Language: Python - Size: 53.7 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

chatfin-tara/Chatfin

AI-powered finance automation platform for reconciliation, compliance, and intelligent data operations, integrated with Oracle NetSuite.

Size: 0 Bytes - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

bylickilabs/pdfAnalyzer

PDF Analyzer is an efficient Python tool for automatically analyzing PDF documents.

Language: Python - Size: 0 Bytes - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

Marbrgr/DocProc

AI-Powered processing platform with RAG-based Q&A capabilities

Size: 417 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

rgianordoli/rgianordoli

Architecture overview and documentation of a modular system for automated document processing.

Size: 167 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

kaptinka/-GiGTakaful-AI-Insurance

Advanced AI fraud detection for Takaful motor insurance claims. Automate analysis of police reports and estimates with OCR and real-time analytics. 🚀💻

Size: 4.88 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

nsourlos/OCR_and_RAG

Tests of OCR and RAG with LLMs

Language: Jupyter Notebook - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

e-candeloro/Credem_Hack_2025

AI-powered document processing pipeline for Credem Hackathon 2025. Leverages Google Cloud AI services to intelligently extract, classify, and process HR documents through a robust ETL pipeline.

Language: Jupyter Notebook - Size: 13.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0