An open API service providing repository metadata for many open source software ecosystems.

Topic: "document-extraction"

DocumindHQ/documind

Open-source platform for extracting structured data from documents using AI.

Language: JavaScript - Size: 960 KB - Last synced at: 1 day ago - Pushed at: about 2 months ago - Stars: 1,295 - Forks: 45

harishdeivanayagam/rowfill

Open-source unstructured data (PDFs, images, audio files) processing platform built for knowledge workers

Language: TypeScript - Size: 1.2 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 275 - Forks: 14

Xyntopia/pydoxtools

Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.

Language: Python - Size: 13.6 MB - Last synced at: about 9 hours ago - Pushed at: 8 months ago - Stars: 81 - Forks: 12

konfuzio-ai/konfuzio-sdk

Run OCR, extract information from documents, and classify them. In addition, annotate documents and build custom NLP and computer vision models tailored to your specific use cases. Find examples with code in the Tutorials section of dev.konfuzio.com and get inspiration from the Use Cases section of our blog: https://konfuzio.com/en/category/marketplace

Language: Jupyter Notebook - Size: 81.5 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 61 - Forks: 9

FantDing/Image-document-extract-and-correction

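Final project for a digital image processing course: extracts and rectifies a document within an image. The overall approach is to detect straight lines with the Hough transform, derive the corner points from them, and then rectify the document with a projective transform. The project uses OpenCV only for I/O operations (convolution, the Hough transform, the projective transform, and so on are all implemented by hand).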

Language: Python - Size: 789 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 55 - Forks: 17

alephdata/ingest-file

Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.

Language: Python - Size: 67 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 52 - Forks: 26

Tammilore/ai-contract-analyzer

AI-powered contract analysis tool

Language: TypeScript - Size: 133 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

ryanmcdonough/lexplore

Tool to allow extraction of data from legal documents

Language: Python - Size: 169 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 8 - Forks: 0

jamesmcroft/ai-document-data-extraction-evaluation

This project demonstrates how to evaluate the use of LLMs and SLMs for extracting structured data from documents using .NET

Language: C# - Size: 1.92 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 6 - Forks: 2

jamesmcroft/azure-ai-document-pipeline-python-sample

Python-based Durable Functions accelerator for building intelligent document processing pipelines with Azure AI Services on Azure Container Apps

Language: Python - Size: 701 KB - Last synced at: 7 days ago - Pushed at: 21 days ago - Stars: 3 - Forks: 4

jamesmcroft/document-data-extraction-prompt-flow-evaluation

This sample demonstrates how to use GPT-4o with Vision to extract structured JSON data from PDF documents and evaluate them with Azure AI Studio and Prompt Flow

Language: Bicep - Size: 1.17 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 3

dev-luckymhz/AIVisionText-invoice-OCR-typescript

AIVisionText is an advanced document analysis platform that harnesses the power of artificial intelligence (AI) to revolutionize the way you manage and extract insights from documents.

Language: TypeScript - Size: 104 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 2

dashroshan/data-extractor

Extract and download key-value pairs, tables, and paragraphs from your scanned pdf, jpg, and png documents as CSV files.

Language: JavaScript - Size: 503 KB - Last synced at: 15 days ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

jamesmcroft/azure-ai-document-pipeline-sample

.NET sample project for building a scalable document data extraction pipeline with containerized Durable Functions and Azure AI Services on Azure Container Apps.

Language: C# - Size: 647 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

sensible-hq/tutorial-pdf-to-excel

Converts a PDF file to Excel.

Language: Python - Size: 4.08 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

jojolebarjos/pdf2htmlEX-webservice

pdf2htmlEX as a webservice

Language: Dockerfile - Size: 4.88 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

rajsinghparihar/data-detective

An app that leverages LLMs to process documents, extract relevant information and provide a summary specific to financial data

Language: Python - Size: 9.49 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

JunoLeong/DocExtractRAG

DocExtractRAG is a Retrieval-Augmented Generation (RAG) system that combines the power of large language models (LLMs) with document retrieval to provide insightful responses based on academic or other types of documents. The system utilizes the Zephyr-7B-beta model for text generation; BAAI/bge-large-en for document embeddings.

Language: Python - Size: 202 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

subratamondal1/document-extraction

Document extraction from PDFs and images with OpenCV.

Language: Python - Size: 6.96 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Ritesh1137/langchain-doc-intelligence-loader

Customized LangChain Azure Document Intelligence loader for table extraction and summarization

Language: Python - Size: 454 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

ThinkOrFaust/QuickZonalOCR

Welcome to QuickZonalOCR! Right now, it's a work in progress, but the goal is to make creating your own key-value document extraction models fairly easy. Think of it as your friendly tool-in-the-making for smart, hassle-free ML model creation. Stay tuned for updates!

Language: HTML - Size: 75.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

hreikin/pdf-toolbox

Extract content from PDFs and convert or create new documents from the content in multiple output formats.

Language: Python - Size: 7.57 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

dataiku/dss-plugin-nlp-extraction

WORK IN PROGRESS - Dataiku DSS plugin to extract text data from documents

Language: Makefile - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

jojolebarjos/poppler

Copy of Poppler (as of 2018-12-01), just in case. See https://poppler.freedesktop.org/

Language: C++ - Size: 6.39 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

idstack/extractor

Extractor API for document extraction with the use of DocParser

Language: Java - Size: 129 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

jojolebarjos/pdf2htmlEX (fork of pdf2htmlEX/pdf2htmlEX)

Fork of modified version of pdf2htmlEX, just in case. See https://github.com/pdf2htmlEX/pdf2htmlEX

Language: HTML - Size: 131 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0