An open API service providing repository metadata for many open source software ecosystems.

Topic: "pdf-to-json"

docling-project/docling

Get your documents ready for gen AI

Language: Python - Size: 160 MB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 48,547 - Forks: 3,388

Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Language: HTML - Size: 194 MB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 13,532 - Forks: 1,117

run-llama/llama_cloud_services

Knowledge Agents and Management in the Cloud

Language: TypeScript - Size: 84.4 MB - Last synced at: 5 days ago - Pushed at: 26 days ago - Stars: 4,226 - Forks: 467

NanoNets/docstrange

Extract and convert data from any document, image, PDF, Word document, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

Language: Python - Size: 351 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 975 - Forks: 88

opendataloader-project/opendataloader-pdf

PDF Parsing for RAG — Convert to Markdown & JSON, Fast, Local, No GPU

Language: Java - Size: 78.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 814 - Forks: 43

awesome-yasin/PDF-Verse

PDF Verse is a powerful web-based PDF editor with tools for editing, converting, and manipulating PDFs. Merge, compress, add or remove pages, or extract text using OCR. Convert PDF to DOC, Excel, PPT, JPG, PNG, text, and many more formats, and vice versa. PDF Verse also has a user-friendly interface and a wide range of features.

Language: JavaScript - Size: 53 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 90 - Forks: 34

oidlabs-com/Lexoid

Multimodal document parser for high quality data understanding and extraction

Language: Python - Size: 50.7 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 89 - Forks: 11

electrovir/statement-parser

Parse bank and credit card statements

Language: TypeScript - Size: 948 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 9

HoangTran0410/saoke_yagi

Account statements of the Vietnam Fatherland Front (MTTQ) on support for people affected by Typhoon Yagi

Language: JavaScript - Size: 392 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 26 - Forks: 7

NanoNets/ocr-python

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

Language: Jupyter Notebook - Size: 5.52 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 24 - Forks: 4

graphlit/graphlit

Graphlit Platform

Size: 2.93 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 1

graphlit/graphlit-client-python

Python client library for Graphlit Platform

Language: Python - Size: 3.28 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 18 - Forks: 2

docling-project/docling4j

Docling4j brings the functionalities of Docling in document understanding to Java® projects

Language: Java - Size: 32.2 KB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 16 - Forks: 0

Clearedge-AI/clearedge

Build a RAG preprocessing pipeline

Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 0

LiterateInk/PDFInspector

A cute PDF parser that gives position of elements for inspection purposes.

Language: TypeScript - Size: 112 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 0

clarekang/form-pdf2json

NodeJS library to convert JSON to PDF or vice versa

Language: JavaScript - Size: 2.67 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 2

hparreao/doclingconverter

Quick way to convert files (PDF, DOCX, HTML, PPTX, Images) to (MD, JSON, YAML) using Docling and Streamlit

Language: Python - Size: 11.7 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 1

bytescout/pdf-extractor-sdk-samples

ByteScout PDF Extractor SDK source code samples

Language: C# - Size: 27.5 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 8 - Forks: 5

graphlit/graphlit-client-typescript

TypeScript client for Graphlit Platform

Language: TypeScript - Size: 5.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 5 - Forks: 2

tahaygun/PDF-to-MongoDB

This project converts books from PDF to proper JSON objects by separating title and content. You can then easily insert the output JSON file into a database.

Language: JavaScript - Size: 1.68 MB - Last synced at: almost 3 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 0

Rushi-Balapure/pdf_2_json_extractor

A high-performance Python library for extracting structured content from PDF documents with layout-aware text extraction. pdf_to_json preserves document structure, including headings (H1-H6) and body text, and outputs clean JSON.

Language: Python - Size: 1.71 MB - Last synced at: about 21 hours ago - Pushed at: about 22 hours ago - Stars: 2 - Forks: 1

Aniket965/ipuresult-cli

🛠️ ipuresult-cli is a tool for creating JSON files from PDF result files 📚 of GGSIPU results

Language: JavaScript - Size: 7.81 KB - Last synced at: 5 months ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 0

vandoagency/Docs-to-JSON-Converter

Backed by Vando Agency: convert your Excel, Word, CSV, TXT, and Google Docs files into JSON format that developers can process.

Language: TypeScript - Size: 102 KB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

Denesepro/question-extractor

An end-to-end automation tool to extract quiz questions from PDF files using Gemini AI and automatically upload them to biazmoon.com with Selenium.

Language: Python - Size: 27.3 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

nordinz7/maybankpdf2json-cli

Convert MayBank statements delivered by email to CSV or JSON format via CLI

Language: Python - Size: 28.3 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

aidayang/Marker-OneClick

No-install, one-click-launch bundle of Marker, the PDF-to-Markdown tool

Size: 25.4 KB - Last synced at: 7 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

ERIK2012MIAO/chunk-data

📦 Split buffers and streams into smaller chunks for smooth HTTP uploads and accurate progress tracking.

Language: JavaScript - Size: 1.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

msaleh1888/azure-serverless-invoice-extraction

Serverless invoice extraction API using Azure Document Intelligence and Azure Functions. Upload a PDF invoice and receive normalized JSON output including line items, totals, dates, and vendor details.

Language: Python - Size: 175 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

khushikumarigupta14/pdf-mcq-extractor

PDF MCQ Extractor – Quickly extract multiple-choice questions from PDFs and export them as structured JSON. Perfect for educators, students, and study apps.

Language: EJS - Size: 1010 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

AI-Enginner/Intelligent-Document-Processing

AI-powered data extraction tool that converts PDFs, images, and scanned documents into structured data in seconds.

Size: 5.86 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
