Topic: "pdf-to-json"
docling-project/docling
Get your documents ready for gen AI
Language: Python - Size: 160 MB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 48,547 - Forks: 3,388
Unstructured-IO/unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
Language: HTML - Size: 194 MB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 13,532 - Forks: 1,117
run-llama/llama_cloud_services
Knowledge Agents and Management in the Cloud
Language: TypeScript - Size: 84.4 MB - Last synced at: 5 days ago - Pushed at: 26 days ago - Stars: 4,226 - Forks: 467
NanoNets/docstrange
Extract and convert data from any document, image, PDF, Word doc, PPT, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.
Language: Python - Size: 351 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 975 - Forks: 88
opendataloader-project/opendataloader-pdf
PDF Parsing for RAG — Convert to Markdown & JSON, Fast, Local, No GPU
Language: Java - Size: 78.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 814 - Forks: 43
awesome-yasin/PDF-Verse
PDF Verse is a powerful web-based PDF editor with tools for editing, converting, and manipulating PDFs. Merge, compress, add or remove pages, or extract text using OCR technology. Convert PDF to DOC, Excel, PPT, JPG, PNG, Text, and many more formats, and vice versa. PDF Verse also has a user-friendly interface and a wide range of features.
Language: JavaScript - Size: 53 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 90 - Forks: 34
oidlabs-com/Lexoid
Multimodal document parser for high quality data understanding and extraction
Language: Python - Size: 50.7 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 89 - Forks: 11
electrovir/statement-parser
Parse bank and credit card statements
Language: TypeScript - Size: 948 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 9
HoangTran0410/saoke_yagi
Account statements of the Vietnam Fatherland Front (MTTQ) on support for people affected by Typhoon Yagi
Language: JavaScript - Size: 392 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 26 - Forks: 7
NanoNets/ocr-python
OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
Language: Jupyter Notebook - Size: 5.52 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 24 - Forks: 4
graphlit/graphlit
Graphlit Platform
Size: 2.93 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 1
graphlit/graphlit-client-python
Python client library for Graphlit Platform
Language: Python - Size: 3.28 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 18 - Forks: 2
docling-project/docling4j
Docling4j brings the functionalities of Docling in document understanding to Java® projects
Language: Java - Size: 32.2 KB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 16 - Forks: 0
Clearedge-AI/clearedge
Build a RAG preprocessing pipeline
Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 0
LiterateInk/PDFInspector
A cute PDF parser that gives position of elements for inspection purposes.
Language: TypeScript - Size: 112 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 0
clarekang/form-pdf2json
NodeJS library to convert JSON to PDF or vice versa
Language: JavaScript - Size: 2.67 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 2
hparreao/doclingconverter
Quick way to convert files (PDF, DOCX, HTML, PPTX, Images) to (MD, JSON, YAML) using Docling and Streamlit
Language: Python - Size: 11.7 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 1
bytescout/pdf-extractor-sdk-samples
ByteScout PDF Extractor SDK source code samples
Language: C# - Size: 27.5 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 8 - Forks: 5
graphlit/graphlit-client-typescript
TypeScript client for Graphlit Platform
Language: TypeScript - Size: 5.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 5 - Forks: 2
tahaygun/PDF-to-MongoDB
This project converts books from PDF to proper JSON objects by separating title and content. Once you have the output, you can easily insert the JSON file into a database.
Language: JavaScript - Size: 1.68 MB - Last synced at: almost 3 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 0
Rushi-Balapure/pdf_2_json_extractor
A high-performance Python library for extracting structured content from PDF documents with layout-aware text extraction. pdf_to_json preserves document structure including headings (H1-H6) and body text, outputting clean JSON format.
Language: Python - Size: 1.71 MB - Last synced at: about 21 hours ago - Pushed at: about 22 hours ago - Stars: 2 - Forks: 1
Aniket965/ipuresult-cli
🛠️ ipuresult-cli is a tool for creating JSON files from PDF result files 📚 of GGSIPU Results
Language: JavaScript - Size: 7.81 KB - Last synced at: 5 months ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 0
vandoagency/Docs-to-JSON-Converter
Backed by Vando Agency: convert your Excel, Word, CSV, TXT, and Google Docs files into developer-ready JSON.
Language: TypeScript - Size: 102 KB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0
Denesepro/question-extractor
An end-to-end automation tool to extract quiz questions from PDF files using Gemini AI and automatically upload them to biazmoon.com with Selenium.
Language: Python - Size: 27.3 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0
nordinz7/maybankpdf2json-cli
Convert MayBank statements delivered by email to CSV or JSON format via CLI
Language: Python - Size: 28.3 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0
aidayang/Marker-OneClick
One-click, no-install bundle for Marker, the PDF-to-Markdown converter
Size: 25.4 KB - Last synced at: 7 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0
ERIK2012MIAO/chunk-data
📦 Split buffers and streams into smaller chunks for smooth HTTP uploads and accurate progress tracking.
Language: JavaScript - Size: 1.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0
msaleh1888/azure-serverless-invoice-extraction
Serverless invoice extraction API using Azure Document Intelligence and Azure Functions. Upload a PDF invoice and receive normalized JSON output including line items, totals, dates, and vendor details.
Language: Python - Size: 175 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
khushikumarigupta14/pdf-mcq-extractor
PDF MCQ Extractor – Quickly extract multiple-choice questions from PDFs and export them as structured JSON. Perfect for educators, students, and study apps.
Language: EJS - Size: 1010 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
AI-Enginner/Intelligent-Document-Processing
AI-powered data extraction tool that converts PDFs, images, and scanned documents into structured data in seconds.
Size: 5.86 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0