GitHub topics: pdf-parser
Aumlo123/pdfdoom
DOOM in a PDF (as ascii art)
Size: 1000 Bytes - Last synced at: about 15 hours ago - Pushed at: about 17 hours ago - Stars: 1 - Forks: 0

iamarunbrahma/vision-parse
Parse PDFs into markdown using Vision LLMs
Language: Python - Size: 299 KB - Last synced at: about 17 hours ago - Pushed at: about 19 hours ago - Stars: 426 - Forks: 58

Stranger123444/u
An interactive command-line tool designed to quickly navigate directories and perform various file operations efficiently. Its simple syntax and intuitive commands make it a favorite among developers for streamlining workflow tasks.
Size: 1000 Bytes - Last synced at: about 22 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0

byerlikaya/SmartRAG
⚡ Production-ready .NET Standard 2.0/2.1 RAG library with 🤖 multi-AI provider support, 🏢 enterprise vector storage, and 📄 intelligent document processing. 🌍 Cross-platform compatible.
Language: C# - Size: 931 KB - Last synced at: about 17 hours ago - Pushed at: 1 day ago - Stars: 3 - Forks: 1

nihal-soni/summerify
AI tool for summarizing PDFs into short notes
Language: TypeScript - Size: 150 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDFs into Markdown and JSON formats.
Language: Python - Size: 129 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 43,238 - Forks: 3,561

LianjiaTech/bella-domify
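Document Parser supporting PDF, TXT, DOC, DOCX, Markdown, and other file formats. It efficiently extracts and parses content and generates a standard document tree structure. The built-in PDF Parser, Text Parser, and Word Parser support intelligent applications such as RAG, knowledge bases, and full-text search.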
Language: Python - Size: 32.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 36 - Forks: 5

privateai-com/docviz
Advanced document contents extraction with multiple output formats
Language: Python - Size: 121 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3 - Forks: 0

CASParser/cas-parser-python
CAS Parser allows you to track Consolidated Account Statement (CAS PDF) portfolios from NSDL, CDSL, CAMS, KFintech - CAS Parser API Client - Python
Language: Python - Size: 157 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

ItsAJ1005/typeface-finance-app
Track, visualize, and manage your finances with smart receipt scanning.
Language: JavaScript - Size: 10.7 MB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

NanoNets/docstrange
Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.
Language: Python - Size: 347 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 493 - Forks: 37

py-pdf/pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Language: Python - Size: 22.7 MB - Last synced at: 6 days ago - Pushed at: 19 days ago - Stars: 9,374 - Forks: 1,493

oidlabs-com/Lexoid
Multimodal document parser for high quality data understanding and extraction
Language: Python - Size: 47 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 79 - Forks: 8

sylphxltd/pdf-reader-mcp
An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.
Language: TypeScript - Size: 1.01 MB - Last synced at: 7 days ago - Pushed at: 13 days ago - Stars: 226 - Forks: 27

drmingler/smart-llm-loader
smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.
Language: Python - Size: 1.09 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 71 - Forks: 2

CASParser/cas-parser-node
CAS Parser allows you to track Consolidated Account Statement (CAS PDF) portfolios from NSDL, CDSL, CAMS, KFintech - CAS Parser API Client - NPM
Language: TypeScript - Size: 271 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

ispras/dedoc
Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and metadata from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser
Language: Python - Size: 240 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 589 - Forks: 44

datalogics/apdfl-vb-dotnet-samples
Adobe PDF Library Samples in Visual Basic for .NET
Language: Visual Basic .NET - Size: 176 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 4

michelcrypt4d4mus/pdfalyzer
Analyze PDFs. With colors. And Yara.
Language: YARA - Size: 94.9 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 285 - Forks: 21

codereverser/casparser
Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech
Language: Python - Size: 7.85 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 161 - Forks: 66

NeurosynLabs/ai-prompt-splitter
Free AI Prompt Splitter - Split large documents into chunks for ChatGPT, Claude, GPT-4. Supports PDF, TXT, MD files. Smart token counting & overlap control.
Size: 50.8 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

Besthope-Official/predoc
Document preprocessing service for RAG (Retrieval-Augmented Generation)
Language: Python - Size: 122 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 1

Sourik-10/PrismAI
QuickAI is a full-stack AI web application built with a modular client–server architecture. The project is primarily developed in JavaScript, with the frontend and backend kept in separate folders for better structure and scalability. It leverages modern web technologies and integrates AI-powered features to deliver intelligent interactions.
Language: JavaScript - Size: 14.1 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

titipata/scipdf_parser
Python PDF parser for scientific publications: content and figures
Language: Python - Size: 29.2 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 423 - Forks: 67

yobix-ai/extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Language: Rust - Size: 2.88 MB - Last synced at: 15 days ago - Pushed at: 9 months ago - Stars: 1,217 - Forks: 56

CASParser/cas-parser-php
CAS Parser allows you to track Consolidated Account Statement (CAS PDF) portfolios from NSDL, CDSL, CAMS, KFintech - CAS Parser API Client - PHP
Language: PHP - Size: 164 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

CASParser/cas-parser-go
CAS Parser allows you to track Consolidated Account Statement (CAS PDF) portfolios from NSDL, CDSL, CAMS, KFintech - CAS Parser API Client - GO
Language: Go - Size: 148 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

kelvinleandro/ufc-ira-calculator
Streamlit application that calculates the Academic Performance Index (Índice de Rendimento Acadêmico, IRA)
Language: Python - Size: 1020 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

datalogics/apdfl-csharp-dotnet-samples
Sample code for the Datalogics .NET interface of the Adobe PDF Library
Language: C# - Size: 315 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 8 - Forks: 10

saviobatista/vitae
AI-powered résumé transformer: match your CV to any job and export in LaTeX PDF.
Language: TypeScript - Size: 308 KB - Last synced at: 8 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 1

dromara/yft-design
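yft-design is a powerful, visually stunning online design tool built with Vue3, fabric.js, and Element Plus. An open-source take on Gaoding Design (稿定设计) based on fabric.js: an attractive and powerful online design tool with poster design and image editing features, suited to many scenarios such as poster generation, e-commerce product images, long-form article graphics, and video/WeChat Official Account cover editing.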
Language: TypeScript - Size: 50.8 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1,380 - Forks: 279

datalogics/apdfl-kotlin-samples
Adobe PDF Library Samples in Kotlin
Language: Kotlin - Size: 146 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 7

bytedance/Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Language: Python - Size: 10.9 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 5,450 - Forks: 433

PSHACKERZ/PDFQuery-AI
PDFQuery AI is an intelligent PDF conversation companion built using Flask and Python. Upload a PDF to extract key insights, generate detailed summaries, or explore specific topics interactively. Powered by the Gemini Starter API for natural language understanding, this tool simplifies complex documents into actionable information.
Language: HTML - Size: 54.7 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

aescarias/pdfnaut
A Python library for exploring PDFs with ease.
Language: Python - Size: 717 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

BitMiracle/Docotic.Pdf.Samples
C# and VB.NET samples for Docotic.Pdf library
Language: Visual Basic .NET - Size: 53.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 78 - Forks: 39

seinecle/nocodefunctions-io
I/O for nocodefunctions: CSV, TXT, PDF, and XLSX so far
Language: Java - Size: 273 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

PSPDFKit/nutrient-pdf-mcp-server
A powerful Model Context Protocol server for LLM-driven PDF document analysis and exploration
Language: Python - Size: 52.7 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

genbs/poste-italiane-parser
A Python tool to parse PDF statements from Poste Italiane (Postepay, BancoPosta) and extract data as structured JSON.
Language: Python - Size: 20.5 KB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 50 - Forks: 1

SouravUpadhyay7/Morvs_Chat_Bot
🤖 MORVS AI - An intelligent chat interface powered by Groq's LLaMA 3 model with PDF processing capabilities. Built with Next.js, React, TypeScript, and modern UI components.
Language: TypeScript - Size: 43 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

datalogics/apdfl-java-maven-samples
Sample code for the Datalogics Java interface of the Adobe PDF Library setup to build with Maven
Language: Java - Size: 1.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5 - Forks: 12

ShantiKumariGautam/IDassure
IDAssure is a face-matching-based identity verification system that ensures secure and reliable user authentication. It’s built for seamless integration into platforms that require trust and visual identity validation.
Language: JavaScript - Size: 6.95 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 1

datalogics/apdfl-cplusplus-samples
Sample code for the Datalogics C++ interface of the Adobe PDF Library
Language: C++ - Size: 35.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 9 - Forks: 9

datalogics/apdfl-csharp-dotnet-framework-samples
Sample code for the Datalogics .NET Framework interface of the Adobe PDF Library
Language: C# - Size: 564 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 9

arman61-hub/GenCraftAI
✨ GenCraftAI — An AI-powered SaaS platform to ✍️ generate blogs, 📰 craft article titles, 🧾 review resumes, and 🖼️ create visuals — all in one creative hub.
Language: JavaScript - Size: 1.47 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

EsanaKomudi/Auto-Contract-Interpreter
Auto Contract Interpreter is a Python-Tkinter app that analyzes contract PDFs using Gemini 1.5 Flash, extracts clauses, risks, and insights, supports chat-based queries, and includes text-to-speech—ideal for legal reviewers, freelancers, students, and AI learners.
Language: Python - Size: 10.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

smrutiranjan1132001/ai-resume-screener
AI-based resume screening solution 🧠
Language: Python - Size: 194 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Sahaj33-op/SkillWise
🎯 SkillWise is an AI-powered learning path generator that transforms your resume into a personalized 6-month roadmap — complete with curated courses, project ideas, and tech stack recommendations. Built with Gemini 1.5 Flash and Streamlit.
Language: Python - Size: 3.22 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

k16shikano/hpdft
Tools to poke at PDFs using Haskell
Language: Haskell - Size: 403 KB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 44 - Forks: 0

per5ect/JobFinder-Backend
Back-End for JobFinder web application
Language: Java - Size: 626 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

PeterMosmans/apdfhelper
Fix links in PDF files, rewrite links, extract text annotations, remove pages
Language: Python - Size: 112 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

rafenden/pdf-menu-extractor
Library for extracting menu items from restaurant PDF menus.
Language: JavaScript - Size: 2.45 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

dunso/pdf-parser
Convert PDF content and layout information with pdf.js
Language: JavaScript - Size: 2.18 MB - Last synced at: 22 days ago - Pushed at: almost 6 years ago - Stars: 23 - Forks: 7

code4daniel/pdf-parser-service
This is a Flask-based microservice that extracts course cutoff data from university admission PDFs using pdfplumber.
Language: Python - Size: 164 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

tarfin-labs/easy-pdf
PDF wrapper for Laravel
Language: PHP - Size: 204 KB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 17 - Forks: 3

qwaszxerdfcv12344/SkillWise
SkillWise helps you create a tailored learning path based on your resume. Discover free courses, project ideas, and a career plan to boost your skills. 🛠️👨‍💻
Language: Python - Size: 2.14 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

syedaliwaqar12/Resume-Parser
🚀 A beautiful, production-ready web app that extracts structured data from PDF resumes using AI and NLP. Built with React + TypeScript + FastAPI.
Language: Python - Size: 53.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

siddharthparakh1105/invoice-Scanner
An OCR-based Python application that uses the Gemini API to extract information from PDF invoices and export it to an Excel file
Language: Python - Size: 20.5 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

J-sephB-lt-n/pdf-bank-statement-parser
Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data
Language: Python - Size: 65.4 KB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 4 - Forks: 3

s2bd/bracu-cgpa-calculator
CGPA calculator for BRAC University, supporting PDF uploading and real-time GPA auto-calculation.
Language: JavaScript - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

vinayaksandilya/NoteBook-Front-End
Turn any PDF into a structured online course with modules, summaries, and key takeaways — powered by Node.js, MySQL, and AI models like GPT-4 & Claude.
Language: TypeScript - Size: 126 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

vinayaksandilya/NoteBook-Backend
Turn any PDF into a structured online course with modules, summaries, and key takeaways — powered by Node.js, MySQL, and AI models like GPT-4 & Claude.
Language: JavaScript - Size: 66.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SimpleApp/PDFParser
Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser
Language: Swift - Size: 146 KB - Last synced at: about 2 months ago - Pushed at: about 6 years ago - Stars: 42 - Forks: 11

sarabjit1003/resume-tracker
A smart resume screening tool that matches resumes to job descriptions using Streamlit and Python.
Language: Python - Size: 2.98 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

kurtnettle/bubt-routinepy
An unofficial Python wrapper of the BUBT Routine API + a robust web scraper and PDF extractor for getting routine data.
Language: Python - Size: 138 KB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

dewanmukto/bracu-cgpa-calculator
CGPA calculator for BRAC University, supporting PDF uploading and real-time GPA auto-calculation.
Language: JavaScript - Size: 18.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

code-418-dpr/SportHub-parser
Parser for the PDF of the Unified Calendar Plan (ЕКП) of the Russian Ministry of Sport, built for the SportHub project
Language: Python - Size: 4.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

FayazK/Document-Metadata-Extractor
A Python tool that uses Google's Gemini AI to automatically extract structured metadata from PDF and DOCX documents, saving results to Excel for easy analysis and organizing raw responses as JSON files.
Language: Python - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

lazyFrogLOL/llmdocparser
A package for parsing PDFs and analyzing their content using LLMs.
Language: Python - Size: 1.21 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 271 - Forks: 9

Polyte/OMS_OCR
This is an image/PDF OCR reader. Use it to extract text from either an image or a PDF file; the project uses Tesseract.js and PDF-Parser to do OCR.
Language: TypeScript - Size: 69.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ridi/content-parser
Content data parser for Ridibooks services
Language: JavaScript - Size: 49.2 MB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 7

adrienjoly/HsbcStatementParser
Transforms PDF bank statements from HSBC into a list of operations in JSON or TSV format.
Language: JavaScript - Size: 21.5 KB - Last synced at: about 2 months ago - Pushed at: over 9 years ago - Stars: 18 - Forks: 7

aidayang/MinerU-OneClick
Installation-free, one-click-launch bundle for deploying MinerU
Size: 49.8 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 10 - Forks: 2

adithya-s-k/marker-api
Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.
Language: Python - Size: 35 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 854 - Forks: 96

drmingler/docling-api
Easily deployable and scalable backend server that efficiently converts various document formats (PDF, DOCX, PPTX, HTML, images, etc.) into Markdown. With support for both CPU and GPU processing, it is ideal for large-scale workflows and offers text/table extraction, OCR, and batch processing with sync/async endpoints.
Language: Python - Size: 3.48 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 585 - Forks: 58

dadicharan/Log-Analyzer
Log Analyzer with AI is a Streamlit-based tool for AI-powered log analysis. It supports CSV log uploads, data visualization (Plotly & Matplotlib), and anomaly detection using DeepSeek LLM via Ollama API. Users can explore logs, detect patterns, and gain AI-driven insights. 🚀 Python, Pandas, Streamlit, AI
Language: Python - Size: 13.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

diegoabeltran16/OpenPages-pipeline
Open-source tool for turning technical documents into AI-ready formats. Built for better access to knowledge.
Language: Python - Size: 1.78 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Stravah/eosin
Custom Bank Statement Parsing based on pure text positioning.
Language: Python - Size: 5.22 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 1

chinmaymisra/personal-finance-tracker
Upload Axis Bank statements as PDFs, automatically parse transactions, and view them cleanly in a modern UI. Handles invalid files and non-supported banks gracefully. Built using React (Vite) and FastAPI.
Language: Python - Size: 143 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

nlitsme/pyPdfCrack
Investigation into PDF encryption
Language: Python - Size: 34.2 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 7

ev0clu/pdf-ai-saas
Full stack (Next.js) PDF AI SaaS App
Language: TypeScript - Size: 830 KB - Last synced at: 28 days ago - Pushed at: 9 months ago - Stars: 4 - Forks: 2

VishwaGauravIn/pdf-parser-client-side
A lightweight, easy-to-use package to parse text from PDF files on the client side without any server dependency.
Language: TypeScript - Size: 26.4 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 0

sankeer28/PDF-Searcher
Live website to parse multiple PDFs using PDF.js to find matching text
Language: JavaScript - Size: 29.3 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

luccaHirae/invoice-extract-server
API for extracting data from invoices
Language: TypeScript - Size: 75.2 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ashutoshvarma/pyxpdf
Fast and memory-efficient Python PDF Parser based on xpdf sources
Language: Cython - Size: 12.2 MB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 17

aleff-github/PDF-Parser-VirusTotal-Based 📦
PDF Parser based on VirusTotal API
Language: Python - Size: 709 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

eli64s/pdflex
CLI for merging PDF contexts.
Language: Python - Size: 465 KB - Last synced at: 19 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

cuiyuheng/docling Fork of docling-project/docling
🥚 Transform PDF to JSON or Markdown with ease and speed 🐣
Size: 28.5 MB - Last synced at: 6 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

cuiyuheng/olmocr Fork of allenai/olmocr
Toolkit for linearizing PDFs for LLM datasets/training
Size: 30.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

minjun0219/welstory-menu-pdf-parser 📦
Welstory (웰스토리) menu PDF parser
Language: TypeScript - Size: 130 KB - Last synced at: 1 day ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 2

colin-tso/HSBC-AU-Statement-Parser
Parses PDF bank statements from HSBC Australia into MS Excel
Language: JavaScript - Size: 46.9 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

dills122/cardboard-crack
Web app for parsing/viewing Soccer Card Checklists
Language: JavaScript - Size: 1.3 MB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

judaicalink/rdf_generator
A library to generate RDF files in Turtle format for Judaicalink.
Language: Python - Size: 27.3 KB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

ishaangupta-YB/nextjs-pdf-parser
Next.js template for seamless PDF parsing using pdf2json and a custom drag-and-drop file uploader. Ideal for developers seeking a ready-to-use solution for PDF content extraction in their Next.js projects.
Language: TypeScript - Size: 200 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 3

AlphaTok-Singapore/PDFMathTranslate Fork of Byaidu/PDFMathTranslate
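PDF scientific paper translation with preserved formats: AI-based, full-text bilingual translation of PDF documents that fully preserves the layout, supporting Google, DeepL, Ollama, OpenAI, and other services, with CLI, GUI, and Docker options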
Size: 51.4 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

sypht-team/sypht-java-client
A Java client for the Sypht API
Language: Java - Size: 108 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 87 - Forks: 1

sypht-team/sypht-python-client
A Python client for the Sypht API
Language: Python - Size: 165 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 162 - Forks: 5

RiccardoSenica/pdf-text-parsing
PDF-parsing demo
Language: TypeScript - Size: 167 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Alapipapi/MinerU Fork of opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDFs into Markdown and JSON formats.
Language: Python - Size: 103 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Daniel-Alvarenga/Boot Fork of VitorCarvalho67/Boot
Digital platform tailored for the educational environment, designed to facilitate the dissemination of internship opportunities and promote student engagement
Language: Vue - Size: 8.16 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 8 - Forks: 0
