An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: pdf-parser

Aumlo123/pdfdoom

DOOM in a PDF (as ascii art)

Size: 1000 Bytes - Last synced at: about 15 hours ago - Pushed at: about 17 hours ago - Stars: 1 - Forks: 0

iamarunbrahma/vision-parse

Parse PDFs into markdown using Vision LLMs

Language: Python - Size: 299 KB - Last synced at: about 17 hours ago - Pushed at: about 19 hours ago - Stars: 426 - Forks: 58

Stranger123444/u

An interactive command-line tool designed to quickly navigate directories and perform various file operations efficiently. Its simple syntax and intuitive commands make it a favorite among developers for streamlining workflow tasks.

Size: 1000 Bytes - Last synced at: about 22 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0

byerlikaya/SmartRAG

⚡ Production-ready .NET Standard 2.0/2.1 RAG library with 🤖 multi-AI provider support, 🏢 enterprise vector storage, and 📄 intelligent document processing. 🌍 Cross-platform compatible.

Language: C# - Size: 931 KB - Last synced at: about 17 hours ago - Pushed at: 1 day ago - Stars: 3 - Forks: 1

nihal-soni/summerify

AI tool for summarizing PDFs into short notes

Language: TypeScript - Size: 150 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

opendatalab/MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

Language: Python - Size: 129 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 43,238 - Forks: 3,561

LianjiaTech/bella-domify

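Document Parser supporting PDF, TXT, DOC, DOCX, Markdown, and other file formats. It efficiently extracts and parses content and generates a standard document tree structure. Built-in PDF Parser, Text Parser, and Word Parser support RAG, knowledge bases, full-text search, and other intelligent applications.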

Language: Python - Size: 32.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 36 - Forks: 5

privateai-com/docviz

Advanced document contents extraction with multiple output formats

Language: Python - Size: 121 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3 - Forks: 0

CASParser/cas-parser-python

CAS Parser allows you to track Consolidated Account Statement (CAS PDF) portfolios from NSDL, CDSL, CAMS, KFintech - CAS Parser API Client - Python

Language: Python - Size: 157 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

ItsAJ1005/typeface-finance-app

Track, visualize, and manage your finances with smart receipt scanning.

Language: JavaScript - Size: 10.7 MB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

NanoNets/docstrange

Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

Language: Python - Size: 347 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 493 - Forks: 37

py-pdf/pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Language: Python - Size: 22.7 MB - Last synced at: 6 days ago - Pushed at: 19 days ago - Stars: 9,374 - Forks: 1,493

oidlabs-com/Lexoid

Multimodal document parser for high quality data understanding and extraction

Language: Python - Size: 47 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 79 - Forks: 8

sylphxltd/pdf-reader-mcp

An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.

Language: TypeScript - Size: 1.01 MB - Last synced at: 7 days ago - Pushed at: 13 days ago - Stars: 226 - Forks: 27

drmingler/smart-llm-loader

smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.

Language: Python - Size: 1.09 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 71 - Forks: 2

CASParser/cas-parser-node

CAS Parser allows you to track Consolidated Account Statement (CAS PDF) portfolios from NSDL, CDSL, CAMS, KFintech - CAS Parser API Client - NPM

Language: TypeScript - Size: 271 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

ispras/dedoc

Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

Language: Python - Size: 240 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 589 - Forks: 44

datalogics/apdfl-vb-dotnet-samples

Adobe PDF Library Samples in Visual Basic for .NET

Language: Visual Basic .NET - Size: 176 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 4

michelcrypt4d4mus/pdfalyzer

Analyze PDFs. With colors. And Yara.

Language: YARA - Size: 94.9 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 285 - Forks: 21

codereverser/casparser

Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech

Language: Python - Size: 7.85 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 161 - Forks: 66

NeurosynLabs/ai-prompt-splitter

Free AI Prompt Splitter - Split large documents into chunks for ChatGPT, Claude, GPT-4. Supports PDF, TXT, MD files. Smart token counting & overlap control.

Size: 50.8 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

Besthope-Official/predoc

Document preprocessing service for RAG (Retrieval-Augmented Generation)

Language: Python - Size: 122 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 1

Sourik-10/PrismAI

QuickAI is a full-stack AI web application built with a modular client–server architecture. The project is primarily developed in JavaScript, with the frontend and backend kept in separate folders for better structure and scalability. It leverages modern web technologies and integrates AI-powered features to deliver intelligent interactions.

Language: JavaScript - Size: 14.1 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

titipata/scipdf_parser

Python PDF parser for scientific publications: content and figures

Language: Python - Size: 29.2 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 423 - Forks: 67

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

Language: Rust - Size: 2.88 MB - Last synced at: 15 days ago - Pushed at: 9 months ago - Stars: 1,217 - Forks: 56

CASParser/cas-parser-php

CAS Parser allows you to track Consolidated Account Statement (CAS PDF) portfolios from NSDL, CDSL, CAMS, KFintech - CAS Parser API Client - PHP

Language: PHP - Size: 164 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

CASParser/cas-parser-go

CAS Parser allows you to track Consolidated Account Statement (CAS PDF) portfolios from NSDL, CDSL, CAMS, KFintech - CAS Parser API Client - GO

Language: Go - Size: 148 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

kelvinleandro/ufc-ira-calculator

Streamlit application that calculates the Índice de Rendimento Acadêmico (IRA), an academic performance index

Language: Python - Size: 1020 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

datalogics/apdfl-csharp-dotnet-samples

Sample code for the Datalogics .NET interface of the Adobe PDF Library

Language: C# - Size: 315 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 8 - Forks: 10

saviobatista/vitae

AI-powered résumé transformer: match your CV to any job and export in LaTeX PDF.

Language: TypeScript - Size: 308 KB - Last synced at: 8 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 1

dromara/yft-design

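yft-design is a powerful, visually stunning online design tool built with Vue3, fabric.js, and Element Plus. An open-source take on 稿定设计 (Gaoding Design) based on fabric.js. An attractive and powerful online design tool with poster design and image editing features, suitable for many scenarios such as poster generation, e-commerce product images, long-image layouts for articles, and video or official-account cover editing.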

Language: TypeScript - Size: 50.8 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1,380 - Forks: 279

datalogics/apdfl-kotlin-samples

Adobe PDF Library Samples in Kotlin

Language: Kotlin - Size: 146 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 7

bytedance/Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Language: Python - Size: 10.9 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 5,450 - Forks: 433

PSHACKERZ/PDFQuery-AI

PDFQuery AI is an intelligent PDF conversation companion built using Flask and Python. Upload a PDF to extract key insights, generate detailed summaries, or explore specific topics interactively. Powered by the Gemini Starter API for natural language understanding, this tool simplifies complex documents into actionable information.

Language: HTML - Size: 54.7 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

aescarias/pdfnaut

A Python library for exploring PDFs with ease.

Language: Python - Size: 717 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

BitMiracle/Docotic.Pdf.Samples

C# and VB.NET samples for Docotic.Pdf library

Language: Visual Basic .NET - Size: 53.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 78 - Forks: 39

seinecle/nocodefunctions-io

io for nocodefunctions: csv, txt, pdf, and xlsx so far

Language: Java - Size: 273 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

PSPDFKit/nutrient-pdf-mcp-server

A powerful Model Context Protocol server for LLM-driven PDF document analysis and exploration

Language: Python - Size: 52.7 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

genbs/poste-italiane-parser

A Python tool to parse PDF statements from Poste Italiane (Postepay, BancoPosta) and extract data as structured JSON.

Language: Python - Size: 20.5 KB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 50 - Forks: 1

SouravUpadhyay7/Morvs_Chat_Bot

🤖 MORVS AI - An intelligent chat interface powered by Groq's LLaMA 3 model with PDF processing capabilities. Built with Next.js, React, TypeScript, and modern UI components.

Language: TypeScript - Size: 43 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

datalogics/apdfl-java-maven-samples

Sample code for the Datalogics Java interface of the Adobe PDF Library setup to build with Maven

Language: Java - Size: 1.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5 - Forks: 12

ShantiKumariGautam/IDassure

IDAssure is a face-matching-based identity verification system that ensures secure and reliable user authentication. It’s built for seamless integration into platforms that require trust and visual identity validation.

Language: JavaScript - Size: 6.95 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 1

datalogics/apdfl-cplusplus-samples

Sample code for the Datalogics C++ interface of the Adobe PDF Library

Language: C++ - Size: 35.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 9 - Forks: 9

datalogics/apdfl-csharp-dotnet-framework-samples

Sample code for the Datalogics .NET Framework interface of the Adobe PDF Library

Language: C# - Size: 564 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 9

arman61-hub/GenCraftAI

✨ GenCraftAI — An AI-powered SaaS platform to ✍️ generate blogs, 📰 craft article titles, 🧾 review resumes, and 🖼️ create visuals — all in one creative hub.

Language: JavaScript - Size: 1.47 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

EsanaKomudi/Auto-Contract-Interpreter

Auto Contract Interpreter is a Python-Tkinter app that analyzes contract PDFs using Gemini 1.5 Flash, extracts clauses, risks, and insights, supports chat-based queries, and includes text-to-speech—ideal for legal reviewers, freelancers, students, and AI learners.

Language: Python - Size: 10.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

smrutiranjan1132001/ai-resume-screener

AI-based resume screening solution 🧠

Language: Python - Size: 194 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Sahaj33-op/SkillWise

🎯 SkillWise is an AI-powered learning path generator that transforms your resume into a personalized 6-month roadmap — complete with curated courses, project ideas, and tech stack recommendations. Built with Gemini 1.5 Flash and Streamlit.

Language: Python - Size: 3.22 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

k16shikano/hpdft

Tools to poke at PDFs using Haskell

Language: Haskell - Size: 403 KB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 44 - Forks: 0

per5ect/JobFinder-Backend

Back-End for JobFinder web application

Language: Java - Size: 626 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

PeterMosmans/apdfhelper

Fix links in PDF files, rewrite links, extract text annotations, remove pages

Language: Python - Size: 112 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

rafenden/pdf-menu-extractor

Library for extracting menu items from restaurant PDF menus.

Language: JavaScript - Size: 2.45 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

dunso/pdf-parser

Convert PDF content and layout information with pdf.js

Language: JavaScript - Size: 2.18 MB - Last synced at: 22 days ago - Pushed at: almost 6 years ago - Stars: 23 - Forks: 7

code4daniel/pdf-parser-service

This is a Flask-based microservice that extracts course cutoff data from university admission PDFs using pdfplumber.

Language: Python - Size: 164 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

tarfin-labs/easy-pdf

PDF wrapper for Laravel

Language: PHP - Size: 204 KB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 17 - Forks: 3

qwaszxerdfcv12344/SkillWise

SkillWise helps you create a tailored learning path based on your resume. Discover free courses, project ideas, and a career plan to boost your skills. 🛠️👨‍💻

Language: Python - Size: 2.14 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

syedaliwaqar12/Resume-Parser

🚀 A beautiful, production-ready web app that extracts structured data from PDF resumes using AI and NLP. Built with React + TypeScript + FastAPI.

Language: Python - Size: 53.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

siddharthparakh1105/invoice-Scanner

An OCR-based Python application that uses a Gemini API key to extract information from PDF invoices and export it to an Excel file

Language: Python - Size: 20.5 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

J-sephB-lt-n/pdf-bank-statement-parser

Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

Language: Python - Size: 65.4 KB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 4 - Forks: 3

s2bd/bracu-cgpa-calculator

CGPA calculator for BRAC University, supporting PDF uploading and real-time GPA auto-calculation.

Language: JavaScript - Size: 13.7 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

vinayaksandilya/NoteBook-Front-End

Turn any PDF into a structured online course with modules, summaries, and key takeaways — powered by Node.js, MySQL, and AI models like GPT-4 & Claude.

Language: TypeScript - Size: 126 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

vinayaksandilya/NoteBook-Backend

Turn any PDF into a structured online course with modules, summaries, and key takeaways — powered by Node.js, MySQL, and AI models like GPT-4 & Claude.

Language: JavaScript - Size: 66.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SimpleApp/PDFParser

Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser

Language: Swift - Size: 146 KB - Last synced at: about 2 months ago - Pushed at: about 6 years ago - Stars: 42 - Forks: 11

sarabjit1003/resume-tracker

A smart resume screening tool that matches resumes to job descriptions using Streamlit and Python.

Language: Python - Size: 2.98 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

kurtnettle/bubt-routinepy

An unofficial Python wrapper of the BUBT Routine API + a robust web scraper and PDF extractor for getting routine data.

Language: Python - Size: 138 KB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

dewanmukto/bracu-cgpa-calculator

CGPA calculator for BRAC University, supporting PDF uploading and real-time GPA auto-calculation.

Language: JavaScript - Size: 18.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

code-418-dpr/SportHub-parser

Parser for the PDF file of the Russian Ministry of Sport's Unified Calendar Plan (ЕКП), for the SportHub project

Language: Python - Size: 4.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

FayazK/Document-Metadata-Extractor

A Python tool that uses Google's Gemini AI to automatically extract structured metadata from PDF and DOCX documents, saving results to Excel for easy analysis and organizing raw responses as JSON files.

Language: Python - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

lazyFrogLOL/llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

Language: Python - Size: 1.21 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 271 - Forks: 9

Polyte/OMS_OCR

This is an image/PDF OCR reader. Use it to extract text from either an image or a PDF file. This project uses Tesseractjs & PDF-Parser to do OCR.

Language: TypeScript - Size: 69.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ridi/content-parser

Content data parser for Ridibooks services

Language: JavaScript - Size: 49.2 MB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 7

adrienjoly/HsbcStatementParser

Transforms PDF bank statements from HSBC into a list of operations in JSON or TSV format.

Language: JavaScript - Size: 21.5 KB - Last synced at: about 2 months ago - Pushed at: over 9 years ago - Stars: 18 - Forks: 7

aidayang/MinerU-OneClick

One-click launch bundle for MinerU that deploys without installation

Size: 49.8 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 10 - Forks: 2

adithya-s-k/marker-api

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

Language: Python - Size: 35 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 854 - Forks: 96

drmingler/docling-api

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is ideal for large-scale workflows. It offers text/table extraction, OCR, and batch processing with sync/async endpoints.

Language: Python - Size: 3.48 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 585 - Forks: 58

dadicharan/Log-Analyzer

Log Analyzer with AI is a Streamlit-based tool for AI-powered log analysis. It supports CSV log uploads, data visualization (Plotly & Matplotlib), and anomaly detection using DeepSeek LLM via Ollama API. Users can explore logs, detect patterns, and gain AI-driven insights. 🚀 Python, Pandas, Streamlit, AI

Language: Python - Size: 13.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

diegoabeltran16/OpenPages-pipeline

Open-source tool for turning technical documents into AI-ready formats. Built for better access to knowledge.

Language: Python - Size: 1.78 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Stravah/eosin

Custom Bank Statement Parsing based on pure text positioning.

Language: Python - Size: 5.22 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 1

chinmaymisra/personal-finance-tracker

Upload Axis Bank statements as PDFs, automatically parse transactions, and view them cleanly in a modern UI. Handles invalid files and non-supported banks gracefully. Built using React (Vite) and FastAPI.

Language: Python - Size: 143 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

nlitsme/pyPdfCrack

Investigation into PDF encryption

Language: Python - Size: 34.2 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 7

ev0clu/pdf-ai-saas

Full stack (Next.js) PDF AI SaaS App

Language: TypeScript - Size: 830 KB - Last synced at: 28 days ago - Pushed at: 9 months ago - Stars: 4 - Forks: 2

VishwaGauravIn/pdf-parser-client-side

A lightweight, easy-to-use package to parse text from PDF files on the client side without any server dependency.

Language: TypeScript - Size: 26.4 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 0

sankeer28/PDF-Searcher

Live website to parse multiple PDFs using PDF.js to find matching text

Language: JavaScript - Size: 29.3 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

luccaHirae/invoice-extract-server

API for extracting data from invoices

Language: TypeScript - Size: 75.2 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ashutoshvarma/pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

Language: Cython - Size: 12.2 MB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 17

aleff-github/PDF-Parser-VirusTotal-Based 📦

PDF Parser based on VirusTotal API

Language: Python - Size: 709 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

eli64s/pdflex

CLI for merging PDF contexts.

Language: Python - Size: 465 KB - Last synced at: 19 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

cuiyuheng/docling Fork of docling-project/docling

🥚 Transform PDF to JSON or Markdown with ease and speed 🐣

Size: 28.5 MB - Last synced at: 6 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

cuiyuheng/olmocr Fork of allenai/olmocr

Toolkit for linearizing PDFs for LLM datasets/training

Size: 30.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

minjun0219/welstory-menu-pdf-parser 📦

Welstory menu PDF parser

Language: TypeScript - Size: 130 KB - Last synced at: 1 day ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 2

colin-tso/HSBC-AU-Statement-Parser

Parses PDF bank statements from HSBC Australia into MS Excel

Language: JavaScript - Size: 46.9 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

dills122/cardboard-crack

Web app for parsing/viewing Soccer Card Checklists

Language: JavaScript - Size: 1.3 MB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

judaicalink/rdf_generator

A library to generate RDF files in Turtle format for Judaicalink.

Language: Python - Size: 27.3 KB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

ishaangupta-YB/nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and a custom drag-and-drop file uploader. Ideal for developers seeking a ready-to-use solution for PDF content extraction in their Next.js projects.

Language: TypeScript - Size: 200 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 3

AlphaTok-Singapore/PDFMathTranslate Fork of Byaidu/PDFMathTranslate

PDF scientific paper translation with preserved formats. AI-based full-text bilingual translation of PDF documents that fully preserves the layout; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/Docker.

Size: 51.4 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

sypht-team/sypht-java-client

A Java client for the Sypht API

Language: Java - Size: 108 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 87 - Forks: 1

sypht-team/sypht-python-client

A Python client for the Sypht API

Language: Python - Size: 165 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 162 - Forks: 5

RiccardoSenica/pdf-text-parsing

PDF-parsing demo

Language: TypeScript - Size: 167 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Alapipapi/MinerU Fork of opendatalab/MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

Language: Python - Size: 103 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Daniel-Alvarenga/Boot Fork of VitorCarvalho67/Boot

Digital platform tailored for the educational environment, designed to facilitate the dissemination of internship opportunities and promote student engagement

Language: Vue - Size: 8.16 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 8 - Forks: 0