An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: pdf-parser

FlazeFy/Gudangku-Laravel

GudangKu helps you manage your belongings, from home supplies and food stock to furniture. Set reminders for cleaning or for restocking your home supplies. The app can also generate reports to create shopping or maintenance lists. Start organizing your inventory with GudangKu's features. Created using Laravel.

Language: PHP - Size: 1.31 MB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 3 - Forks: 0

Aumlo123/pdfdoom

DOOM in a PDF (as ascii art)

Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

iamarunbrahma/vision-parse

Parse PDFs into markdown using Vision LLMs

Language: Python - Size: 374 KB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 361 - Forks: 50

dromara/yft-design

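An open-source alternative to Gaoding Design (稿定设计), built on fabric.js. A beautiful and powerful online design tool with poster design and image editing features, suited to scenarios such as poster generation, e-commerce product images, long-form article images, and cover editing for videos and WeChat official accounts.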

Language: TypeScript - Size: 50.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,256 - Forks: 251

Besthope-Official/predoc

Document preprocessing service for RAG (Retrieval-Augmented Generation)

Language: Python - Size: 23.4 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 2

Stranger123444/u

An interactive command-line tool designed to quickly navigate directories and perform various file operations efficiently. Its simple syntax and intuitive commands make it a favorite among developers for streamlining workflow tasks.

Size: 1000 Bytes - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

opendatalab/MinerU

A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.

Language: Python - Size: 124 MB - Last synced at: 4 days ago - Pushed at: 10 days ago - Stars: 32,851 - Forks: 2,616

py-pdf/pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Language: Python - Size: 21 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 9,012 - Forks: 1,461

oidlabs-com/Lexoid

Multimodal document parser for high quality data understanding and extraction

Language: Python - Size: 46.7 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 44 - Forks: 6

Stravah/eosin

Custom Bank Statement Parsing based on pure text positioning.

Language: Python - Size: 5.22 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 3 - Forks: 1

aescarias/pdfnaut

A Python library for exploring PDFs with ease.

Language: Python - Size: 773 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

diegoabeltran16/OpenPages-pipeline

Open-source tool for turning technical documents into AI-ready formats. Built for better access to knowledge.

Language: Python - Size: 1.78 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

sylphxltd/pdf-reader-mcp

An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.

Language: TypeScript - Size: 474 KB - Last synced at: 7 days ago - Pushed at: 12 days ago - Stars: 15 - Forks: 2

chinmaymisra/personal-finance-tracker

Upload Axis Bank statements as PDFs, automatically parse transactions, and view them cleanly in a modern UI. Handles invalid files and non-supported banks gracefully. Built using React (Vite) and FastAPI.

Language: Python - Size: 143 KB - Last synced at: 9 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

drmingler/smart-llm-loader

smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.

Language: Python - Size: 1.09 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 65 - Forks: 2

code-418-dpr/SportHub-parser

Parser for the PDF file of the EKP (Unified Calendar Plan) of the Russian Ministry of Sport, for the SportHub project

Language: Python - Size: 2.26 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

ispras/dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

Language: Python - Size: 235 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 233 - Forks: 27

liweiphys/layra

LAYRA is a ready-to-use visual RAG system with a complete web UI built with Next.js and FastAPI, preserving document layout, tables, paragraphs, and graphical elements without any structural fragmentation.

Language: TypeScript - Size: 2.61 MB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 427 - Forks: 42

lazyFrogLOL/llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

Language: Python - Size: 1.21 MB - Last synced at: 9 days ago - Pushed at: 9 months ago - Stars: 269 - Forks: 9

sankeer28/PDF-Searcher

Live website to parse multiple PDFs using PDF.js to find matching text

Language: JavaScript - Size: 29.3 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

drmingler/docling-api

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is ideal for large-scale workflows and offers text/table extraction, OCR, and batch processing with sync/async endpoints.

Language: Python - Size: 3.48 MB - Last synced at: 21 days ago - Pushed at: 2 months ago - Stars: 502 - Forks: 54

datalogics/apdfl-cplusplus-samples

Sample code for the Datalogics C++ interface of the Adobe PDF Library

Language: C++ - Size: 11.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 8 - Forks: 7

datalogics/apdfl-csharp-dotnet-samples

Sample code for the Datalogics .NET interface of the Adobe PDF Library

Language: C# - Size: 298 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 8 - Forks: 9

datalogics/apdfl-csharp-dotnet-framework-samples

Sample code for the Datalogics .NET Framework interface of the Adobe PDF Library

Language: C# - Size: 562 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3 - Forks: 9

datalogics/apdfl-java-maven-samples

Sample code for the Datalogics Java interface of the Adobe PDF Library, set up to build with Maven

Language: Java - Size: 1.16 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 4 - Forks: 11

titipata/scipdf_parser

Python PDF parser for scientific publications: content and figures

Language: Python - Size: 29.2 MB - Last synced at: 23 days ago - Pushed at: about 1 year ago - Stars: 402 - Forks: 64

michelcrypt4d4mus/pdfalyzer

Analyze PDFs. With colors. And Yara.

Language: Python - Size: 93.5 MB - Last synced at: 23 days ago - Pushed at: 5 months ago - Stars: 260 - Forks: 19

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

Language: Rust - Size: 2.88 MB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 1,051 - Forks: 43

adithya-s-k/marker-api

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

Language: Python - Size: 35 MB - Last synced at: 26 days ago - Pushed at: 7 months ago - Stars: 833 - Forks: 92

BitMiracle/Docotic.Pdf.Samples

C# and VB.NET samples for Docotic.Pdf library

Language: Visual Basic .NET - Size: 53.5 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 78 - Forks: 39

luccaHirae/invoice-extract-server

API for extracting data from invoices

Language: TypeScript - Size: 75.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

VishwaGauravIn/pdf-parser-client-side

A lightweight, easy-to-use package for parsing text from PDF files on the client side, with no server dependency.

Language: TypeScript - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 12 - Forks: 0

ashutoshvarma/pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

Language: Cython - Size: 12.2 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 17

aleff-github/PDF-Parser-VirusTotal-Based 📦

PDF Parser based on VirusTotal API

Language: Python - Size: 709 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 0

eli64s/pdflex

CLI for merging PDF contexts.

Language: Python - Size: 465 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

FayazK/Document-Metadata-Extractor

A Python tool that uses Google's Gemini AI to automatically extract structured metadata from PDF and DOCX documents, saving results to Excel for easy analysis and organizing raw responses as JSON files.

Language: Python - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

codereverser/casparser

Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech

Language: Python - Size: 7.85 MB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 142 - Forks: 66

aidayang/MinerU-OneClick

MinerU one-click launch bundle, no installation or deployment required

Size: 49.8 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 7 - Forks: 0

ridi/content-parser

Content data parser for Ridibooks services

Language: JavaScript - Size: 49.2 MB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 7

cuiyuheng/docling Fork of docling-project/docling

🥚 Transform PDF to JSON or Markdown with ease and speed 🐣

Size: 28.5 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

datalogics/apdfl-vb-dotnet-samples

Adobe PDF Library Samples in Visual Basic for .NET

Language: Visual Basic .NET - Size: 174 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 4

cuiyuheng/olmocr Fork of allenai/olmocr

Toolkit for linearizing PDFs for LLM datasets/training

Size: 30.9 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

datalogics/apdfl-kotlin-samples

Adobe PDF Library Samples in Kotlin

Language: Kotlin - Size: 135 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 6

minjunk/welstory-menu-pdf-parser 📦

Welstory menu PDF parser

Language: TypeScript - Size: 130 KB - Last synced at: 4 days ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 2

k16shikano/hpdft

tools to poke pdf using haskell

Language: Haskell - Size: 403 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 43 - Forks: 0

tarfin-labs/easy-pdf

PDF wrapper for Laravel

Language: PHP - Size: 204 KB - Last synced at: 22 days ago - Pushed at: 2 months ago - Stars: 17 - Forks: 3

seinecle/nocodefunctions-io

io for nocodefunctions: csv, txt, pdf, and xlsx so far

Language: Java - Size: 174 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

dills122/cardboard-crack

Web app for parsing/viewing Soccer Card Checklists

Language: JavaScript - Size: 1.3 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

judaicalink/rdf_generator

A library to generate RDF files in Turtle format for Judaicalink.

Language: Python - Size: 27.3 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

ishaangupta-YB/nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and a custom drag-and-drop file uploader. Ideal for developers seeking a ready-to-use solution for PDF content extraction in their Next.js projects.

Language: TypeScript - Size: 200 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 3

sypht-team/sypht-java-client

A Java client for the Sypht API

Language: Java - Size: 108 KB - Last synced at: 29 days ago - Pushed at: almost 4 years ago - Stars: 87 - Forks: 1

sypht-team/sypht-python-client

A Python client for the Sypht API

Language: Python - Size: 165 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 162 - Forks: 5

RiccardoSenica/pdf-text-parsing

PDF-parsing demo

Language: TypeScript - Size: 167 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Alapipapi/MinerU Fork of opendatalab/MinerU

A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.

Language: Python - Size: 103 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Daniel-Alvarenga/Boot Fork of VitorCarvalho67/Boot

Digital platform tailored for the educational environment, designed to facilitate the dissemination of internship opportunities and promote student engagement

Language: Vue - Size: 8.16 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 0

SimpleApp/PDFParser

Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser

Language: Swift - Size: 146 KB - Last synced at: 5 months ago - Pushed at: almost 6 years ago - Stars: 37 - Forks: 10

adrienjoly/HsbcStatementParser

Transforms PDF bank statements from HSBC into a list of operations in JSON or TSV format.

Language: JavaScript - Size: 21.5 KB - Last synced at: 16 days ago - Pushed at: over 9 years ago - Stars: 17 - Forks: 6

J-sephB-lt-n/pdf-bank-statement-parser

Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

Language: Python - Size: 65.4 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 1

easonlai/chat_with_pdf_table

The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.

Language: Jupyter Notebook - Size: 85.9 KB - Last synced at: 14 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 4

bansalsahab/Parser

pdf heading parser

Language: Python - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

clarekang/form-pdf2json

NodeJS library to convert JSON to PDF or vice versa

Language: JavaScript - Size: 2.67 MB - Last synced at: 8 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 2

ashot-israelyan/nextjs-pdf-openai-chat

A demo AI application for uploading PDF files and chatting with ChatGPT about their content

Language: TypeScript - Size: 2.42 MB - Last synced at: 6 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

aqiftekhar/OpenAIChatBot

A healthcare chatbot implemented using OpenAI that also receives PDF documents and images and prescribes based on a summary

Language: TypeScript - Size: 73.2 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

dunso/pdf-parser

Convert PDF content and layout information with pdf.js

Language: JavaScript - Size: 2.18 MB - Last synced at: 9 days ago - Pushed at: over 5 years ago - Stars: 21 - Forks: 7

yintellect/auto-law-review

Automate the case review on legal case documents.

Language: Jupyter Notebook - Size: 30.5 MB - Last synced at: 4 months ago - Pushed at: about 4 years ago - Stars: 11 - Forks: 3

race-tech/f1-data-updater

A repository for automatically updating the F1 database used in the f1-api.

Language: Rust - Size: 194 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

lucasjvds/Scanipy

Scanipy stands for "scan it with Python"—it's your smart Python library for scanning and parsing complex PDF files like books, reports, articles, and academic papers. Utilizing cutting-edge Deep Learning algorithms, Scanipy transforms your PDFs into a treasure trove of extractable information: tables, images, equations, and text.

Language: Python - Size: 273 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 1

davendw49/sciparser

PDF parsing toolkit for preparing academic text corpus

Language: Python - Size: 113 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 45 - Forks: 2

yvnggodemis/pdf-parse

PDF Parser built in Rust

Language: Rust - Size: 146 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

antea-p/flashcard_maker

Flashcard maker written in TypeScript, utilizing OpenAI API to create great cloze flashcards.

Language: TypeScript - Size: 25.4 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

greatjourney589/rogu-platform

React and Firebase platform for e-commerce and games

Language: JavaScript - Size: 176 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

patrixshah/ResumeScreening

Resume Screening: An AI Driven User Profile Screening Tool

Language: TypeScript - Size: 340 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

ashutoshvarma/libxpdf

Static library built from the xpdf sources (www.xpdfreader.com), with most dependencies built in

Language: C++ - Size: 613 KB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 4

Kanchii/avenue-brokerage-to-excel

A simple script to convert Avenue's brokerage statements to Excel, extracting some data

Language: Python - Size: 11.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

nlitsme/pyPdfCrack

Investigation into PDF encryption

Language: Python - Size: 34.2 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 7

lesterchan/linkedin-pdf-resume-parser

Parse a LinkedIn PDF resume and extract name, email, education, and work experience.

Language: PHP - Size: 289 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 25 - Forks: 11

devleejb/pdf-parser

PDF to JSON on my computer!

Language: JavaScript - Size: 1000 Bytes - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

mehmet-kozan/pdf-parse

Pure JavaScript cross-platform module to extract text from PDFs.

Language: JavaScript - Size: 7.78 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shawakash/alphaFreq 📦

Assignment for Probability and Random Process

Language: TypeScript - Size: 16.7 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Siddhantsingh1230/SnapCV

A Simple NLP Web App to create summaries of your CVs

Language: CSS - Size: 343 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

CORDEA/pdf_image_extractor

Extract images from PDF

Language: Dart - Size: 182 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

knands42/TextProcessor-Regex 📦

Explore the regex world with FluentAPI pattern

Language: TypeScript - Size: 178 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

MattLondon101/NLP-Parser

Extract form input from PDFs and group keywords into subtopics with Latent Dirichlet Allocation (LDA).

Language: Python - Size: 660 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sypht-team/sypht-node-client

A Node.js client for the Sypht API

Language: JavaScript - Size: 62.5 KB - Last synced at: 26 days ago - Pushed at: about 2 years ago - Stars: 13 - Forks: 4

tuffstuff9/nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

Language: TypeScript - Size: 44.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 2

Siddhantsingh1230/SnapCV_Backend

A Node Backend Server for SnapCV

Language: HTML - Size: 32.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

PeterMosmans/apdfhelper

Fix links in PDF files, rewrite links, extract text annotations, remove pages

Language: Python - Size: 98.6 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sypht-team/sypht-ruby-client

A Ruby client for the Sypht API

Language: Ruby - Size: 53.7 KB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

tomludlow2/php_nhs_payslip_parser

Uses the https://github.com/smalot/pdfparser parser to open NHS payslips in PHP, then parses them to extract the relevant contents into a PHP associative array

Language: PHP - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sypht-team/sypht-elixir-client

An Elixir client for the Sypht API https://sypht.com

Language: Elixir - Size: 47.9 KB - Last synced at: 26 days ago - Pushed at: about 5 years ago - Stars: 6 - Forks: 0

datalogics/adobe-pdf-library-samples

Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library

Size: 43.3 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 77 - Forks: 62

bratergit/hacktoberfest2020

Hacktoberfest 2020 - Build a desktop program that runs in the terminal and, given a PDF from Toro Investimentos with the day's brokerage notes, shows the income tax calculation for day trades of the Bovespa mini dollar and mini index.

Language: JavaScript - Size: 92.8 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 2

lulucasalves/lumi-back

Backend application test

Language: TypeScript - Size: 790 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

GrigorisLionis/ika-stats-parser

PDF parser of IKA work related statistics data

Language: Python - Size: 134 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

CodeTrace-MY/PDF-Text-Extraction

Algorithm to extract labels and readings from industrial engineering drawings

Language: Python - Size: 643 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jogemu/pdf2tree

Parse PDF and group elements based on enclosing lines. A node.js module that promisifies the pdf2json parser and structures the data in a way that is suitable for tables with merged cells.

Language: JavaScript - Size: 12.7 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

AnyaChickenMcnuggets/PrimoRPAPdfToCsv

Size: 228 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

BuildmodeOne/canisius-parser

A pdf parser to extract the meal plan from the "Katholische Canisiusstiftung" in Ingolstadt

Language: TypeScript - Size: 634 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

cschen1205/spring-pdf-search-engine

PDF Search Engine implemented in Java and Spring Boot

Language: Java - Size: 77 MB - Last synced at: about 1 month ago - Pushed at: about 7 years ago - Stars: 3 - Forks: 5

leandroroser/prettyparser

Parallel processing and parsing of PDF and TXT files, and of Python objects containing text (str, list), using rules (regular expressions).

Language: Python - Size: 106 KB - Last synced at: 27 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0