Topic: "pdf-extractor"
torakiki/pdfsam
PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages
Language: Java - Size: 14.8 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 3,780 - Forks: 360

UglyToad/PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
Language: C# - Size: 167 MB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 2,004 - Forks: 258

DocumindHQ/documind
Open-source platform for extracting structured data from documents using AI.
Language: JavaScript - Size: 960 KB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 1,295 - Forks: 45

GowenGit/docnet
DocNET is a fast PDF editing and reading library for modern .NET applications
Language: C# - Size: 166 MB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 496 - Forks: 88

pdftables/python-pdftables-api
Python library to interact with https://pdftables.com API
Language: Python - Size: 42 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 76 - Forks: 30

autokent/pdf-parse
Pure JavaScript cross-platform module to extract text from PDFs.
Last synced at: 12 days ago - Stars: 66 - Forks: 53

Siltaar/doc_crawler.py
Explore a website recursively and download all the wanted documents (PDF, ODT…)
Size: 45.9 KB - Last synced at: 30 days ago - Pushed at: almost 4 years ago - Stars: 20 - Forks: 6

Madgrades/madgrades-extractor
UW-Madison course and grade distribution data extraction tool.
Language: Java - Size: 865 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 4

asepmaulanaismail/pdf-to-txt-python
Simple pdf to text with python using PDFtk and PyPDF2
Language: Python - Size: 550 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 9

deep-diver/neurips2024
Read and Listen to NeurIPS 2024 Papers
Language: HTML - Size: 3.46 GB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 12 - Forks: 0

codad5/pdfz
Your Rust PDF Document Text Extractor
Language: Rust - Size: 116 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 11 - Forks: 1

bytescout/pdf-extractor-sdk-samples
ByteScout PDF Extractor SDK source code samples
Language: C# - Size: 27.5 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 8 - Forks: 5

hrbrmstr/fish-stocking-pdf-data-wrangling
🐠A fishy example of how to do PDF data wrangling in R
Language: R - Size: 1.81 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 0

pdftables/go-pdftables-api
Go example of using the PDFTables.com API
Language: Go - Size: 20.5 KB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

renan-siqueira/python-pdf-tool
This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible, allowing the choice among different text extraction libraries and supporting both single PDF file and directory containing multiple PDF files.
Language: Python - Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

bkawan/pdf-parser
Language: Python - Size: 3.25 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 0

meitinger/PdfKit
Combines, converts, extracts and views PDFs.
Language: C# - Size: 779 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

eli64s/pdflex
CLI for merging PDF contexts.
Language: Python - Size: 465 KB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

arjun-mavonic/scanned-pdf-text-extractor
This is a Python application that converts non-readable PDF files, such as scanned documents, into readable Word documents. It achieves this by first converting the PDF files into images and then extracting the text from the images to create the Word documents. The application provides a user-friendly interface to do the above task.
Language: Python - Size: 28.3 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 2

yixegamujopa/PDF-EXPLOIT
http://t.me/ALIENDOT
Size: 0 Bytes - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

homfarnam/pdf-to-image-telegram-bot
Pdf to Image Converter - A simple tool to convert pdf to image in Telegram
Language: JavaScript - Size: 106 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

gimpscape/gimpscape-ppa
Gimpscape Repository for Debian Based Distributions
Language: Shell - Size: 173 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 2

skitsanos/extract-pdf-tables
PDF Tables extraction with Java and Tabula
Language: Java - Size: 25.4 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

DrMcCoy/pdftextorizer
Interactively extract text from multi-column PDFs
Language: Python - Size: 178 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

dmywuzegi/PDF-EXPLOIT
http://t.me/ALIENDOT
Size: 0 Bytes - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

fmotifuziqi/PDF-EXPLOIT
http://t.me/ALIENDOT
Size: 0 Bytes - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

heshiming/paddlefish Fork of os-climate/crrf-det
A Python + C implementation for image-based PDF page layout analysis and content extraction.
Language: C++ - Size: 5.26 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

serkodev/camelot-docker
Docker setup of Camelot: PDF Table Extraction
Language: Dockerfile - Size: 1.95 KB - Last synced at: 7 days ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

jaffreyjoy/ez-extract
A "GRE words" dataset generation pipeline
Language: Python - Size: 2.21 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

GuilhermeStracini/POC-dotnet-ExtractPdfContent
🔬 Proof of Concept of extracting content from PDF files using multiple PDF libraries
Language: C# - Size: 201 KB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

odhyp/Automail 📦
A Python project to automate various tasks related to government official letters
Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

nsourlos/bird_detector_ancient_manuscripts
Language: Python - Size: 17.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

BossaMuffin/API-PDFdataExtractionAndStorage
[2023-01] A Python Flask API to extract metadata and text from PDF files. Asynchronous tasks executed with a Celery queue and Redis workers. A SQLite storage managed by SqlAlchemy. Clean code with Flake8 and Isort. Coverage tested with Pytest-cov. See the documentation in the Readme.md and check the API contract with Swagger.
Language: Python - Size: 7.83 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Hymian7/PDFtkSharp
C# Wrapper around PDFLabs PDFtk Server CLI
Language: C# - Size: 3.84 MB - Last synced at: 2 days ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 2

bytescout/pdfco-rails
PDF.co Gem plugin for Ruby on Rails
Language: Ruby - Size: 13.7 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

NextSecurity/ioc_parser Fork of armbues/ioc_parser
Tool to extract indicators of compromise from security reports in PDF format
Size: 45.9 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

javaidb/personal-finance-tracker
Personal finance tracker via interpretation of bank statements from Scotiabank. Insights into spending habits, trends and long-term growth.
Language: Jupyter Notebook - Size: 420 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

balgariya/listractor
PDF extractor for leaflets
Language: TypeScript - Size: 6.37 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 1

douglasdcc/TKinter-PDF-Extractor
TKinter PDF extractor
Language: Python - Size: 609 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

sfkbstnc/pdf-extractor-cli
A professional, modular, and open-source Python command-line tool to extract data from PDFs — including plain text, tables, images, and OCR content — using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.
Language: Python - Size: 2.24 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

sensein/GrobidArticleExtractor
Language: CSS - Size: 2.27 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 1

unfairlaw/Extrator-de-tabelas
Tool for extracting tables from PDFs
Language: Python - Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

HermesRoot/doceru-pdf-extractor
Lightweight, practical extension for extracting and downloading PDFs from Doceru.com with one click!
Language: JavaScript - Size: 36.1 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

patrickiel/PDF-Image-Extractor
A Python tool to extract images from PDF files with filtering and organization.
Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

xiaoyao9184/docker-marker
Docker implementation of the Marker pdf to markdown
Language: Python - Size: 53.7 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

peterdey/pdftotext-dll Fork of insinfo/xpdf
PDF text extractor DLL for VB6
Language: C - Size: 223 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

H-Software224/khuthon_2024
Let's go khuthon in 2024!
Language: Jupyter Notebook - Size: 116 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

xiaoyao9184/docker-magic
Docker implementation of the MinerU pdf to markdown
Size: 12.7 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

CllsPy/PyPTE
The PDF Text Extractor API allows users to upload PDF files and receive the extracted text from those files. This API is built using FastAPI and leverages the PyMuPDF library for efficient text extraction.
Language: Python - Size: 11.7 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Jemeni11/pdfjs
Testing the capabilities of pdfjs
Language: TypeScript - Size: 139 KB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Jemeni11/reactpdf
Testing the capabilities of reactpdf
Language: TypeScript - Size: 224 KB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Eemayas/Data-Extraction-PDFs
This project provides a set of tools for extracting data from PDF files, visualizing text locations, and comparing the extracted data with ground truth data stored in CSV files. It calculates errors using Mean Absolute Error (MAE) and provides accuracy metrics for different fields.
Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

merrvve/pdf-image-extract
Command-line tool to extract and save images (JPEG, PNG) from a PDF file or all PDFs in a directory based on the specific byte signatures.
Language: Python - Size: 4.14 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

DerartuDagne/The-Complete-LangChain-LLMs-Guide Fork of PacktPublishing/The-Complete-LangChain-LLMs-Guide
This repository, forked from Packt Publishing, serves as a comprehensive guide to LangChain and LLMs, encompassing all the resources and knowledge gained from the on-demand course.
Language: Python - Size: 2.43 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

psilvautomata/Automated_PDF_Data_Processing
Data automation and processing tool designed to streamline the extraction and analysis of data from PDF documents using MS Power Automate Desktop and Excel VBA.
Language: VBA - Size: 22.5 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

kkew3/muconvert_rust
Thin C and Rust wrappers over `mutool convert` that extract text from a PDF into an in-memory buffer.
Language: C - Size: 15.6 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

GeroZayas/PDF-itemslist-extractor
Efficient tool for extracting list items from PDFs, converting them to CSV, and merging CSV files, leveraging Python's powerful libraries.
Language: Python - Size: 265 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

ErykDarnowski/ts-test-extractor
Simple script for extracting questions, answers and so on from test PDFs (for a subject called TS I have at uni) to a more usable format.
Language: Python - Size: 44.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

PeterMosmans/apdfhelper
Fix links in PDF files, rewrite links, extract text annotations, remove pages
Language: Python - Size: 98.6 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Maclenn77/pdf-explainer
An Intelligent Assistant that explains the content of a PDF file. Built with ChromaDB and Langchain.
Language: Python - Size: 248 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RichardScottOZ/geoscience_language_models Fork of NRCan/geoscience_language_models
GloVe and BERT language models re-trained using geological text.
Language: Jupyter Notebook - Size: 16.2 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

nf-n-commercial/asq-quest-extractor
CLT to automate scoring of ASQ form workflow
Language: Python - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

amit2014/PDF-Extractor
PDF Extractor, a powerful Python application that simplifies the extraction of highlighted text from PDF files.
Language: HTML - Size: 26.1 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

pauloofmeta/fgts-revisor
API to calculate the FGTS revision
Language: TypeScript - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

saiedislamshuvo/pdf-splitter-tool-react
This is a simple ReactJS project that allows you to split a PDF file into separate pages, each page with a given name.
Language: CSS - Size: 422 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

paritoshtripathi935/Regex-PDF-Extractor
Regex-PDF-Extractor
Language: Python - Size: 41 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

ivaquero/pdfriend 📦
A Cross-Platform PySide6-based GUI for PyPDF (🚧 WIP)
Language: Python - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

blminami/node-js-scripts
Random scripts
Language: TypeScript - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ktxo/pdf-extractor-demo
POC - Data extraction from PDFs invoices
Size: 369 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

kevalane/10k-extractor
Extract numbers from 10k pdf. No longer worked on bc SEC API exists.
Language: JavaScript - Size: 921 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Aslan934/pdf_extractor
Asynchronous pdf extractor api
Language: Python - Size: 11.5 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

huda-lab/texture
A framework for data extraction from print documents that allows constructing data extraction rules over an inferred document structure.
Size: 10.9 MB - Last synced at: 11 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

deyvisonguilherme/extract_text
Text extractor for PDF files
Language: C# - Size: 3.5 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0
