An open API service providing repository metadata for many open source software ecosystems.

Topic: "pdf-extractor"

torakiki/pdfsam

PDFsam, a desktop application to split, merge, mix, and rotate PDF files and extract pages

Language: Java - Size: 14.8 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 3,780 - Forks: 360

UglyToad/PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

Language: C# - Size: 167 MB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 2,004 - Forks: 258

DocumindHQ/documind

Open-source platform for extracting structured data from documents using AI.

Language: JavaScript - Size: 960 KB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 1,295 - Forks: 45

GowenGit/docnet

DocNET is a fast PDF editing and reading library for modern .NET applications

Language: C# - Size: 166 MB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 496 - Forks: 88

pdftables/python-pdftables-api

Python library to interact with https://pdftables.com API

Language: Python - Size: 42 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 76 - Forks: 30

autokent/pdf-parse

Pure JavaScript cross-platform module to extract text from PDFs.

Last synced at: 12 days ago - Stars: 66 - Forks: 53

Siltaar/doc_crawler.py

Explore a website recursively and download all the desired documents (PDF, ODT…)

Size: 45.9 KB - Last synced at: 30 days ago - Pushed at: almost 4 years ago - Stars: 20 - Forks: 6

Madgrades/madgrades-extractor

UW-Madison course and grade distribution data extraction tool.

Language: Java - Size: 865 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 4

asepmaulanaismail/pdf-to-txt-python

Simple PDF-to-text conversion in Python using PDFtk and PyPDF2

Language: Python - Size: 550 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 9

deep-diver/neurips2024

Read and Listen to NeurIPS 2024 Papers

Language: HTML - Size: 3.46 GB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 12 - Forks: 0

codad5/pdfz

Your Rust PDF Document Text Extractor

Language: Rust - Size: 116 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 11 - Forks: 1

bytescout/pdf-extractor-sdk-samples

ByteScout PDF Extractor SDK source code samples

Language: C# - Size: 27.5 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 8 - Forks: 5

hrbrmstr/fish-stocking-pdf-data-wrangling

🐠A fishy example of how to do PDF data wrangling in R

Language: R - Size: 1.81 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 0

pdftables/go-pdftables-api

Go example of using the PDFTables.com API

Language: Go - Size: 20.5 KB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

renan-siqueira/python-pdf-tool

This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible, allowing a choice among different text extraction libraries and supporting both a single PDF file and a directory containing multiple PDF files.

Language: Python - Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

bkawan/pdf-parser

Language: Python - Size: 3.25 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 0

meitinger/PdfKit

Combines, converts, extracts and views PDFs.

Language: C# - Size: 779 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

eli64s/pdflex

CLI for merging PDF contexts.

Language: Python - Size: 465 KB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

arjun-mavonic/scanned-pdf-text-extractor

This is a Python application that converts non-readable PDF files, such as scanned documents, into readable Word documents. It achieves this by first converting the PDF files into images and then extracting the text from the images to create the Word documents. The application provides a user-friendly interface to do the above task.

Language: Python - Size: 28.3 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 2

yixegamujopa/PDF-EXPLOIT

http://t.me/ALIENDOT

Size: 0 Bytes - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

homfarnam/pdf-to-image-telegram-bot

PDF to Image Converter: a simple tool to convert PDFs to images in Telegram

Language: JavaScript - Size: 106 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

gimpscape/gimpscape-ppa

Gimpscape Repository for Debian Based Distributions

Language: Shell - Size: 173 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 2

skitsanos/extract-pdf-tables

PDF Tables extraction with Java and Tabula

Language: Java - Size: 25.4 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

DrMcCoy/pdftextorizer

Interactively extract text from multi-column PDFs

Language: Python - Size: 178 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

dmywuzegi/PDF-EXPLOIT

http://t.me/ALIENDOT

Size: 0 Bytes - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

fmotifuziqi/PDF-EXPLOIT

http://t.me/ALIENDOT

Size: 0 Bytes - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

heshiming/paddlefish Fork of os-climate/crrf-det

A Python + C implementation for image-based PDF page layout analysis and content extraction.

Language: C++ - Size: 5.26 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

serkodev/camelot-docker

Docker setup of Camelot: PDF Table Extraction

Language: Dockerfile - Size: 1.95 KB - Last synced at: 7 days ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

jaffreyjoy/ez-extract

A "GRE words" dataset generation pipeline

Language: Python - Size: 2.21 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

GuilhermeStracini/POC-dotnet-ExtractPdfContent

🔬 Proof of Concept of extracting content from PDF files using multiple PDF libraries

Language: C# - Size: 201 KB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

odhyp/Automail 📦

A Python project to automate various tasks related to government official letters

Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

nsourlos/bird_detector_ancient_manuscripts

Language: Python - Size: 17.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

BossaMuffin/API-PDFdataExtractionAndStorage

[2023-01] A Python Flask API to extract metadata and text from PDF files. Asynchronous tasks are executed with a Celery queue and Redis workers, and SQLite storage is managed by SQLAlchemy. Clean code with Flake8 and isort; coverage tested with pytest-cov. See the documentation in the Readme.md and check the API contract with Swagger.

Language: Python - Size: 7.83 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Hymian7/PDFtkSharp

C# wrapper around the PDF Labs PDFtk Server CLI

Language: C# - Size: 3.84 MB - Last synced at: 2 days ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 2

bytescout/pdfco-rails

PDF.co Gem plugin for Ruby on Rails

Language: Ruby - Size: 13.7 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

NextSecurity/ioc_parser Fork of armbues/ioc_parser

Tool to extract indicators of compromise from security reports in PDF format

Size: 45.9 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

javaidb/personal-finance-tracker

Personal finance tracker that interprets Scotiabank bank statements, providing insights into spending habits, trends, and long-term growth.

Language: Jupyter Notebook - Size: 420 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

balgariya/listractor

PDF extractor for leaflets

Language: TypeScript - Size: 6.37 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 1

douglasdcc/TKinter-PDF-Extractor

TKinter PDF extractor

Language: Python - Size: 609 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

sfkbstnc/pdf-extractor-cli

A professional, modular, and open-source Python command-line tool to extract data from PDFs — including plain text, tables, images, and OCR content — using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.

Language: Python - Size: 2.24 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

sensein/GrobidArticleExtractor

Language: CSS - Size: 2.27 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 1

unfairlaw/Extrator-de-tabelas

A tool for extracting tables from PDFs

Language: Python - Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

HermesRoot/doceru-pdf-extractor

A lightweight, practical extension to extract and download PDFs from Doceru.com with one click!

Language: JavaScript - Size: 36.1 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

patrickiel/PDF-Image-Extractor

A Python tool to extract images from PDF files with filtering and organization.

Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

xiaoyao9184/docker-marker

Docker implementation of the Marker PDF-to-Markdown converter

Language: Python - Size: 53.7 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

peterdey/pdftotext-dll Fork of insinfo/xpdf

PDF text extractor DLL for VB6

Language: C - Size: 223 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

H-Software224/khuthon_2024

Let's go khuthon in 2024!

Language: Jupyter Notebook - Size: 116 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

xiaoyao9184/docker-magic

Docker implementation of the MinerU PDF-to-Markdown converter

Size: 12.7 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

CllsPy/PyPTE

The PDF Text Extractor API allows users to upload PDF files and receive the extracted text from those files. This API is built using FastAPI and leverages the PyMuPDF library for efficient text extraction.

Language: Python - Size: 11.7 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Jemeni11/pdfjs

Testing the capabilities of pdfjs

Language: TypeScript - Size: 139 KB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Jemeni11/reactpdf

Testing the capabilities of reactpdf

Language: TypeScript - Size: 224 KB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Eemayas/Data-Extraction-PDFs

This project provides a set of tools for extracting data from PDF files, visualizing text locations, and comparing the extracted data with ground truth data stored in CSV files. It calculates errors using Mean Absolute Error (MAE) and provides accuracy metrics for different fields.

Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

merrvve/pdf-image-extract

Command-line tool to extract and save images (JPEG, PNG) from a PDF file, or from all PDFs in a directory, based on their byte signatures.

Language: Python - Size: 4.14 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

DerartuDagne/The-Complete-LangChain-LLMs-Guide Fork of PacktPublishing/The-Complete-LangChain-LLMs-Guide

This repository, forked from Packt Publishing, serves as a comprehensive guide to LangChain and LLMs, encompassing all the resources and knowledge gained from the on-demand course.

Language: Python - Size: 2.43 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

psilvautomata/Automated_PDF_Data_Processing

Data automation and processing tool designed to streamline the extraction and analysis of data from PDF documents using MS Power Automate Desktop and Excel VBA.

Language: VBA - Size: 22.5 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

kkew3/muconvert_rust

Thin C and Rust wrappers over `mutool convert` that extract text from a PDF into an in-memory buffer.

Language: C - Size: 15.6 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

GeroZayas/PDF-itemslist-extractor

Efficient tool for extracting list items from PDFs, converting them to CSV, and merging CSV files, leveraging Python's powerful libraries.

Language: Python - Size: 265 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

ErykDarnowski/ts-test-extractor

Simple script for extracting questions, answers and so on from test PDFs (for a subject called TS I have at uni) to a more usable format.

Language: Python - Size: 44.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

PeterMosmans/apdfhelper

Fix links in PDF files, rewrite links, extract text annotations, remove pages

Language: Python - Size: 98.6 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Maclenn77/pdf-explainer

An Intelligent Assistant that explains the content of a PDF file. Built with ChromaDB and Langchain.

Language: Python - Size: 248 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RichardScottOZ/geoscience_language_models Fork of NRCan/geoscience_language_models

GloVe and BERT language models re-trained using geological text.

Language: Jupyter Notebook - Size: 16.2 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

nf-n-commercial/asq-quest-extractor

Command-line tool (CLT) to automate the ASQ form scoring workflow

Language: Python - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

amit2014/PDF-Extractor

PDF Extractor, a powerful Python application that simplifies the extraction of highlighted text from PDF files.

Language: HTML - Size: 26.1 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

pauloofmeta/fgts-revisor

API to calculate the FGTS revision

Language: TypeScript - Size: 9.77 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

saiedislamshuvo/pdf-splitter-tool-react

This is a simple ReactJS project that allows you to split a PDF file into separate pages, each page with a given name.

Language: CSS - Size: 422 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

paritoshtripathi935/Regex-PDF-Extractor

Regex-PDF-Extractor

Language: Python - Size: 41 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

ivaquero/pdfriend 📦

A Cross-Platform PySide6-based GUI for PyPDF (🚧 WIP)

Language: Python - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

blminami/node-js-scripts

Random scripts

Language: TypeScript - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ktxo/pdf-extractor-demo

POC: data extraction from PDF invoices

Size: 369 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

kevalane/10k-extractor

Extract numbers from 10-K PDFs. No longer under development because the SEC API exists.

Language: JavaScript - Size: 921 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Aslan934/pdf_extractor

Asynchronous PDF extractor API

Language: Python - Size: 11.5 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

huda-lab/texture

A framework for data extraction over print documents that allows to construct data extraction rules over an inferred document structure.

Size: 10.9 MB - Last synced at: 11 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

deyvisonguilherme/extract_text

Text extractor for PDF files

Language: C# - Size: 3.5 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0