An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: pdftotext

l2ysho/afpp

Because we all needed just one more way to deal with PDFs. Fast, efficient, minimal. Zero bloat, one dependency. Because we all needed another f*cking pdf parser.

Language: TypeScript - Size: 2.26 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 0

zetahernandez/pdf-to-text

Read PDF files in JavaScript

Language: JavaScript - Size: 36.1 KB - Last synced at: 7 days ago - Pushed at: over 5 years ago - Stars: 80 - Forks: 32

subba-design/ToolHub

Get 10+ free online tools on a single dashboard

Language: HTML - Size: 116 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lu4p/cat

Extract text from plaintext, .docx, .odt and .rtf files. Pure go.

Language: Go - Size: 215 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 102 - Forks: 16

farhan0167/BankAIAgent

A tool to convert bank statements into Excel files

Language: Python - Size: 3.43 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 4

deardurham/ciprs-reader

Python library for reading CIPRS PDFs

Language: Jupyter Notebook - Size: 1.61 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 8

iron-software/Iron-OCR-Image-to-Text-in-CSharp

Image to Text Tutorial in C# - See https://ironsoftware.com/csharp/ocr/tutorials/how-to-read-text-from-an-image-in-csharp-net/

Language: C# - Size: 3.8 MB - Last synced at: 4 months ago - Pushed at: almost 7 years ago - Stars: 74 - Forks: 16

ashutoshvarma/pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

Language: Cython - Size: 12.2 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 17

amenezes/aiopytesseract

A Python asyncio wrapper for Tesseract-OCR.

Language: Python - Size: 2.14 MB - Last synced at: 20 days ago - Pushed at: 11 months ago - Stars: 26 - Forks: 6

amitsuthar69/pdf2text

A pdf to text extractor web service written in Go.

Language: Go - Size: 266 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 1

Ananthakrishnan12/Resume-Analyzer-Using-BERT

Resume Analyzer Using BERT

Language: Python - Size: 8.4 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

icaropires/pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features

Language: Python - Size: 301 KB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 20 - Forks: 4

raul23/convert-to-txt

Convert documents (pdf, djvu, epub, word) to txt

Language: Python - Size: 159 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 2

andrealenzi11/py-poppleract

Python library and Web service based on Poppler Pdftotext utility and Tesseract OCR for extracting text from PDF documents

Language: Python - Size: 202 KB - Last synced at: 5 months ago - Pushed at: 11 months ago - Stars: 10 - Forks: 2

tmsincomb/ImageToCSV

Converts an image to a CSV. This exists because Chorus 3.0 is bat-shit and only shows images for vital metadata.

Language: Python - Size: 2.53 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 2

joeychilson/pdftotext

A Go library for converting PDF files to text using the pdftotext utility.

Language: Go - Size: 15.6 KB - Last synced at: 5 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Anish-M-code/pdftotext

A simple pdftotext conversion tool for Windows 8.1/10/11 and Fedora/Ubuntu/Debian/Arch-based Linux distros using poppler-utils and Google's tesseract-ocr.

Language: Python - Size: 778 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 13 - Forks: 3

Zeeshanahmad4/NLP-Pdf-Minning-Extracting-text-from-pdf

NLP PDF mining: extracting text from PDF

Language: Python - Size: 2.86 MB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

ChanMo/docker-poppler

A simple RESTful API service for poppler

Language: Python - Size: 3.91 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 1

fabriziosalmi/any-to-mp4 📦

Convert any kind of file to video.

Size: 498 KB - Last synced at: 5 days ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 1

feeling-free/pdf2text

Extract text from pdf using Tesseract-OCR

Language: Python - Size: 414 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

DrMcCoy/pdftextorizer

Interactively extract text from multi-column PDFs

Language: Python - Size: 178 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

euyogi/Projeto-Anceu-CS50

My project for the CS50 course: a PDF analyzer that processes the grades of candidates admitted through Acesso Enem and organizes everything. Now in C++

Language: C++ - Size: 43.2 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

dosadczuk/go-pdftotext

Wrapper for Xpdf command line tool `pdftotext`

Language: Go - Size: 27.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

pradeepbatchu/streamlitocr

PDF to Text with streamlit application

Language: Python - Size: 85.1 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

cccadet/leiturapdf

Language: Jupyter Notebook - Size: 8.8 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

OmarTools/PDF-Manipulator-by-OmarTools

Unlimited PDF manipulation tool (portable)

Size: 29.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tecosaur/pdftotext.el

A mirror of https://git.tecosaur.net/tec/pdftotext.el

Language: Emacs Lisp - Size: 4.88 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 1

ExceptedPrism3/PDFToAudio

"PDF To Audio" is a Python tool that transforms PDF documents into audio files using OCR and Text-to-Speech technology. Ideal for accessibility and auditory learning, it supports multiple languages, parallel processing, and smart rate limit handling.

Language: Python - Size: 2.81 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

boettner/pdf2sandwich-pdf 📦

Convert a scanned PDF into a text-embedded PDF.

Language: Shell - Size: 8.79 KB - Last synced at: 6 days ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 1

dylan-park/InvoiceReader

(In Development) Converts invoice PDFs to JSON data, and uploads that data to FreshBooks through their API.

Language: Python - Size: 18.6 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

boomalope/misc

Growing collection of scripts that manipulate text data.

Language: Python - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

NachiketNamjoshi/ConvertToText

Python libraries that make converting various documents to text simpler.

Language: Python - Size: 410 KB - Last synced at: about 2 years ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

dotcode-moscow/pdf-api

Extract text from a PDF (pdf to text). Api for PHP/JS/Python and others.

Language: Java - Size: 35.8 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 0

pradeepbatchu/paddleocr

Image to Text with Flask application

Language: Python - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 1

yedhink/covid19-kerala-api-deprecated

Deprecated - A fast API service for retrieving day-to-day stats about the Coronavirus (COVID-19, SARS-CoV-2) outbreak in Kerala (India).

Language: Go - Size: 7.96 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 4

shine-jayakumar/Extract-Data-From-PDF-In-Python

Batch-convert pdf to text, extract data from pdf in python

Language: Python - Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 9 - Forks: 4

DataKind-BLR/covid19bharat_scrapers 📦

All scrapers for covid19

Language: Python - Size: 88.1 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 6

tadas-s/heroku-buildpack-pdftotext 📦

Heroku buildpack for poppler pdftotext utility

Language: Shell - Size: 1.24 MB - Last synced at: over 2 years ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 15

tahaygun/PDF-to-MongoDB

This project converts books from PDF to proper JSON objects by separating title and content. You can then easily insert the resulting JSON file into a database.

Language: JavaScript - Size: 1.68 MB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 0

flyingeek/scriptable-pdfjs

A PDF to text converter for Scriptable App (iOS) working offline

Language: JavaScript - Size: 999 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

leonardyeoxl/PDF-to-Text-Using-OCR-Tesseract

A containerised tool to extract text from PDF file using OCR Tesseract

Language: Python - Size: 1.95 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

avinxxsh/realDataOCR

Simple code to convert PDFs to image files and run Tesseract OCR on those images to extract text. The code focuses on extracting the Batch No. from pharmacy bills using regex. None of the actual PDFs and files could be included, as all the data used was real-life, sensitive data.

Language: Python - Size: 16.6 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

arnabb38/audio-book-python

Python Audio Book is a script that converts PDF text into speech

Language: Python - Size: 186 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

pramodsbaviskar7/PDF2WORD

Computer application built in Python to open, edit and convert a PDF document to Microsoft Word format. The GUI is designed using Tkinter. Opening, converting and reading PDF files is carried out by a Python library called PyPDF2

Language: Jupyter Notebook - Size: 155 KB - Last synced at: 7 months ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

Jaber-Al-Siam/Bangla-Bondhu

An Android app to assist Bangla reading

Language: Java - Size: 4.27 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

ACMCMC/usc-grades-parser

Get grade statistics for the USC

Language: Python - Size: 31.3 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

zbioe/grapnel

Repository with tools for converting a response body to plain text

Language: Go - Size: 38.1 KB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

bakame-php/pdftotext

Extracting text from a PDF made easy

Language: PHP - Size: 42 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

jefferis/paperutils

R package with utility functions to support preparation of journal articles

Language: R - Size: 588 KB - Last synced at: 6 months ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

views63/pdf2text

pdf to text

Language: Rust - Size: 5.86 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

daniel-007/docconv Fork of sajari/docconv

Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text

Language: Protocol Buffer - Size: 1.39 MB - Last synced at: over 2 years ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

malakhovks/pdf-extract-api

Atomic Web Service (AWS, REST API) for converting PDF files to plain/text, powered by pdftotext and Node.js

Language: JavaScript - Size: 88.9 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

milesibastos/pdf-to-text Fork of zetahernandez/pdf-to-text

Language: JavaScript - Size: 71.3 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1