GitHub topics: pdftotext
l2ysho/afpp
Because we all needed just one more way to deal with PDFs. Fast, efficient, minimal. Zero bloat, one dependency. Because we all needed another f*cking pdf parser.
Language: TypeScript - Size: 2.26 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 0

zetahernandez/pdf-to-text
Read PDF files in JavaScript
Language: JavaScript - Size: 36.1 KB - Last synced at: 7 days ago - Pushed at: over 5 years ago - Stars: 80 - Forks: 32

subba-design/ToolHub
Get 10+ free online tools on a single dashboard
Language: HTML - Size: 116 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lu4p/cat
Extract text from plaintext, .docx, .odt and .rtf files. Pure go.
Language: Go - Size: 215 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 102 - Forks: 16

farhan0167/BankAIAgent
A tool to convert bank statements into Excel files
Language: Python - Size: 3.43 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 9 - Forks: 4

deardurham/ciprs-reader
Python library for reading CIPRS PDFs
Language: Jupyter Notebook - Size: 1.61 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 8

iron-software/Iron-OCR-Image-to-Text-in-CSharp
Image to Text Tutorial in C# - See https://ironsoftware.com/csharp/ocr/tutorials/how-to-read-text-from-an-image-in-csharp-net/
Language: C# - Size: 3.8 MB - Last synced at: 4 months ago - Pushed at: almost 7 years ago - Stars: 74 - Forks: 16

ashutoshvarma/pyxpdf
Fast and memory-efficient Python PDF Parser based on xpdf sources
Language: Cython - Size: 12.2 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 17

amenezes/aiopytesseract
A Python asyncio wrapper for Tesseract-OCR.
Language: Python - Size: 2.14 MB - Last synced at: 20 days ago - Pushed at: 11 months ago - Stars: 26 - Forks: 6

amitsuthar69/pdf2text
A pdf to text extractor web service written in Go.
Language: Go - Size: 266 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 1

Ananthakrishnan12/Resume-Analyzer-Using-BERT
Resume Analyzer Using BERT
Language: Python - Size: 8.4 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

icaropires/pdf2dataset
Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features
Language: Python - Size: 301 KB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 20 - Forks: 4

raul23/convert-to-txt
Convert documents (pdf, djvu, epub, word) to txt
Language: Python - Size: 159 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 2

andrealenzi11/py-poppleract
Python library and Web service based on Poppler Pdftotext utility and Tesseract OCR for extracting text from PDF documents
Language: Python - Size: 202 KB - Last synced at: 5 months ago - Pushed at: 11 months ago - Stars: 10 - Forks: 2

tmsincomb/ImageToCSV
Converts an image to a CSV. This exists because Chorus 3.0 is bat-shit and only shows images for vital metadata.
Language: Python - Size: 2.53 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 2

joeychilson/pdftotext
A Go library for converting PDF files to text using the pdftotext utility.
Language: Go - Size: 15.6 KB - Last synced at: 5 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Anish-M-code/pdftotext
A simple pdftotext conversion tool for Windows 8.1/10/11 and FEDORA/UBUNTU/DEBIAN/ARCH based linux distros using poppler-utils and Google's tesseract-ocr.
Language: Python - Size: 778 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 13 - Forks: 3

Zeeshanahmad4/NLP-Pdf-Minning-Extracting-text-from-pdf
NLP PDF mining: extracting text from PDF
Language: Python - Size: 2.86 MB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

ChanMo/docker-poppler
A simple RESTful API service for poppler
Language: Python - Size: 3.91 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 1

fabriziosalmi/any-to-mp4 📦
Convert any kind of file to video.
Size: 498 KB - Last synced at: 5 days ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 1

feeling-free/pdf2text
Extract text from pdf using Tesseract-OCR
Language: Python - Size: 414 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

DrMcCoy/pdftextorizer
Interactively extract text from multi-column PDFs
Language: Python - Size: 178 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

euyogi/Projeto-Anceu-CS50
My CS50 course project: a PDF analyzer that processes the scores of applicants admitted through Acesso Enem and organizes everything. Now in C++
Language: C++ - Size: 43.2 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

dosadczuk/go-pdftotext
Wrapper for Xpdf command line tool `pdftotext`
Language: Go - Size: 27.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

pradeepbatchu/streamlitocr
PDF to Text with streamlit application
Language: Python - Size: 85.1 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

cccadet/leiturapdf
Language: Jupyter Notebook - Size: 8.8 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

OmarTools/PDF-Manipulator-by-OmarTools
Unlimited PDF manipulation tool (portable)
Size: 29.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tecosaur/pdftotext.el
A mirror of https://git.tecosaur.net/tec/pdftotext.el
Language: Emacs Lisp - Size: 4.88 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 1

ExceptedPrism3/PDFToAudio
"PDF To Audio" is a Python tool that transforms PDF documents into audio files using OCR and Text-to-Speech technology. Ideal for accessibility and auditory learning, it supports multiple languages, parallel processing, and smart rate limit handling.
Language: Python - Size: 2.81 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

boettner/pdf2sandwich-pdf 📦
Convert scanned pdf into text embedded pdf.
Language: Shell - Size: 8.79 KB - Last synced at: 6 days ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 1

dylan-park/InvoiceReader
(In Development) Converts invoice PDFs to JSON data, and uploads that data to FreshBooks through their API.
Language: Python - Size: 18.6 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

boomalope/misc
Growing collection of scripts that manipulate text data.
Language: Python - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

NachiketNamjoshi/ConvertToText
Python Libraries for various documents to be converted to text made simpler.
Language: Python - Size: 410 KB - Last synced at: about 2 years ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

dotcode-moscow/pdf-api
Extract text from a PDF (pdf to text). Api for PHP/JS/Python and others.
Language: Java - Size: 35.8 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 0

pradeepbatchu/paddleocr
Image to Text with Flask application
Language: Python - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 1

yedhink/covid19-kerala-api-deprecated
Deprecated - A fast API service for retrieving day to day stats about Coronavirus (COVID-19, SARS-CoV-2) outbreak in Kerala (India).
Language: Go - Size: 7.96 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 4

shine-jayakumar/Extract-Data-From-PDF-In-Python
Batch-convert pdf to text, extract data from pdf in python
Language: Python - Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 9 - Forks: 4

DataKind-BLR/covid19bharat_scrapers 📦
All scrapers for covid19
Language: Python - Size: 88.1 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 6

tadas-s/heroku-buildpack-pdftotext 📦
Heroku buildpack for poppler pdftotext utility
Language: Shell - Size: 1.24 MB - Last synced at: over 2 years ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 15

tahaygun/PDF-to-MongoDB
This project converts books from PDF to proper JSON objects by separating title and content. After you take your output, you can easily insert your JSON file into the database.
Language: JavaScript - Size: 1.68 MB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 0

flyingeek/scriptable-pdfjs
A PDF to text converter for Scriptable App (iOS) working offline
Language: JavaScript - Size: 999 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

leonardyeoxl/PDF-to-Text-Using-OCR-Tesseract
A containerised tool to extract text from PDF file using OCR Tesseract
Language: Python - Size: 1.95 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

avinxxsh/realDataOCR
Simple code to convert pdf/s to image files and use Tesseract OCR on these image files to extract text from them. This code focuses on extracting Batch No. from pharmacy bills using RegEx. None of the actual pdfs and files could be added as all data used was real life/sensitive data.
Language: Python - Size: 16.6 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

arnabb38/audio-book-python
Python Audio Book is a script, to convert PDF texts into Speech
Language: Python - Size: 186 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

pramodsbaviskar7/PDF2WORD
Computer application built in Python to open, edit and convert a document in PDF to Microsoft Word format. GUI is designed using Tkinter. Opening, conversion and reading of PDF files is carried out by a Python library called PyPDF2
Language: Jupyter Notebook - Size: 155 KB - Last synced at: 7 months ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

Jaber-Al-Siam/Bangla-Bondhu
An Android app to assist Bangla reading
Language: Java - Size: 4.27 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

ACMCMC/usc-grades-parser
Get grade statistics from the USC
Language: Python - Size: 31.3 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

zbioe/grapnel
Repository with tools for converting a response body to plain text
Language: Go - Size: 38.1 KB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

bakame-php/pdftotext
extracting texts from a pdf made easy
Language: PHP - Size: 42 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

jefferis/paperutils
R package with utility functions to support preparation of journal articles
Language: R - Size: 588 KB - Last synced at: 6 months ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

views63/pdf2text
pdf to text
Language: Rust - Size: 5.86 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

daniel-007/docconv Fork of sajari/docconv
Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text
Language: Protocol Buffer - Size: 1.39 MB - Last synced at: over 2 years ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

malakhovks/pdf-extract-api
Atomic Web Service (AWS, REST API) for converting PDF files to plain/text, powered by pdftotext and Node.js
Language: JavaScript - Size: 88.9 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

milesibastos/pdf-to-text Fork of zetahernandez/pdf-to-text
Language: JavaScript - Size: 71.3 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1
