An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: scanned-image-pdfs

cseas/ocr-table

Extract tables from scanned image PDFs using Optical Character Recognition.

Language: Python - Size: 12.8 MB - Last synced at: 17 days ago - Pushed at: almost 5 years ago - Stars: 273 - Forks: 67

karolzak/boxdetect

BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.

Language: Python - Size: 7.43 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 109 - Forks: 20

sxaxmz/handle_scanned_pdf

A wrapper around Python OCR tools such as pytesseract and easyocr that recognizes and extracts text embedded in images. It also converts scanned PDFs to text-searchable PDFs.

Language: Python - Size: 811 KB - Last synced at: 9 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

boomalope/misc

Growing collection of scripts that manipulate text data.

Language: Python - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

timberger/Searchable-Image-PDF-Creat-O-Mat

This batch script creates a searchable PDF from a PDF with one or more scanned pages that contain images.

Language: Batchfile - Size: 28.3 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

rbrito/pkg-pdfbeads

Debian packaging of pdfbeads

Language: Ruby - Size: 76.2 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0