GitHub topics: scanned-image-pdfs

Repositories

cseas/ocr-table

Extract tables from scanned image PDFs using Optical Character Recognition.

Language: Python - Size: 12.8 MB - Last synced at: 17 days ago - Pushed at: almost 5 years ago - Stars: 273 - Forks: 67

karolzak/boxdetect

BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.

Language: Python - Size: 7.43 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 109 - Forks: 20

sxaxmz/handle_scanned_pdf

A wrapper around Python OCR tools such as pytesseract and EasyOCR that recognizes and extracts text embedded in images. It also converts scanned PDFs to text-searchable PDFs.

Language: Python - Size: 811 KB - Last synced at: 9 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

boomalope/misc

Growing collection of scripts that manipulate text data.

Language: Python - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

timberger/Searchable-Image-PDF-Creat-O-Mat

This batch script creates a searchable PDF from a PDF containing one or more scanned image pages.

Language: Batchfile - Size: 28.3 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

rbrito/pkg-pdfbeads

Debian packaging of pdfbeads

Language: Ruby - Size: 76.2 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

Related Keywords

scanned-image-pdfs (6), ocr (3), scanned-documents (3), searchable-pdf (2), ocr-python (2), tesseract (2), pdf (2), batch (1), twitter (1), textual-analysis (1), tagging-tool (1), preprocessing-data (1), pdftotext (1), pdftoimage (1), parallel-processing (1), ngrams (1), memory-management (1), manual-annotations (1), jupyter-notebook (1), extract-tables (1), batch-script (1), converter (1), drag (1), drop (1), ghostscript (1), ghostscript-wrapper (1), imagemagick (1), imagemagick-wrapper (1), scan (1), scanned-pages (1), searchable-pdfs (1), tesseract-wrapper (1), pdf-converter (1), pdf-generation (1), scanning (1), ocr-table (1), optical-character-recognition (1), pdfminer (1), python (1), shell (1), bounding-boxes (1), box-detection (1), boxes (1), checkbox (1), checkboxes (1), computer-vision (1), cv2 (1), documents (1), forms (1), handwritten-character-recognition (1), handwritten-characters (1), handwritten-documents (1), handwritten-forms (1), opencv (1), opencv-python (1), rectangle-detection (1), scanned-images (1), easyocr (1), extract-text-from-image (1), extract-text-from-pdf (1), pytesseract (1), scanned-pdf-documents (1), tesseract-ocr (1)
