GitHub topics: pdf-ocr-extraction
neozhu/pdfxtract
PDFxtract is a modern web application built with Next.js that allows users to upload PDF files and automatically convert each page into JPG images for easy preview and download.
Language: TypeScript - Size: 3.88 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

Polyte/OMS_OCR
An image/PDF OCR reader. Use it to extract text from either an image or a PDF file; this project uses Tesseract.js and PDF-Parser to do OCR.
Language: TypeScript - Size: 69.9 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

sfkbstnc/pdf-extractor-cli
A professional, modular, and open-source Python command-line tool to extract data from PDFs — including plain text, tables, images, and OCR content — using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.
Language: Python - Size: 2.24 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

VerisimilitudeX/ocr_pdf2txt
Use Optical Character Recognition technology to convert scanned PDFs into TXT files locally.
Language: Python - Size: 525 KB - Last synced at: 21 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Clearedge-AI/clearedge
Build a RAG preprocessing pipeline
Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 0

Firefox-1998/UtilityPDF
A utility that collects in one place several operations commonly performed on PDF files.
Language: C# - Size: 55.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

skylander86/lambda-text-extractor
AWS Lambda functions to extract text from various binary formats.
Language: Python - Size: 111 MB - Last synced at: 7 months ago - Pushed at: over 7 years ago - Stars: 173 - Forks: 42

fsdesa/pdf-ocr-service
PDF OCR service running in Docker.
Language: Java - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

omaxel/pdf-ocr
Recognize page content of a PDF as text using Tesseract and Ghostscript.
Language: C# - Size: 173 KB - Last synced at: 2 months ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 1

mcagriaksoy/diff_merge_pdf
A tool to compare and merge PDFs, display the differences between them, and run OCR on them.
Language: Python - Size: 1.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lakshay1296/OCR_Django_App_Beta
Example Django (Python) project containing OCR, PDF-to-OCR-PDF, text similarity/dissimilarity, and PDF-to-PNG converter modules.
Language: Python - Size: 52.7 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

Achiwilms/OCR-Wizard
A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for converting image-based PDFs into searchable text.
Language: Python - Size: 1.42 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
