An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: pdf-ocr-extraction

neozhu/pdfxtract

PDFxtract is a modern web application built with Next.js that allows users to upload PDF files and automatically convert each page into JPG images for easy preview and download.

Language: TypeScript - Size: 3.88 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

Polyte/OMS_OCR

This is an image/PDF OCR reader. Use it to extract text from either an image or a PDF file. The project uses Tesseract.js and PDF-Parser to perform OCR.

Language: TypeScript - Size: 69.9 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

sfkbstnc/pdf-extractor-cli

A professional, modular, and open-source Python command-line tool to extract data from PDFs — including plain text, tables, images, and OCR content — using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.

Language: Python - Size: 2.24 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

VerisimilitudeX/ocr_pdf2txt

Use Optical Character Recognition technology to convert scanned PDFs into TXT files locally.

Language: Python - Size: 525 KB - Last synced at: 21 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Clearedge-AI/clearedge

Build a RAG preprocessing pipeline

Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 0

Firefox-1998/UtilityPDF

A utility that collects in one place some operations that are commonly performed on PDF files.

Language: C# - Size: 55.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

skylander86/lambda-text-extractor

AWS Lambda functions to extract text from various binary formats.

Language: Python - Size: 111 MB - Last synced at: 7 months ago - Pushed at: over 7 years ago - Stars: 173 - Forks: 42

fsdesa/pdf-ocr-service

PDF OCR service in Docker

Language: Java - Size: 64.5 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

omaxel/pdf-ocr

Recognize page content of a PDF as text using Tesseract and Ghostscript.

Language: C# - Size: 173 KB - Last synced at: 2 months ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 1

mcagriaksoy/diff_merge_pdf

A tool to compare and merge PDFs, display the differences between them, and run OCR on them.

Language: Python - Size: 1.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lakshay1296/OCR_Django_App_Beta

An example Django (Python) project containing modules for OCR, PDF to OCR PDF conversion, text similarity/dissimilarity, and PDF to PNG conversion.

Language: Python - Size: 52.7 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

Achiwilms/OCR-Wizard

A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for conversion of image-based PDFs into searchable text.

Language: Python - Size: 1.42 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0