GitHub topics: pdf-ocr-extraction

Repositories

neozhu/pdfxtract

PDFxtract is a modern web application built with Next.js that allows users to upload PDF files and automatically convert each page into JPG images for easy preview and download.

Language: TypeScript - Size: 3.89 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Polyte/OMS_OCR

This is an image/PDF OCR reader. Use it to extract text from either an image or a PDF file. The project uses Tesseract.js and PDF-Parser to perform OCR.

Language: TypeScript - Size: 69.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

A professional, modular, and open-source Python command-line tool to extract data from PDFs — including plain text, tables, images, and OCR content — using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.

Language: Python - Size: 2.24 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

skylander86/lambda-text-extractor

AWS Lambda functions to extract text from various binary formats.

Language: Python - Size: 111 MB - Last synced at: 2 months ago - Pushed at: over 7 years ago - Stars: 177 - Forks: 44

VerisimilitudeX/ocr_pdf2txt

Use Optical Character Recognition technology to convert scanned PDFs into TXT files locally.

Language: Python - Size: 525 KB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Clearedge-AI/clearedge

Build a RAG preprocessing pipeline

Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 0

Firefox-1998/UtilityPDF

Utility that collects in one place some of the operations commonly performed on PDF files.

Language: C# - Size: 55.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

fsdesa/pdf-ocr-service

PDF OCR service in docker

Language: Java - Size: 64.5 KB - Last synced at: 6 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

omaxel/pdf-ocr

Recognize page content of a PDF as text using Tesseract and Ghostscript.

Language: C# - Size: 173 KB - Last synced at: 5 months ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 1

mcagriaksoy/diff_merge_pdf

A tool to compare and merge PDFs, display the differences between them, and run OCR on them.

Language: Python - Size: 1.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lakshay1296/OCR_Django_App_Beta

Example Django (Python) project containing modules for OCR, PDF to OCR PDF conversion, text similarity/dissimilarity, and PDF to PNG conversion.

Language: Python - Size: 52.7 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

Achiwilms/OCR-Wizard

A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for conversion of image-based PDFs into searchable text.

Language: Python - Size: 1.42 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Related Keywords

pdf-ocr-extraction (12), pdf (7), ocr (6), ocr-recognition (5), ocr-python (3), pdf-converter (2), csharp (2), pdf-document-processor (2), tesseract-ocr (2), pdf-ocr (2), pdf-viewer (2), python (2), java (1), ghostscript (1), diff-tool (1), diff-tool-pdf (1), ocr-text-reader (1), pdf-comparison (1), factura-afip (1), docker (1), afip (1), utility (1), rtf (1), pdf-merge (1), pdf-compression (1), ai-sdk (1), pdf-generator (1), pdf-merger (1), pdf-visual-testing (1), pymupdf-fitz (1), pyqt6-desktop-application (1), x-ray-images (1), django-application (1), django-project (1), html-css-javascript (1), imagemagick (1), python27 (1), ocr-pdf (1), ocrmypdf (1), searchable-pdf (1), image-processing (1), nextjs (1), nodejs (1), pdf-parser (1), react (1), pdf-extractor (1), python-ocr (1), python-pdf (1), aws-lambda (1), lambda-functions (1), searchable-pdfs (1), tesseract (1), text-extraction (1), document-parser (1), haystack (1), langchain (1), llamaindex (1), llm (1), pdf-to-json (1), pdf-to-text (1), rag-pipeline (1), retrieval-augmented-generation (1), table-detection (1), table-recognition (1), compress (1), convert (1), docx (1), merge (1)
