GitHub topics: hocr
UglyToad/PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
Language: C# - Size: 168 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 2,012 - Forks: 258

mittagessen/kraken
OCR engine for all the languages
Language: Python - Size: 28.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 826 - Forks: 142

stefan6419846/hocr-tools (fork of ocropus/hocr-tools)
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
Language: Python - Size: 1.74 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

manisandro/gImageReader
A Gtk/Qt front-end to tesseract-ocr.
Language: C++ - Size: 11.5 MB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 1,754 - Forks: 203

darkn3to/pdfocr
A simple Spring Boot application to convert image-based PDFs to text-embedded PDFs.
Language: Java - Size: 1.26 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

BobLd/DocumentLayoutAnalysis
Resource repository for document layout analysis development with PdfPig.
Language: C# - Size: 41.6 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 611 - Forks: 67

UB-Mannheim/ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
Language: JavaScript - Size: 813 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 188 - Forks: 24

dbmdz/mirador-textoverlay
Text Overlay plugin for Mirador 3
Language: JavaScript - Size: 4.31 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 54 - Forks: 15

hansalemaos/tesseract_hocr_to_csv
Fast hOCR-to-CSV parser
Language: C++ - Size: 32.2 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

UB-Mannheim/ocr-gt-tools
Ergonomic line-by-line transcription of scanned text.
Language: JavaScript - Size: 6.22 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 51 - Forks: 11

GeReV/HocrEditor
A visual editor for .hocr files.
Language: C# - Size: 5.57 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 4 - Forks: 1

GeReV/hocr-editor-ts 📦
A visual hOCR file editor
Language: TypeScript - Size: 4.3 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 6

filak/hOCR-to-ALTO
Convert between Tesseract hOCR and ALTO XML using XSL stylesheets
Language: XSLT - Size: 156 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 55 - Forks: 14

macabeus/pyslibtesseract
✏️ Integration of Tesseract for Python using a shared library
Language: Python - Size: 33.2 KB - Last synced at: 10 days ago - Pushed at: about 9 years ago - Stars: 12 - Forks: 2

Ansh420/Hocr_Preservation-in-pytesseract
hOCR is a format for OCR output that preserves the layout of the original document; pytesseract can produce output in this format.
Language: Jupyter Notebook - Size: 1.54 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

cneud/ocr-conversion
Conversions between various OCR formats
Size: 35.2 KB - Last synced at: 10 months ago - Pushed at: about 2 years ago - Stars: 71 - Forks: 3

dmi3kno/hocr
Text-to-tibble
Language: R - Size: 2.08 MB - Last synced at: 6 days ago - Pushed at: about 5 years ago - Stars: 36 - Forks: 2

hadro/brewery-guides
The data for guides to breweries across the United States from 1896 to 1918
Size: 192 MB - Last synced at: 2 months ago - Pushed at: about 8 years ago - Stars: 3 - Forks: 0

mikeduglas/tesseract
Tesseract OCR for Clarion
Language: Clarion - Size: 14.5 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

hdaip/hdaip-scanner
HDaIP.scanner - Historical Document and Information Processing - Scanner
Size: 14.6 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 1

fakabbir/OCR
Probabilistic key-value pair extraction from invoices (non-searchable PDFs) using word weights
Language: Python - Size: 138 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 17 - Forks: 2

nuxeo-sandbox/nuxeo-platform-hocr
Perform OCR on images within Nuxeo with Tesseract and hOCR
Language: Java - Size: 2.64 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

trufanov-nok/tesseract2djvused
A simple Tesseract 3.02+ hOCR to djvused format converter written in Qt
Language: C++ - Size: 21.5 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 2

iilei/hocr-to-json
Language: JavaScript - Size: 426 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

mayurcybercz/AI-Exam-evaluation
CLI tool that recognises handwritten text from answer sheets using Tesseract OCR, then uses the extracted text to evaluate marks with NLP.
Language: Jupyter Notebook - Size: 3.01 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 0

hadro/new-york-city-directories
Some basic data and text extraction from the New York City Directories
Size: 13.2 MB - Last synced at: 2 months ago - Pushed at: almost 8 years ago - Stars: 4 - Forks: 2

ImageProcessing-ElectronicPublications/hocr-tools
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
Language: Python - Size: 144 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

ImageProcessing-ElectronicPublications/tesseract (fork of tesseract-ocr/tesseract)
Tesseract Open Source OCR Engine (main repository)
Language: C++ - Size: 16.1 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

hnjm/kraken (fork of mittagessen/kraken)
OCR engine for all the languages
Size: 64 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

emmeryn/hocr-turtletext
A gem that parses positional text from hOCR output and provides convenience methods to find text.
Language: Ruby - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

jlieth/hocr-parser
Python parser for hOCR files using lxml
Language: Python - Size: 85.9 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

Rajasekaran85/Python-TIFF-to-OCR-XML
Converts TIFF images into OCR XML using Tesseract
Language: Python - Size: 2.93 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

hadro/Wilson-Business-Directories
The data for two editions of the Wilson New York City Business Directories, 1852-1853, and 1861-1862.
Size: 166 MB - Last synced at: 2 months ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

z4y4ts/hocr-bboxes-viewer
Quick and dirty visualization of hOCR bounding boxes on a page
Language: HTML - Size: 77.1 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

ZeinabTaghavi/Handwriting_Manuscript_Line_and_Segment_Setection_Then_Storage
Language: Python - Size: 41.9 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

ZeinabTaghavi/opencv-python
Some segmentation code used in denoising
Language: Python - Size: 121 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

ansonl/flyspacea-backend
Fly Space-A Facebook flight schedule photo aggregator and processor back-end server.
Language: Go - Size: 45.8 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 1
