GitHub topics: page-xml
qurator-spk/dinglehopper
An OCR evaluation tool
Language: Python - Size: 3.64 MB - Last synced at: about 15 hours ago - Pushed at: about 15 hours ago - Stars: 65 - Forks: 16

BobLd/DocumentLayoutAnalysis
Document layout analysis resources for development with PdfPig.
Language: C# - Size: 41.6 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 612 - Forks: 67

UglyToad/PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
Language: C# - Size: 167 MB - Last synced at: about 5 hours ago - Pushed at: 2 days ago - Stars: 1,973 - Forks: 255

Lemmbraalemao-DPB/German-Brazilian-Newspapers-Dataset_1
The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.
Size: 5.09 GB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

mittagessen/kraken
OCR engine for all the languages
Language: Python - Size: 28.3 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 808 - Forks: 140

UB-Mannheim/blatt
NLP-helper for OCR-ed pages in PAGE XML format
Language: Python - Size: 42 KB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 10 - Forks: 1

slub/textract2page
Convert AWS Textract JSON to PRImA PAGE XML
Language: Python - Size: 76.9 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 3

kba/transkribus-to-prima
Convert Transkribus PAGE-XML to standard PAGE-XML
Language: Python - Size: 153 KB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 12 - Forks: 3

lquirosd/P2PaLA 📦
Page to PAGE Layout Analysis Tool
Language: Python - Size: 849 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 191 - Forks: 42

UB-Mannheim/ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
Language: JavaScript - Size: 805 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 176 - Forks: 23

cneud/ocr-conversion
Conversions between various OCR formats
Size: 35.2 KB - Last synced at: 9 months ago - Pushed at: almost 2 years ago - Stars: 71 - Forks: 3

OCR-D/gt_structure_1_3
The repo gt_structure_1_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
Size: 2.15 GB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 1

OCR-D/gt_structure_1_2
The repo gt_structure_1_2 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
Size: 1.44 GB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 1

OCR-D/gt_structure_1_4
The repo gt_structure_1_4 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
Size: 1.8 GB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

OCR-D/gt_structure_1_1
The repo gt_structure_1_1 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
Size: 1.22 GB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 1

OCR-D/gt-repo-scripts
XSLT and shell scripts for analyzing and creating GitHub pages of a ground truth repository. These are centrally managed and can be used by all repositories created with gt-repo-template (https://github.com/OCR-D/gt-repo-template).
Language: XSLT - Size: 1.17 MB - Last synced at: 11 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 2

IMAGO-Catalogues-Jjanes/cataloguesSegmentationOCR
Dataset and models for catalogs' layout analysis and HTR
Language: Python - Size: 966 MB - Last synced at: 10 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

tboenig/German-Brazilian-Newspapers-Dataset_2
The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.
Size: 555 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

qurator-spk/ocrd_repair_inconsistencies 📦
Automatically re-order lines, words and glyphs to become textually consistent with their parents.
Language: Python - Size: 44.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 3

Heresta/OCR17plus
Data for layout analysis and HTR.
Language: Python - Size: 4.85 GB - Last synced at: 10 days ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 3

tboenig/gt-guidelines (fork of kba/gt-guidelines)
OCR-D guidelines for Ground Truth production
Language: XSLT - Size: 156 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 1

GBN-DBP/ocrd-page-xml-draw
OCR-D wrapper for page-xml-draw
Language: Python - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

GBN-DBP/page-xml-draw
A powerful CLI tool for visualization and encoding of PAGE-XML files
Language: Python - Size: 18.4 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 1
