An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: page-xml

qurator-spk/dinglehopper

An OCR evaluation tool

Language: Python - Size: 3.64 MB - Last synced at: about 15 hours ago - Pushed at: about 15 hours ago - Stars: 65 - Forks: 16

BobLd/DocumentLayoutAnalysis

Document Layout Analysis resources repo for development with PdfPig.

Language: C# - Size: 41.6 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 612 - Forks: 67

UglyToad/PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

Language: C# - Size: 167 MB - Last synced at: about 5 hours ago - Pushed at: 2 days ago - Stars: 1,973 - Forks: 255

Lemmbraalemao-DPB/German-Brazilian-Newspapers-Dataset_1

The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.

Size: 5.09 GB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

mittagessen/kraken

OCR engine for all the languages

Language: Python - Size: 28.3 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 808 - Forks: 140

UB-Mannheim/blatt

NLP-helper for OCR-ed pages in PAGE XML format

Language: Python - Size: 42 KB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 10 - Forks: 1

slub/textract2page

Convert AWS Textract JSON to PRImA PAGE XML

Language: Python - Size: 76.9 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 3

kba/transkribus-to-prima

Convert Transkribus PAGE-XML to standard PAGE-XML

Language: Python - Size: 153 KB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 12 - Forks: 3

lquirosd/P2PaLA 📦

Page to PAGE Layout Analysis Tool

Language: Python - Size: 849 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 191 - Forks: 42

UB-Mannheim/ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

Language: JavaScript - Size: 805 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 176 - Forks: 23

cneud/ocr-conversion

Conversions between various OCR formats

Size: 35.2 KB - Last synced at: 9 months ago - Pushed at: almost 2 years ago - Stars: 71 - Forks: 3

OCR-D/gt_structure_1_3

The repo gt_structure_1_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

Size: 2.15 GB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 1

OCR-D/gt_structure_1_2

The repo gt_structure_1_2 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

Size: 1.44 GB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 1

OCR-D/gt_structure_1_4

The repo gt_structure_1_4 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

Size: 1.8 GB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

OCR-D/gt_structure_1_1

The repo gt_structure_1_1 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

Size: 1.22 GB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 1

OCR-D/gt-repo-scripts

XSLT and shell scripts for analyzing and creating GitHub pages of a ground truth repository. These are centrally managed and can be used by all repositories created with gt-repo-template (https://github.com/OCR-D/gt-repo-template).

Language: XSLT - Size: 1.17 MB - Last synced at: 11 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 2

IMAGO-Catalogues-Jjanes/cataloguesSegmentationOCR

Dataset and models for layout analysis and HTR of catalogs

Language: Python - Size: 966 MB - Last synced at: 10 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1

tboenig/German-Brazilian-Newspapers-Dataset_2

The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.

Size: 555 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

qurator-spk/ocrd_repair_inconsistencies 📦

Automatically re-order lines, words and glyphs to become textually consistent with their parents.

Language: Python - Size: 44.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 3

Heresta/OCR17plus

Data for layout analysis and HTR.

Language: Python - Size: 4.85 GB - Last synced at: 10 days ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 3

tboenig/gt-guidelines (fork of kba/gt-guidelines)

OCR-D guidelines for Ground Truth production

Language: XSLT - Size: 156 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 1

GBN-DBP/ocrd-page-xml-draw

OCR-D wrapper for page-xml-draw

Language: Python - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

GBN-DBP/page-xml-draw

A powerful CLI tool for visualization and encoding of PAGE-XML files

Language: Python - Size: 18.4 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 1