An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: pagexml

jahtz/octopy

Command line tool for Kraken text segmentation and recognition.

Language: Python - Size: 246 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

jahtz/pypxml

A Python library for parsing, converting, and modifying PageXML files.

Language: Python - Size: 98.6 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

mauvilsa/nw-page-editor

Simple app for visual editing of Page XML files

Language: JavaScript - Size: 2.61 MB - Last synced at: about 5 hours ago - Pushed at: 6 days ago - Stars: 30 - Forks: 9

mauvilsa/tesseract-recognize

Tool that performs layout analysis and/or text recognition using Tesseract and outputs the result in Page XML format

Language: C++ - Size: 188 KB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 46 - Forks: 8

omni-us/pagexml

C++ library with a Python wrapper for working with Page XML files

Language: C++ - Size: 6.65 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 2

HTR-School-Vienna/2024--late-medieval-latin

Transcriptions of 15th-century Latin manuscripts (ÖNB Cod. 4680 and 4135) from the 2024/2025 HTR Winter School, following CATMuS guidelines.

Size: 1.38 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

cconzen/ReadingOrderRecalculation

Post-process PageXMLs to improve their region reading order

Language: Python - Size: 10.9 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 4 - Forks: 2

lectaurep/lepidemo

LECTAUREP Pipeline demonstration to TEI Publisher

Language: Jupyter Notebook - Size: 3.43 MB - Last synced at: 26 days ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 2

TEI4HTR/page2tei

A repository illustrating the transformation of a PAGE XML file into XML-TEI format, based on experiments carried out for the LECTAUREP project.

Language: XSLT - Size: 2.19 MB - Last synced at: 24 days ago - Pushed at: almost 3 years ago - Stars: 17 - Forks: 2

SCDH/x2tei-transformations

Transformations from various formats to TEI

Language: XSLT - Size: 325 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

jahtz/xmltools

Command line tool for working with PageXML files

Language: Python - Size: 34.2 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

jahtz/pagexml

Python package for working with PageXML files

Language: Python - Size: 53.7 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

BobLd/PublayNetSharp

Extract and convert PubLayNet data to PageXml format

Language: C# - Size: 38.1 KB - Last synced at: 3 days ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

OCR-D/gt-repo-template

A template for creating a ground truth repo with various functions and features, such as metadata creation, data analysis, and presentation.

Size: 157 KB - Last synced at: 22 days ago - Pushed at: 11 months ago - Stars: 8 - Forks: 4

jahtz/tesspage

Toolset for Tesseract training with PageXML Ground-Truth

Language: Python - Size: 76.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

jahtz/htrtools

Small collection of HTR/PageXML related scripts used at the ZPD Würzburg

Language: Python - Size: 85.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Jatzelberger/pagesearch

Search PageXML files for character sequences and copy matching files to a folder with a summary file

Language: Python - Size: 12.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

andbue/nashi

Some bits of JavaScript to transcribe scanned pages using PageXML

Language: HTML - Size: 296 KB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 4

tboenig/gt_corpus_benchmark

This repo provides a collection of ground truth data. The collection was compiled according to different criteria (layout complexity and the fonts used). Each item is also described by metadata based on the labeling scheme of OCR-D/PrimaLab.

Size: 25.4 KB - Last synced at: 12 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

Middle-High-German-Conceptual-Database/xquery-pagexml-transkribus-module

This module provides access to Transkribus PageXML files via XQuery functions. It is designed to be used in the context of a BaseX XML database, but should work with other XML databases as well.

Language: XQuery - Size: 23.4 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0