Topic: "alto-xml"
UglyToad/PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
Language: C# - Size: 168 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 2,184 - Forks: 281

mittagessen/kraken
OCR engine for all the languages
Language: Python - Size: 29.6 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 875 - Forks: 148

BobLd/DocumentLayoutAnalysis
Document Layout Analysis resources and repos for development with PdfPig.
Language: C# - Size: 41.6 MB - Last synced at: about 15 hours ago - Pushed at: almost 2 years ago - Stars: 625 - Forks: 68

cneud/ocr-conversion
Conversions between various OCR formats
Size: 35.2 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 79 - Forks: 3

qurator-spk/dinglehopper
An OCR evaluation tool
Language: Python - Size: 3.66 MB - Last synced at: 19 days ago - Pushed at: 4 months ago - Stars: 66 - Forks: 16

dbmdz/mirador-textoverlay
Text Overlay plugin for Mirador 3
Language: JavaScript - Size: 4.31 MB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 57 - Forks: 15

altoxml/schema
ALTO XML schema - latest and all former versions
Size: 5.85 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 51 - Forks: 4

cneud/alto-tools
Python tools for performing various operations on ALTO XML files
Language: Python - Size: 144 KB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 48 - Forks: 17

kitodo/kitodo-presentation
Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
Language: JavaScript - Size: 46.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 42 - Forks: 44

altomator/Image_Retrieval
Image Retrieval in Digital Libraries - A Multicollection Experimentation of Machine Learning techniques
Language: XQuery - Size: 63.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 26 - Forks: 5

altomator/EN-data_mining
Data Mining Historical Newspaper Metadata (METS/ALTO formats)
Language: HTML - Size: 58.9 MB - Last synced at: 4 months ago - Pushed at: about 3 years ago - Stars: 25 - Forks: 4

Living-with-machines/alto2txt
Convert ALTO XML to plain text + minimal metadata
Language: Python - Size: 32.7 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 17 - Forks: 2

natliblux/BnLMetsExporter
Command Line Interface (CLI) to export METS/ALTO documents to other formats.
Language: Java - Size: 433 KB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 13 - Forks: 1

qurator-spk/mods4pandas
Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis
Language: Python - Size: 435 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 12 - Forks: 0

alix-tz/aspyre-gt
A pipeline to transfer ground truth from Transkribus to eScriptorium.
Language: Python - Size: 15.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

Jean-Baptiste-Camps/ALTEI
A bunch of scripts to manipulate ALTO and XML/TEI
Language: XSLT - Size: 142 KB - Last synced at: 6 months ago - Pushed at: about 4 years ago - Stars: 5 - Forks: 0

altomator/ALTO-IIIF
Extracting illustrations from ALTO documents with IIIF
Language: Perl - Size: 2.92 MB - Last synced at: about 1 month ago - Pushed at: over 9 years ago - Stars: 5 - Forks: 0

TheStanfordDaily/archives-web
Helper functions and web app for METS/ALTO archive viewing.
Language: JavaScript - Size: 3.08 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 2

Heresta/OCR17plus
Data for layout analysis and HTR.
Language: Python - Size: 4.85 GB - Last synced at: 3 days ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 4

Haighton/create_searchable_pdf
Create a searchable PDF with ALTO-XML and JP2 files.
Language: CSS - Size: 4.42 MB - Last synced at: 12 months ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 0

altomator/ALTO-HTML
Conversion of ALTO files (including tags) to HTML
Language: HTML - Size: 154 KB - Last synced at: 4 months ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 1

IMAGO-Catalogues-Jjanes/cataloguesSegmentationOCR
Dataset and models for catalogs' Layout analysis and HTR
Language: Python - Size: 966 MB - Last synced at: 3 days ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 1

joliciel-informatique/jochre-alto-editor
Graphical browser-based Alto4 editor, for the construction of OCR training corpora.
Language: JavaScript - Size: 256 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Rajasekaran85/ALTO-XML-highlighting-Application
ALTO XML coordinates highlighting application for validating the coordinates values
Language: Python - Size: 685 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

hnjm/kraken (fork of mittagessen/kraken)
OCR engine for all the languages
Size: 64 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Rajasekaran85/Python-TIFF-to-OCR-XML
TIFF Image - Converted into OCR XML using Tesseract
Language: Python - Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Haighton/KB_related_stuff
Scripts I wrote at my job which could be helpful to others
Language: Python - Size: 12.7 KB - Last synced at: 12 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

hdaip/hdaip-scanner
HDaIP.scanner - Historical Document and Information Processing - Scanner
Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 1

Aazhar/alto2others
XSL stylesheets to convert between alto and other formats (hOCR, plain text...)
Size: 0 Bytes - Last synced at: 6 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0
