An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: ocr-dataset

ZN1010/PEaCE

[LREC-COLING 2024] PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents. Boost OCR Performance on Scientific Documents.

Language: Python - Size: 195 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

PedroBarcha/old-books-dataset

Old book pages (with ground truth), formerly used for OCR studies. The set comes in several versions that differ in resolution and binarization. Noised and denoised sets (produced by several methods) will eventually be uploaded.

Language: HTML - Size: 1.29 GB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 11 - Forks: 2

DonkeySmall/Text-Recognition-Dataset

Synthetic dataset for text recognition

Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

meuzgebre/geez-ocr

A Python script for generating an OCR dataset for languages written in the Ge'ez script, including Amharic and Tigrinya.

Language: Python - Size: 947 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

Halil-ibrahim-GUNBULAK/IMAGEPROCESSORS

Categorizing Turkish news articles and developing NLP libraries

Language: Python - Size: 20.7 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0