GitHub topics: ocr-dataset
ZN1010/PEaCE
[LREC-COLING 2024] PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents. Boost OCR Performance on Scientific Documents.
Language: Python - Size: 195 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

PedroBarcha/old-books-dataset
Old book pages (with ground truth), formerly used for OCR studies. The set comes in several versions that differ in resolution and binarization. Noised and denoised sets (produced by several methods) will eventually be uploaded.
Language: HTML - Size: 1.29 GB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 11 - Forks: 2

DonkeySmall/Text-Recognition-Dataset
Synthetic dataset for text recognition
Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

meuzgebre/geez-ocr
A Python script for generating an OCR dataset for languages written in the Geez script, including Amharic and Tigrinya.
Language: Python - Size: 947 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

Halil-ibrahim-GUNBULAK/IMAGEPROCESSORS
Categorizing Turkish news articles and developing NLP libraries
Language: Python - Size: 20.7 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0
