GitHub topics: ocr-dataset

Repositories

ZN1010/PEaCE

[LREC-COLING 2024] PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents. Boost OCR Performance on Scientific Documents.

Language: Python - Size: 195 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

PedroBarcha/old-books-dataset

Old book pages with ground truth, formerly used for OCR studies. The set comes in several versions that differ in resolution and binarization. Noised and denoised sets, produced by several methods, are to be uploaded eventually.

Language: HTML - Size: 1.29 GB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 11 - Forks: 2

DonkeySmall/Text-Recognition-Dataset

Synthetic dataset for text recognition

Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

meuzgebre/geez-ocr

A Python script for generating an OCR dataset for Geez scripts including Amharic and Tigrinya.

Language: Python - Size: 947 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

Halil-ibrahim-GUNBULAK/IMAGEPROCESSORS

Categorizing Turkish news and developing NLP libraries

Language: Python - Size: 20.7 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

Related Keywords

ocr-dataset (5), ocr (2), nlp (2), python (2), lrec-coling-2024 (1), amharic-nlp-researcher (1), amharic-words (1), automation (1), geez (1), geez-script-fonts (1), ocr-data-generator (1), ocr-python (1), tigrinya-dataset (1), tigrinya-nlp (1), tigrinya-ocr (1), acikhack2 (1), ai (1), classification-algorithm (1), corpus (1), new-algorithm (1), news (1), turkish-nlp (1), word2vec-model (1), ocr-recognition (1), optical-character-recognition (1), binarization (1), binarized-dataset (1), books-dataset (1), dataset (1), ground-truth (1), groundtruth (1), ocr-database (1), old-books (1), old-documents (1), text (1), text-data (1), text-database (1), text-ocr-dataset (1), text-recognition (1), text-recognition-dataset (1), amharic (1), amharic-dataset (1), amharic-nlp (1)
