An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: document-image-processing

Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

Language: HTML - Size: 192 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 11,380 - Forks: 948

Layout-Parser/layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

Language: Python - Size: 58.3 MB - Last synced at: 19 days ago - Pushed at: 10 months ago - Stars: 5,256 - Forks: 498

jiangnanboy/Doc-Image-Tool

Document image processing tool, including bleaching, text orientation correction, sharpening, handwritten-note denoising and beautification, shadow removal, dewarping, and edge trimming enhancement (DocBleach / TextOrientationCorrection / DocSharpening / HandwritingDenoisingBeautifying / DocShadowRemoval / document_image_dewarping / DocTrimmingEnhancement).

Language: Python - Size: 11.7 MB - Last synced at: 21 days ago - Pushed at: 9 months ago - Stars: 52 - Forks: 10

fh2019ustc/Awesome-Document-Image-Rectification

A comprehensive list of awesome document image rectification papers.

Size: 188 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 418 - Forks: 30

fh2019ustc/DocScanner

The official repo for “DocScanner: Robust Document Image Rectification with Progressive Learning”, IJCV, 2025.

Language: Python - Size: 17.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 193 - Forks: 18

fh2019ustc/DocGeoNet

The official code for “Geometric Representation Learning for Document Image Rectification”, ECCV, 2022.

Language: Python - Size: 10.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 81 - Forks: 2

jiangnanboy/docimg_tool

Bleaching of images with complex backgrounds, text orientation correction, sharpness enhancement, handwritten-note denoising and beautification, shadow removal, distortion correction, black-spot removal, and edge trimming enhancement.

Size: 9.17 MB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

mx3123/Py-document-cropper

This script automates text extraction from various file formats (images, PDFs, DOCX) using Optical Character Recognition (OCR) powered by Azure Cognitive Services. It supports image preprocessing, text extraction, and uploading the processed files to Google Cloud Storage on Google Cloud Platform (GCP).

Language: Python - Size: 9.77 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

tony-xlh/quality-evaluation-of-scanned-document-images

A web app for evaluating the quality of scanned document images

Language: HTML - Size: 18.6 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

caltechlibrary/documentarist

Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents

Language: Python - Size: 519 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 12 - Forks: 4

fh2019ustc/DocTr

The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

Language: Python - Size: 50.7 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 340 - Forks: 48

jchazalon/smartdoc15-ch1-pywrapper

Python wrapper to facilitate data manipulation for the SmartDoc 2015 - Challenge 1 Dataset.

Language: Jupyter Notebook - Size: 6.11 MB - Last synced at: 4 days ago - Pushed at: 12 months ago - Stars: 6 - Forks: 2

hpanwar08/detectron2 (fork of facebookresearch/detectron2)

Detectron2 for Document Layout Analysis

Language: Python - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 178 - Forks: 62

YuanSiping/Similar-Document-Image-Retrieval-Dataset

Size: 321 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 3

Transkribus/competitions

The ScriptNet / competitions site.

Language: Python - Size: 263 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 6

Nomiluks/Handwritting-OCR

Android App for English Handwritten Text Recognition

Language: Java - Size: 67.8 MB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 12 - Forks: 5

sfikas/sophia-trikoupi-handwritten-dataset

Sophia Trikoupi dataset (Collection of 46 handwritten, annotated pages)

Language: Python - Size: 70.1 MB - Last synced at: 3 months ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0