Topic: "document-layout-analysis"
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
Language: Python - Size: 58.3 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 5,256 - Forks: 498

deepdoctection/deepdoctection
A Repo For Document AI
Language: Python - Size: 29.1 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 2,931 - Forks: 167

tstanislawek/awesome-document-understanding
A curated list of resources on the topic of Document Understanding (DU)
Size: 5.56 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 1,453 - Forks: 163

explosion/spacy-layout
📚 Process PDFs, Word documents and more with spaCy
Language: Python - Size: 2.21 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 737 - Forks: 52

BobLd/DocumentLayoutAnalysis
Document Layout Analysis resources repository for development with PdfPig.
Language: C# - Size: 41.6 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 625 - Forks: 68

qurator-spk/eynollah
Document Layout Analysis
Language: Python - Size: 6.03 MB - Last synced at: 10 days ago - Pushed at: 24 days ago - Stars: 383 - Forks: 31

lquirosd/P2PaLA 📦
Page to PAGE Layout Analysis Tool
Language: Python - Size: 849 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 191 - Forks: 42

phamquiluan/PubLayNet
ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,...
Language: Python - Size: 626 KB - Last synced at: 23 days ago - Pushed at: over 4 years ago - Stars: 182 - Forks: 39

hpanwar08/detectron2 Fork of facebookresearch/detectron2
Detectron2 for Document Layout Analysis
Language: Python - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 178 - Forks: 62

marieai/marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing
Language: Python - Size: 37 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 73 - Forks: 10

biswassanket/DocSegTr
A Bottom-Up Instance Segmentation Strategy for segmenting document instances using Transformers
Language: Python - Size: 12.5 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 51 - Forks: 9

JPLeoRX/detectron2-publaynet
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
Language: Python - Size: 7.76 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 49 - Forks: 7

BobLd/PdfPigMLNetBlockClassifier
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
Language: C# - Size: 1.1 MB - Last synced at: 2 days ago - Pushed at: over 5 years ago - Stars: 28 - Forks: 6

BobLd/simple-docstrum
A step-by-step C# implementation of the Docstrum algorithm
Language: Jupyter Notebook - Size: 898 KB - Last synced at: 2 days ago - Pushed at: over 4 years ago - Stars: 23 - Forks: 5

ihdia/BoundaryNet
BoundaryNet - A Semi-Automatic Layout Annotation Tool
Language: Python - Size: 17.8 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 5

hpanwar08/document-layout-analysis-app
Simple docker deployment of document layout analysis using detectron2
Language: JavaScript - Size: 176 KB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 19 - Forks: 18

BobLd/PublayNet-maskrcnn-mlnet
Using a MaskRCNN model trained on the PubLayNet dataset with ML.NET in C# / .NET for document layout analysis and page segmentation tasks.
Language: C# - Size: 166 MB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 3

stuartemiddleton/glosat_table_dataset
GloSAT Historical Measurement Table Dataset
Language: Python - Size: 13.9 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 9 - Forks: 0

ecomp-shONgit/olr-results
document layout analysis results
Language: HTML - Size: 769 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 0

Duke-Chronicle-Project/awesome-historical-newspaper-analysis
Awesome historical newspaper analysis tools and literature
Size: 6.84 KB - Last synced at: 7 days ago - Pushed at: over 4 years ago - Stars: 8 - Forks: 0

BobLd/PdfPigSvmRegionClassifier
Proof of concept of a simple SVM Region Classifier using PdfPig and Accord.NET. The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
Language: C# - Size: 1.13 MB - Last synced at: 2 days ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 1

lquirosd/Order_Relation_Operator
Learning to Sort Handwritten Text Lines in Reading Order through Estimated Binary Order Relations
Language: Python - Size: 44.9 KB - Last synced at: 5 months ago - Pushed at: about 4 years ago - Stars: 5 - Forks: 2

qyhou/curated-document-layout-analysis
A curated list of resources on Document Layout Analysis
Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

huythai855/QuizVista
A system for generating multiple-choice tests using artificial intelligence - QuizVista
Language: Python - Size: 115 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

qurator-spk/sbb_column_classifier
Get the number of columns for a document image
Language: Python - Size: 50.8 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

charlie6echo/VBDLDSCC
Vision Based Document Layout Detection, Segmentation and context classification using MaskRCNN on Tensorflow-Keras, PyTorch & Detectron2.
Language: Jupyter Notebook - Size: 15 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 1

shrikumaran/ABInBev-Hackathon
An end-to-end deep learning approach to extract information from shipping records
Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

EdwardNgo/Document-Layout-Detection
Project for Deep Learning and its application
Language: Jupyter Notebook - Size: 13.1 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

askintution/dhSegment Fork of dhlab-epfl/dhSegment
Generic framework for historical document processing
Size: 5.9 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

MansurPro/DocuParse
DocuParse is a high-performance tool for converting PDF documents into clean, structured Markdown files. Designed for speed and accuracy, it extracts and formats content while minimizing errors like hallucinations and repetitions.
Language: Python - Size: 121 KB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 1 - Forks: 0

Ritesh1137/langchain-doc-intelligence-loader
Customized LangChain Azure Document Intelligence loader for table extraction and summarization
Language: Python - Size: 454 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
