Topic: "document-layout-analysis"
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
Language: Python - Size: 58.3 MB - Last synced at: 25 days ago - Pushed at: 10 months ago - Stars: 5,256 - Forks: 498

deepdoctection/deepdoctection
A Repo For Document AI
Language: Python - Size: 28.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,848 - Forks: 160

tstanislawek/awesome-document-understanding
A curated list of resources on the Document Understanding (DU) topic
Size: 5.56 MB - Last synced at: about 15 hours ago - Pushed at: about 2 years ago - Stars: 1,416 - Forks: 160

BobLd/DocumentLayoutAnalysis
Document Layout Analysis resources and repos for development with PdfPig.
Language: C# - Size: 41.6 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 618 - Forks: 67

explosion/spacy-layout
📚 Process PDFs, Word documents and more with spaCy
Language: Python - Size: 2.21 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 612 - Forks: 41

qurator-spk/eynollah
Document Layout Analysis
Language: Python - Size: 6.01 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 376 - Forks: 31

lquirosd/P2PaLA 📦
Page to PAGE Layout Analysis Tool
Language: Python - Size: 849 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 191 - Forks: 42

phamquiluan/PubLayNet
ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,...
Language: Python - Size: 626 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 179 - Forks: 39

hpanwar08/detectron2 Fork of facebookresearch/detectron2
Detectron2 for Document Layout Analysis
Language: Python - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 178 - Forks: 62

marieai/marie-ai
Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing
Language: Python - Size: 35.7 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 70 - Forks: 8

biswassanket/DocSegTr
A Bottom-Up Instance Segmentation Strategy for segmenting document instances using Transformers
Language: Python - Size: 12.5 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 51 - Forks: 9

JPLeoRX/detectron2-publaynet
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
Language: Python - Size: 7.76 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 49 - Forks: 7

BobLd/PdfPigMLNetBlockClassifier
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
Language: C# - Size: 1.1 MB - Last synced at: 5 days ago - Pushed at: about 5 years ago - Stars: 28 - Forks: 6

BobLd/simple-docstrum
A step-by-step C# implementation of the Docstrum algorithm
Language: Jupyter Notebook - Size: 898 KB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 23 - Forks: 5

ihdia/BoundaryNet
BoundaryNet - A Semi-Automatic Layout Annotation Tool
Language: Python - Size: 17.8 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 5

hpanwar08/document-layout-analysis-app
Simple docker deployment of document layout analysis using detectron2
Language: JavaScript - Size: 176 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 19 - Forks: 18

BobLd/PublayNet-maskrcnn-mlnet
Using a MaskRCNN model trained on the PubLayNet dataset with ML.NET in C# / .NET for document layout analysis and page segmentation tasks.
Language: C# - Size: 166 MB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 3

stuartemiddleton/glosat_table_dataset
GloSAT Historical Measurement Table Dataset
Language: Python - Size: 13.9 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 9 - Forks: 0

ecomp-shONgit/olr-results
document layout analysis results
Language: HTML - Size: 769 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 9 - Forks: 0

BobLd/PdfPigSvmRegionClassifier
Proof of concept of a simple SVM Region Classifier using PdfPig and Accord.NET. The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
Language: C# - Size: 1.13 MB - Last synced at: 5 days ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 1

Duke-Chronicle-Project/awesome-historical-newspaper-analysis
Awesome historical newspaper analysis tools and literature
Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 7 - Forks: 0

lquirosd/Order_Relation_Operator
Learning to Sort Handwritten Text Lines in Reading Order through Estimated Binary Order Relations
Language: Python - Size: 44.9 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 2

qurator-spk/sbb_column_classifier
Get the number of columns for a document image
Language: Python - Size: 50.8 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

charlie6echo/VBDLDSCC
Vision Based Document Layout Detection, Segmentation and context classification using MaskRCNN on Tensorflow-Keras, PyTorch & Detectron2.
Language: Jupyter Notebook - Size: 15 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1

shrikumaran/ABInBev-Hackathon
An end-to-end deep learning approach to extract information from shipping records
Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

EdwardNgo/Document-Layout-Detection
Project for Deep Learning and its application
Language: Jupyter Notebook - Size: 13.1 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

askintution/dhSegment Fork of dhlab-epfl/dhSegment
Generic framework for historical document processing
Size: 5.9 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

MansurPro/DocuParse
DocuParse is a high-performance tool for converting PDF documents into clean, structured Markdown files. Designed for speed and accuracy, it extracts and formats content while minimizing errors like hallucinations and repetitions.
Size: 3.91 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

qyhou/curated-document-layout-analysis
A curated list of resources on Document Layout Analysis
Size: 10.7 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

Ritesh1137/langchain-doc-intelligence-loader
Customized LangChain Azure Document Intelligence loader for table extraction and summarization
Language: Python - Size: 454 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
