An open API service providing repository metadata for many open source software ecosystems.

Topic: "document-layout-analysis"

Layout-Parser/layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

Language: Python - Size: 58.3 MB - Last synced at: 25 days ago - Pushed at: 10 months ago - Stars: 5,256 - Forks: 498

deepdoctection/deepdoctection

A Repo For Document AI

Language: Python - Size: 28.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,848 - Forks: 160

tstanislawek/awesome-document-understanding

A curated list of resources for the Document Understanding (DU) topic

Size: 5.56 MB - Last synced at: about 15 hours ago - Pushed at: about 2 years ago - Stars: 1,416 - Forks: 160

BobLd/DocumentLayoutAnalysis

Document Layout Analysis resources and repos for development with PdfPig.

Language: C# - Size: 41.6 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 618 - Forks: 67

explosion/spacy-layout

📚 Process PDFs, Word documents and more with spaCy

Language: Python - Size: 2.21 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 612 - Forks: 41

qurator-spk/eynollah

Document Layout Analysis

Language: Python - Size: 6.01 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 376 - Forks: 31

lquirosd/P2PaLA 📦

Page to PAGE Layout Analysis Tool

Language: Python - Size: 849 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 191 - Forks: 42

phamquiluan/PubLayNet

ICDAR 2019: MaskRCNN on the PubLayNet dataset. Paragraph detection, table detection, figure detection,...

Language: Python - Size: 626 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 179 - Forks: 39

hpanwar08/detectron2 Fork of facebookresearch/detectron2

Detectron2 for Document Layout Analysis

Language: Python - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 178 - Forks: 62

marieai/marie-ai

Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing

Language: Python - Size: 35.7 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 70 - Forks: 8

biswassanket/DocSegTr

A Bottom-Up Instance Segmentation Strategy for segmenting document instances using Transformers

Language: Python - Size: 12.5 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 51 - Forks: 9

JPLeoRX/detectron2-publaynet

Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset

Language: Python - Size: 7.76 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 49 - Forks: 7

BobLd/PdfPigMLNetBlockClassifier

Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as either title, text, list, table or image.

Language: C# - Size: 1.1 MB - Last synced at: 5 days ago - Pushed at: about 5 years ago - Stars: 28 - Forks: 6

BobLd/simple-docstrum

A step-by-step C# implementation of the Docstrum algorithm

Language: Jupyter Notebook - Size: 898 KB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 23 - Forks: 5

ihdia/BoundaryNet

BoundaryNet - A Semi-Automatic Layout Annotation Tool

Language: Python - Size: 17.8 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 5

hpanwar08/document-layout-analysis-app

Simple Docker deployment of document layout analysis using Detectron2

Language: JavaScript - Size: 176 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 19 - Forks: 18

BobLd/PublayNet-maskrcnn-mlnet

Using a MaskRCNN model trained on the PubLayNet dataset with ML.NET in C# / .NET for document layout analysis and page segmentation tasks.

Language: C# - Size: 166 MB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 3

stuartemiddleton/glosat_table_dataset

GloSAT Historical Measurement Table Dataset

Language: Python - Size: 13.9 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 9 - Forks: 0

ecomp-shONgit/olr-results

document layout analysis results

Language: HTML - Size: 769 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 9 - Forks: 0

BobLd/PdfPigSvmRegionClassifier

Proof of concept of a simple SVM Region Classifier using PdfPig and Accord.NET. The objective is to classify each text block in a PDF document page as either title, text, list, table or image.

Language: C# - Size: 1.13 MB - Last synced at: 5 days ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 1

Duke-Chronicle-Project/awesome-historical-newspaper-analysis

Awesome historical newspaper analysis tools and literature

Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 7 - Forks: 0

lquirosd/Order_Relation_Operator

Learning to Sort Handwritten Text Lines in Reading Order through Estimated Binary Order Relations

Language: Python - Size: 44.9 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 2

qurator-spk/sbb_column_classifier

Get the number of columns for a document image

Language: Python - Size: 50.8 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

charlie6echo/VBDLDSCC

Vision Based Document Layout Detection, Segmentation and context classification using MaskRCNN on Tensorflow-Keras, PyTorch & Detectron2.

Language: Jupyter Notebook - Size: 15 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1

shrikumaran/ABInBev-Hackathon

An end-to-end deep learning approach to extract information from shipping records

Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

EdwardNgo/Document-Layout-Detection

Project for Deep Learning and its application

Language: Jupyter Notebook - Size: 13.1 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

askintution/dhSegment Fork of dhlab-epfl/dhSegment

Generic framework for historical document processing

Size: 5.9 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

MansurPro/DocuParse

DocuParse is a high-performance tool for converting PDF documents into clean, structured Markdown files. Designed for speed and accuracy, it extracts and formats content while minimizing errors like hallucinations and repetitions.

Size: 3.91 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

qyhou/curated-document-layout-analysis

A curated list of resources on Document Layout Analysis

Size: 10.7 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

Ritesh1137/langchain-doc-intelligence-loader

Customized LangChain Azure Document Intelligence loader for table extraction and summarization

Language: Python - Size: 454 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Related Topics
pytorch 9 publaynet 9 ocr 7 deep-learning 6 pdf 6 python 5 mask-rcnn 5 machine-learning 5 computer-vision 5 table-detection 5 csharp 5 layout-analysis 5 pdfpig 4 document-layout 4 nlp 4 detectron2 4 object-detection 4 document-analysis 3 document-intelligence 3 segmentation 3 instance-segmentation 3 document-ai 3 table-extraction 2 artificial-intelligence 2 dotnet 2 document-understanding 2 faster-rcnn 2 document-image-analysis 2 document-image-processing 2 neural-networks 2 natural-language-processing 2 page-xml 2 handwritten-text-recognition 2 figure-detection 2 paragraph-detection 2 pretrained-models 2 deep-neural-networks 2 image-segmentation 2 docker 2 document-parser 2 pdf-document 2 page-segmentation 2 layout-detection 2 docstrum 2 pubtabnet 2 table-recognition 2 document-classification 2 generative-ai 2 accord-net 1 table-structure-recognition 1 dataset 1 support-vector-machine 1 python3 1 neural-network 1 page-object-detection 1 document-structure-extraction 1 document-structure-analysis 1 document-hierarchy-extraction 1 physical-layout-analysis 1 page-layout-analysis 1 optical-layout-recognition 1 optical-layout-analysis 1 document-image-understanding 1 xycut 1 xy-cut 1 textline-detection 1 qurator 1 tensorflow 1 layoutlm 1 spacy 1 rag 1 pdf-converter 1 docx 1 pdf-document-processor 1 ml-net 1 lightgbm 1 classifier 1 onnx 1 mlnet 1 mask-detection 1 sorting-algorithm 1 reading-order 1 layout-parser 1 text-extraction 1 tesseract-ocr 1 pdf-to-markdown 1 pdf-parsing 1 markdown-conversion 1 huggingface-transformers 1 google-colab 1 digital-archive 1 historical-newspapers 1 digital-humanities 1 svm-training 1 svm-classifier 1 svm 1 pix2pix 1 generative-adversarial-network 1 gan 1 text-detection 1