An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: publaynet

deepdoctection/deepdoctection

A Repo For Document AI

Language: Python - Size: 21.8 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,817 - Forks: 159

RapidAI/LabelConvert

🔄 A tool for object detection and image segmentation dataset format conversion.

Language: Python - Size: 26.5 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 304 - Forks: 67

BobLd/PdfPigMLNetBlockClassifier

Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.

Language: C# - Size: 1.1 MB - Last synced at: 1 day ago - Pushed at: about 5 years ago - Stars: 28 - Forks: 6

wix-incubator/DLT

Diffusion Layout Transformer implementation.

Language: Python - Size: 3.81 MB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 58 - Forks: 4

JPLeoRX/detectron2-publaynet

Trained Detectron2 object detection models for document layout analysis, based on the PubLayNet dataset.

Language: Python - Size: 7.76 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 48 - Forks: 7

BobLd/PublayNet-maskrcnn-mlnet

Using a MaskRCNN model trained on the PubLayNet dataset with ML.NET in C# / .NET for document layout analysis and page segmentation tasks.

Language: C# - Size: 166 MB - Last synced at: 1 day ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 3

phamquiluan/PubLayNet

ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,...

Language: Python - Size: 626 KB - Last synced at: 18 days ago - Pushed at: about 4 years ago - Stars: 179 - Forks: 39

marieai/marie-ai

Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing.

Language: Python - Size: 35.4 MB - Last synced at: 23 days ago - Pushed at: about 1 month ago - Stars: 67 - Forks: 7

BobLd/PublayNetSharp

Extract and convert PubLayNet data to PageXml format

Language: C# - Size: 38.1 KB - Last synced at: 1 day ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

creative-graphic-design/huggingface-datasets_PubLayNet

PubLayNet for huggingface datasets

Language: Python - Size: 113 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

hpanwar08/detectron2 (fork of facebookresearch/detectron2)

Detectron2 for Document Layout Analysis

Language: Python - Size: 4.53 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 178 - Forks: 62

BobLd/PdfPigSvmRegionClassifier

Proof of concept of a simple SVM Region Classifier using PdfPig and Accord.NET. The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.

Language: C# - Size: 1.13 MB - Last synced at: 1 day ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 1

charlie6echo/VBDLDSCC

Vision-based document layout detection, segmentation, and context classification using MaskRCNN on TensorFlow-Keras, PyTorch, and Detectron2.

Language: Jupyter Notebook - Size: 15 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1