An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: docling

iamchandanys/beacon-index-ai

This project provides a modular framework for conversational AI using RAG, integrating Azure Content Safety for moderation, prompt shielding, and groundedness detection. It includes controllers, models, services, and utilities for handling documents, extracting information, and ensuring safe, reliable AI interactions.

Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 32

garyzava/chat-to-database-chatbot

Chat to your Database GenAI Chatbot

Language: Jupyter Notebook - Size: 24.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 36 - Forks: 7

codemurt/rag_system

A RAG system for Q&A over technical documentation in Russian and English, built on LangChain and ChromaDB. It processes text-based and scanned PDF documents using Docling VLM.

Language: Jupyter Notebook - Size: 135 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

TM9657/docling-binary

Docling Binary Server.

Language: Python - Size: 839 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 1

quarkiverse/quarkus-docling

Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem

Language: Java - Size: 135 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 10 - Forks: 5

kavyakapoor420/Haqdarshak-Stackoverflow-project

This project aims to build an AI-powered knowledge-sharing system that enables agents to ask and answer questions, contribute verified learnings, and organically grow a community-driven knowledge base, much like a “StackOverflow for Scheme Agents.” The system will also promote community engagement, ultimately improving retention and performance.

Language: TypeScript - Size: 26.9 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

docling-project/docling4j

Docling4j brings the functionalities of Docling in document understanding to Java® projects

Language: Java - Size: 32.2 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 16 - Forks: 0

aspose-cells-python/aspose-cells-python

High-performance Python Excel processing library with advanced conversion capabilities

Language: Python - Size: 978 KB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

versionHQ/multi-agent-system

Autonomous agent networks for task automation that requires multi-step reasoning

Language: Python - Size: 3.24 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 24 - Forks: 5

shijincai/fast360

The industry's first "Open Source OCR Arena," a free, no-login utility for one-click benchmarking of 7 top-tier models (Marker, MinerU, MonkeyOCR, Docling, Dolphin, OCRFlux, PP-StructureV3) on your PDF/image files, specializing in PDF-to-Markdown conversion.

Size: 3.78 MB - Last synced at: 8 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

btwld/docling-sdk

A TypeScript SDK for Docling - Bridge between the Python Docling ecosystem and JavaScript/TypeScript.

Language: TypeScript - Size: 2.48 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 2 - Forks: 0

JuaniLlaberia/document-ingestion

The document ingestion pipeline, responsible for processing documents, extracting structured content and images, generating embeddings, and storing everything in ChromaDB.

Language: Python - Size: 18.6 KB - Last synced at: 7 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

felixdittrich92/docling-OCR-OnnxTR

OnnxTR OCR plugin for Docling

Language: Python - Size: 1.49 MB - Last synced at: 1 day ago - Pushed at: 18 days ago - Stars: 8 - Forks: 0

ibm-granite-community/docling-workshop

Source code for Docling Workshop

Size: 71.3 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 5

ghodsizadeh/pdf2csv

A python library and CLI tool to convert PDF files to CSV files.

Language: Python - Size: 475 KB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 30 - Forks: 2

Drntth/rag-ai-assistant

RAG AI Assistant is a modular system for advanced document-based Q&A. It uses a vector database (PostgreSQL + pgvector) for fast, context-aware search and supports multiple chat/embedding models. A document pipeline cleans and converts DOCX/TXT files for embedding, but the main focus is on AI-powered question answering.

Language: Python - Size: 37.1 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

sa9arr/rag-financial-docs

Exploring and comparing RAG techniques for financial documents, including naive RAG, knowledge graph-powered RAG, and long-context (no chunking) RAG

Language: Python - Size: 3.84 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

Aparnap2/selfhost-knowledgebase-query

Enterprise-grade self-hosted AI knowledge base with multi-agent coordination, advanced document processing, and comprehensive security features.

Language: Python - Size: 243 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

bisonbet/open-health (fork of OpenHealthForAll/open-health)

OpenHealth, AI Health Assistant | Powered by Your Data

Language: TypeScript - Size: 4.97 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 4 - Forks: 0

genieincodebottle/parsemypdf

Collection of PDF parsing libraries such as AI-based docling, claude, openai, gemini, meta's llama-vision, unstructured-io, as well as pdfminer, pymupdf, pdfplumber, etc., for efficient snapshot, text, table, and metadata extraction.

Language: Python - Size: 3.01 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 111 - Forks: 27

k0msenapati/waterwise

💧 Smart Water Chatbot

Language: Python - Size: 21.1 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ya0002/obsidian-assist

Make Zettelkasten-style note-taking the foundation of interactions with Large Language Models (LLMs).

Language: Python - Size: 4.59 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 10 - Forks: 0

ihaterynn/Docling-Processor

Document Processing Script using Docling

Language: Python - Size: 4.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

FrancescoCrecchi/LlamaIndex-RAG-Tutorials

Tutorials on using LlamaIndex for modern RAG applications

Language: Jupyter Notebook - Size: 1.21 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

sarmishra/Python-RAG-Pipeline

Building a Retrieval Augmented Generation (RAG) system that demonstrates how to index documents, retrieve relevant content, generate AI-powered responses, and evaluate results—all through a command line interface (CLI).

Language: Python - Size: 441 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

amirkiarafiei/docling-processor

A Docling extension for superior PDF/DOCX to Markdown conversion, featuring smart image understanding with Gemini VLM.

Language: Python - Size: 847 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 1

shoryasethia/markdrop

A Python package for converting PDFs to markdown while extracting images and tables, and generating text descriptions for extracted tables/images using several LLM clients, among other functionality. Markdrop is available on PyPI.

Language: Python - Size: 158 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 123 - Forks: 5

ParthaPRay/docling_RAG_langchain_colab

This repo contains code for RAG using docling in a Colab notebook with langchain, milvus, a huggingface embedding model and LLM

Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

onecio/docling_py

Docling Python NGX

Language: HTML - Size: 2.93 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

MariosAdamidis/FORTHought

An open-source, interpretable platform for AI-accelerated scientific discovery, featuring a RAG pipeline, a self-correcting code interpreter, and scientific tool integration.

Language: Dockerfile - Size: 707 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

sentryeagle/docling

An Open-Source CLI Tool for Document Processing, Diverse Format Parsing and Advanced PDF Understanding.

Language: Shell - Size: 10.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

hemanthkt/impactoverse-AI-mentor

Developed an intelligent AI chatbot utilizing the DeepSeek LLM, designed for efficient interaction with large documents such as textbooks and study materials. Integrated Docling for parsing and processing large files, and implemented a Retrieval-Augmented Generation (RAG) pipeline using FAISS and Sentence Transformers to optimize context retrieval.

Language: JavaScript - Size: 595 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

Rishang/deep-research

Python SDK for Deep-Research

Language: Python - Size: 196 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Danitilahun/Document-processing-Pdf-Structured-Data-Extractor

This project demonstrates how to extract structured information from PDF documents using a combination of Langchain, OpenAI models, and the DocLing library. It provides a framework for parsing PDFs and leveraging LLMs to identify and format key data points.

Language: Jupyter Notebook - Size: 64.5 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

Slayer412/docling-bedrock-plugin

Integrates AWS Bedrock's multimodal capabilities (Claude 3) into the Docling framework for generating image descriptions within document processing pipelines.

Language: Python - Size: 22.5 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

ParthaPRay/Docling_Colab

This repo contains a Google Colab notebook for using Docling to extract data such as text, images, and tables.

Language: Jupyter Notebook - Size: 697 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

katagaki/Lingus

PDF and Markdown conversion using Docling and LibreOffice

Language: Python - Size: 83 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

fahdmirza/doclingwithollama

Docling with Ollama - RAG on Local Files with Local Models

Language: Python - Size: 880 KB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 57 - Forks: 15

Jarus77/markdrop

A Python package for converting PDFs to markdown while extracting images and tables, and generating text descriptions for extracted tables/images using several LLM clients, among other functionality. Markdrop is available on PyPI.

Language: Python - Size: 85.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

ramona1999/Contract-Risk-Assessment

This project is an AI-powered Contract Risk Assessment and Legal Assistant designed to analyze legal documents, extract key clauses, assess risks, and provide actionable recommendations. Additionally, a fine-tuned conversational chatbot is integrated for interactive legal Q&A based on contract-specific knowledge.

Language: Jupyter Notebook - Size: 840 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

thevladdo/rag-backend

Retrieval-Augmented Generation server with Pinecone and OpenAI

Language: HTML - Size: 46.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

maciekmalachowski/CVWizard

🧙‍♂️AI-powered tool to optimize your CV with job-specific keywords and align it to your dream job.

Language: TypeScript - Size: 3.65 MB - Last synced at: 11 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

jmxt3/pdf_to_txt_converter

A Python script that converts PDF files to text using the docling library. This tool is designed to batch process PDF files, making it easy to extract text content from multiple documents at once.

Language: Python - Size: 1.77 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

hyoaru/rag4jiya-process

Agentic RAG-based system with nursing handbooks and transes as knowledge base for my bebiloves

Language: Jupyter Notebook - Size: 13.8 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

kwame-mintah/python-langchain-chainlit-qdrant-ollama-stack-template

📄 A project template for creating a chainlit application, using a locally run model via ollama and a qdrant vector database for document retrieval.

Language: Python - Size: 46.9 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

patw/docinator

A small service to convert PDF files to Markdown using the Docling library

Language: Python - Size: 4.88 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

qlfv/Docling-Testing

Repository for testing and demonstrating the capabilities of Docling for document conversion.

Language: HTML - Size: 18.4 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 2

ParthaPRay/gradio_docling_rag_langchain

This repo provides RAG using Docling, langchain, milvus, sentence transformers, and huggingface LLMs

Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

shrimantasatpati/Document_Parser_using_AI

Parse documents using AI - any document converted to markdown suitable for RAG applications

Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0