GitHub topics: docling
iamchandanys/beacon-index-ai
This project provides a modular framework for conversational AI using RAG, integrating Azure Content Safety for moderation, prompt shielding, and groundedness detection. It includes controllers, models, services, and utilities for handling documents, extracting information, and ensuring safe, reliable AI interactions.
Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 32

garyzava/chat-to-database-chatbot
Chat to your Database GenAI Chatbot
Language: Jupyter Notebook - Size: 24.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 36 - Forks: 7

codemurt/rag_system
A RAG system for Q&A over technical documentation in Russian and English, built on LangChain and ChromaDB. Processes text-based and scanned PDF documents using Docling VLM.
Language: Jupyter Notebook - Size: 135 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

TM9657/docling-binary
Docling Binary Server.
Language: Python - Size: 839 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 1

quarkiverse/quarkus-docling
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem
Language: Java - Size: 135 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 10 - Forks: 5

kavyakapoor420/Haqdarshak-Stackoverflow-project
This project aims to build an AI-powered knowledge-sharing system that enables agents to ask and answer questions, contribute verified learnings, and organically grow a community-driven knowledge base, much like a “StackOverflow for Scheme Agents.” The system will also promote community engagement, ultimately improving retention and performance.
Language: TypeScript - Size: 26.9 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

docling-project/docling4j
Docling4j brings the functionalities of Docling in document understanding to Java® projects
Language: Java - Size: 32.2 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 16 - Forks: 0

aspose-cells-python/aspose-cells-python
High-performance Python Excel processing library with advanced conversion capabilities
Language: Python - Size: 978 KB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

versionHQ/multi-agent-system
Autonomous agent networks for task automation that requires multi-step reasoning
Language: Python - Size: 3.24 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 24 - Forks: 5

shijincai/fast360
The industry's first "Open Source OCR Arena," a free, no-login utility for one-click benchmarking of 7 top-tier models (Marker, MinerU, MonkeyOCR, Docling, Dolphin, OCRFlux, PP-StructureV3) on your PDF/image files, specializing in PDF-to-Markdown conversion.
Size: 3.78 MB - Last synced at: 8 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

btwld/docling-sdk
A TypeScript SDK for Docling - Bridge between the Python Docling ecosystem and JavaScript/TypeScript.
Language: TypeScript - Size: 2.48 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 2 - Forks: 0

JuaniLlaberia/document-ingestion
The document ingestion pipeline, responsible for processing documents, extracting structured content and images, generating embeddings, and storing everything in ChromaDB.
Language: Python - Size: 18.6 KB - Last synced at: 7 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

felixdittrich92/docling-OCR-OnnxTR
OnnxTR OCR plugin for Docling
Language: Python - Size: 1.49 MB - Last synced at: 1 day ago - Pushed at: 18 days ago - Stars: 8 - Forks: 0

ibm-granite-community/docling-workshop
Source code for Docling Workshop
Size: 71.3 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 5

ghodsizadeh/pdf2csv
A python library and CLI tool to convert PDF files to CSV files.
Language: Python - Size: 475 KB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 30 - Forks: 2

Drntth/rag-ai-assistant
RAG AI Assistant is a modular system for advanced document-based Q&A. It uses a vector database (PostgreSQL + pgvector) for fast, context-aware search and supports multiple chat/embedding models. A document pipeline cleans and converts DOCX/TXT files for embedding, but the main focus is on AI-powered question answering.
Language: Python - Size: 37.1 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

sa9arr/rag-financial-docs
Exploring and comparing RAG techniques for financial documents, including naive RAG, knowledge graph-powered RAG, and long-context (no chunking) RAG
Language: Python - Size: 3.84 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

Aparnap2/selfhost-knowledgebase-query
Enterprise-grade self-hosted AI knowledge base with multi-agent coordination, advanced document processing, and comprehensive security features.
Language: Python - Size: 243 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

bisonbet/open-health (fork of OpenHealthForAll/open-health)
OpenHealth, AI Health Assistant | Powered by Your Data
Language: TypeScript - Size: 4.97 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 4 - Forks: 0

genieincodebottle/parsemypdf
A collection of PDF parsing approaches, including AI-based Docling, Claude, OpenAI, Gemini, Meta's Llama Vision, and unstructured-io, as well as pdfminer, PyMuPDF, and pdfplumber, among others, for efficient snapshot, text, table, and metadata extraction.
Language: Python - Size: 3.01 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 111 - Forks: 27

k0msenapati/waterwise
💧 Smart Water Chatbot
Language: Python - Size: 21.1 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ya0002/obsidian-assist
Make Zettelkasten-style note-taking the foundation of interactions with Large Language Models (LLMs).
Language: Python - Size: 4.59 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 10 - Forks: 0

ihaterynn/Docling-Processor
Document Processing Script using Docling
Language: Python - Size: 4.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

FrancescoCrecchi/LlamaIndex-RAG-Tutorials
Tutorials on using LlamaIndex for modern RAG applications
Language: Jupyter Notebook - Size: 1.21 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

sarmishra/Python-RAG-Pipeline
Building a Retrieval Augmented Generation (RAG) system that demonstrates how to index documents, retrieve relevant content, generate AI-powered responses, and evaluate results—all through a command line interface (CLI).
Language: Python - Size: 441 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

amirkiarafiei/docling-processor
A Docling extension for superior PDF/DOCX to Markdown conversion, featuring smart image understanding with Gemini VLM.
Language: Python - Size: 847 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 1

shoryasethia/markdrop
A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables and images using several LLM clients, among other features. Markdrop is available on PyPI.
Language: Python - Size: 158 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 123 - Forks: 5

ParthaPRay/docling_RAG_langchain_colab
This repo contains code for RAG using Docling in a Colab notebook with LangChain, Milvus, a Hugging Face embedding model, and an LLM
Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

onecio/docling_py
Docling Python NGX
Language: HTML - Size: 2.93 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

MariosAdamidis/FORTHought
An open-source, interpretable platform for AI-accelerated scientific discovery, featuring a RAG pipeline, a self-correcting code interpreter, and scientific tool integration.
Language: Dockerfile - Size: 707 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

sentryeagle/docling
An Open-Source CLI Tool for Document Processing, Diverse Format Parsing and Advanced PDF Understanding.
Language: Shell - Size: 10.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

hemanthkt/impactoverse-AI-mentor
Developed an intelligent AI chatbot utilizing the DeepSeek LLM, designed for efficient interaction with large documents such as textbooks and study materials. Integrated Docling for parsing and processing large files, and implemented a Retrieval-Augmented Generation (RAG) pipeline using FAISS and Sentence Transformers to optimize context retrieval
Language: JavaScript - Size: 595 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

Rishang/deep-research
Python SDK for Deep-Research
Language: Python - Size: 196 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Danitilahun/Document-processing-Pdf-Structured-Data-Extractor
This project demonstrates how to extract structured information from PDF documents using a combination of Langchain, OpenAI models, and the DocLing library. It provides a framework for parsing PDFs and leveraging LLMs to identify and format key data points.
Language: Jupyter Notebook - Size: 64.5 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

Slayer412/docling-bedrock-plugin
Integrates AWS Bedrock's multimodal capabilities (Claude 3) into the Docling framework for generating image descriptions within document processing pipelines.
Language: Python - Size: 22.5 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

ParthaPRay/Docling_Colab
This repo contains a Google Colab notebook for handling Docling for data extraction, such as text, images, and tables.
Language: Jupyter Notebook - Size: 697 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

katagaki/Lingus
PDF and Markdown conversion using Docling and LibreOffice
Language: Python - Size: 83 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

fahdmirza/doclingwithollama
Docling with Ollama - RAG on Local Files with Local Models
Language: Python - Size: 880 KB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 57 - Forks: 15

Jarus77/markdrop
A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables and images using several LLM clients, among other features. Markdrop is available on PyPI.
Language: Python - Size: 85.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

ramona1999/Contract-Risk-Assessment
This project is an AI-powered Contract Risk Assessment and Legal Assistant designed to analyze legal documents, extract key clauses, assess risks, and provide actionable recommendations. Additionally, a fine-tuned conversational chatbot is integrated for interactive legal Q&A based on contract-specific knowledge.
Language: Jupyter Notebook - Size: 840 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

thevladdo/rag-backend
Retrieval-Augmented Generation server with Pinecone and OpenAI
Language: HTML - Size: 46.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

maciekmalachowski/CVWizard
🧙‍♂️ AI-powered tool to optimize your CV with job-specific keywords and align it to your dream job.
Language: TypeScript - Size: 3.65 MB - Last synced at: 11 days ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

jmxt3/pdf_to_txt_converter
A Python script that converts PDF files to text using the docling library. This tool is designed to batch process PDF files, making it easy to extract text content from multiple documents at once.
Language: Python - Size: 1.77 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

hyoaru/rag4jiya-process
Agentic RAG-based system with nursing handbooks and transes as knowledge base for my bebiloves
Language: Jupyter Notebook - Size: 13.8 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

kwame-mintah/python-langchain-chainlit-qdrant-ollama-stack-template
📄 A project template for creating a Chainlit application, using a locally run model via Ollama and a Qdrant vector database for document retrieval.
Language: Python - Size: 46.9 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

patw/docinator
A small service to convert PDF files to Markdown using the Docling library
Language: Python - Size: 4.88 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

qlfv/Docling-Testing
Repository for testing and demonstrating the capabilities of Docling for document conversion.
Language: HTML - Size: 18.4 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 2

ParthaPRay/gradio_docling_rag_langchain
This repo provides RAG using Docling, LangChain, Milvus, Sentence Transformers, and Hugging Face LLMs
Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

shrimantasatpati/Document_Parser_using_AI
Parse documents using AI - any document converted to markdown suitable for RAG applications
Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0
