Topic: "document-analysis"
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDFs to Markdown and JSON.
Language: Python - Size: 125 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 36,874 - Forks: 3,017

UglyToad/PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
Language: C# - Size: 180 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 2,067 - Forks: 263

AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Language: C++ - Size: 104 MB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 1,727 - Forks: 194

tstanislawek/awesome-document-understanding
A curated list of resources on the Document Understanding (DU) topic
Size: 5.56 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 1,416 - Forks: 160

DocumindHQ/documind
Open-source platform for extracting structured data from documents using AI.
Language: JavaScript - Size: 1020 KB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 1,326 - Forks: 48

Yuliang-Liu/Curve-Text-Detector
This repository provides training and test code, the dataset, detection and recognition annotations, evaluation scripts, an annotation tool, and a ranking.
Language: Jupyter Notebook - Size: 27.9 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 646 - Forks: 156

NanoNets/docext
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
Language: Python - Size: 2.94 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 620 - Forks: 47

wenwenyu/PICK-pytorch
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
Language: Python - Size: 9.72 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 563 - Forks: 192

ispras/dedoc
Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser
Language: Python - Size: 235 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 493 - Forks: 37

CybercentreCanada/assemblyline
AssemblyLine 4: File triage and malware analysis
Language: Python - Size: 246 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 317 - Forks: 18

jpWang/LiLT
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
Language: Python - Size: 1.36 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 282 - Forks: 34

lazyFrogLOL/llmdocparser
A package for parsing PDFs and analyzing their content using LLMs.
Language: Python - Size: 1.21 MB - Last synced at: 28 days ago - Pushed at: 11 months ago - Stars: 271 - Forks: 9

pandora-analysis/pandora
Pandora is an analysis framework for determining whether a file is suspicious and conveniently displaying the results
Language: Python - Size: 6.99 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 263 - Forks: 42

masyagin1998/robin
RObust document image BINarization
Language: Python - Size: 24.8 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 180 - Forks: 38

chriswolfvision/local_adaptive_binarization
Local adaptive image binarization
Language: C++ - Size: 135 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 121 - Forks: 25

mirabdullahyaser/Retrieval-Augmented-Generation-Engine-with-LangChain-and-Streamlit
A web application that combines Streamlit, LangChain, and Pinecone to simplify document analysis. Powered by OpenAI's GPT-3, its RAG (Retrieval-Augmented Generation) pipeline enables dynamic, interactive conversations with documents, making it well suited to efficient document retrieval and summarization.
Language: Python - Size: 11.3 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 119 - Forks: 59

aws-samples/amazon-textract-transformer-pipeline
Post-process Amazon Textract results with Hugging Face transformer models for document understanding
Language: Python - Size: 3.71 MB - Last synced at: 25 days ago - Pushed at: 7 months ago - Stars: 96 - Forks: 25

ppaanngggg/yolo-doclaynet
YOLO models trained on DocLayNet: power your Document Intelligence with layout analysis
Language: Python - Size: 44.9 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 86 - Forks: 16

monniert/docExtractor
(ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical document element extraction" paper
Language: Python - Size: 4.09 MB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 85 - Forks: 10

anisha2102/docvqa
Document Visual Question Answering
Language: Python - Size: 146 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 85 - Forks: 20

Xyntopia/pydoxtools
Effortlessly extract information from unstructured data with this library, which uses advanced AI techniques. Compose AI into customizable pipelines that draw on diverse sources for your projects.
Language: Python - Size: 13.6 MB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 82 - Forks: 13

ZeningLin/ViBERTgrid-PyTorch
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"
Language: Python - Size: 388 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 54 - Forks: 5

abdur75648/UTRNet-High-Resolution-Urdu-Text-Recognition
UTRNet: High-Resolution Urdu Text Recognition In Printed Documents (ICDAR'23)
Language: Python - Size: 126 KB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 51 - Forks: 10

JPLeoRX/detectron2-publaynet
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
Language: Python - Size: 7.76 MB - Last synced at: 21 days ago - Pushed at: about 2 years ago - Stars: 49 - Forks: 7

ankanbhunia/AdverseBiNet
Improving Document Binarization via Adversarial Noise-Texture Augmentation (ICIP 2019)
Language: Python - Size: 1.37 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 40 - Forks: 9

aws-solutions/enhanced-document-understanding-on-aws
Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.
Language: JavaScript - Size: 62.5 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 38 - Forks: 16

microsoft/synthetic-rag-index
Service to import data from various sources and index it in AI Search. Increases data relevance and reduces final size by 90%+. Useful for RAG scenarios with LLM. Hosted in Azure with serverless architecture.
Language: Python - Size: 137 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 31 - Forks: 5

swapnil-ahlawat/Document_Layout_Analysis-MonkAI
Deep learning models that take a document image file as input and locate paragraphs, lines, images, etc., along with their labels and confidence scores.
Language: Jupyter Notebook - Size: 50.6 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 26 - Forks: 6

muhd-umer/pyramidtabnet
Official PyTorch implementation of PyramidTabNet: Transformer-based Table Recognition in Image-based Documents
Language: Python - Size: 93 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 25 - Forks: 2

ihdia/docvisor
An open-source tool for visualising the outputs of deep learning models for document analysis tasks, such as fully automatic, bounding-box, and OCR tasks.
Language: Python - Size: 109 MB - Last synced at: 11 months ago - Pushed at: over 3 years ago - Stars: 19 - Forks: 4

huyhoang17/kuzushiji_recognition
[Late Submission] Solution for Kuzushiji recognition (Kaggle competition)
Language: Python - Size: 90 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 17 - Forks: 2

ad-freiburg/pdftotext-plus-plus
A fast and accurate command line tool for extracting text from PDF files.
Language: C++ - Size: 18.2 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 0

Retab-dev/retab
The developer starter pack for document processing
Language: Jupyter Notebook - Size: 17.9 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 15 - Forks: 1

AILab-UniFI/GNN-TableExtraction
Code for ICPR2022 paper: "Graph Neural Networks and Representation Embedding for table extraction in PDF Documents"
Language: Python - Size: 121 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 2

bookalope/InDesign-CEP
Adobe CEP extension for InDesign to use the Bookalope cloud services.
Language: JavaScript - Size: 4.12 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 6

bookalope/Bookalope
Everything related to Bookalope and its REST API.
Language: Python - Size: 163 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 12 - Forks: 4

AymurAI/backend
This repository contains the backend API and machine learning models of AymurAI
Language: Jupyter Notebook - Size: 40.4 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 10 - Forks: 0

therealexpertai/nlapi-java
Java Client for the expert.ai Natural Language API
Language: Java - Size: 163 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 9

TUWien/ReadModules
CVL/READ Modules including Basic Layout Analysis and Writer Identification/Retrieval
Language: C++ - Size: 3.53 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 9 - Forks: 4

soduco/paper-ner-bench-das22
All the material (paper, code, dataset, results) of our DAS 2022 paper (OCR+NER benchmark)
Language: Jupyter Notebook - Size: 313 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

CXH-Research/StainRestorer
[WACV 2025] High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer
Language: Python - Size: 20.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 2

aidayang/MinerU-OneClick
An installation-free, one-click-launch deployment bundle for MinerU
Size: 49.8 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 7 - Forks: 0

gr8monk3ys/paper-summarizer
A Python-based tool for summarizing research papers and articles using NLP techniques. Simplify complex content efficiently
Language: HTML - Size: 19.5 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 7 - Forks: 1

ZeroBone/OfficialEye
An advanced AI-powered generic document-analysis tool
Language: Python - Size: 25.5 MB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 3

ethanhezhao/MetaLDA
The code for MetaLDA in ICDM 2017
Language: Java - Size: 2.86 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 4

TUWien/ReadFramework
The Core Framework for CVL/READ Modules
Language: C++ - Size: 26.4 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 7

nicolasfeyer/KWS-SIFT
Python code to perform keyword spotting using SIFT features
Language: Python - Size: 30.3 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

MBAigner/GraphConverter
A tool for creating a graph representation out of the content of PDFs or images.
Language: Python - Size: 486 KB - Last synced at: 10 days ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 0

omni-us/research-ContentDistillation-HTR
Source code for ICFHR20 "Distilling Content from Style for Handwritten Word Recognition"
Language: Python - Size: 330 KB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 2

sohaib023/T-Truth
Labeling tool for Table Structures in Document Images.
Language: Java - Size: 4.68 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 3

ahmetkumass/contract-analyzer
Open-source tool for extracting and analyzing key information from legal contracts and documents with ease.
Language: Python - Size: 4.88 KB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

faizan1041/doc-understanding-gpt-langchain
Document understanding with GPT 3.5 integrated with Telegram
Language: Python - Size: 28.6 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

abdur75648/urdu-synth
High-quality synthetic text data generation for Urdu Text Recognition
Language: Python - Size: 291 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 1

Yosef-AlSabbah/Cloud-Based-Document-Analytics-Service-2
Cloud-based service for uploading, scraping, and managing PDF/DOCX documents. Features include title sorting, content search with highlights, rule-based classification, and storage stats. Integrated with cloud platforms for scalable document analytics.
Language: TypeScript - Size: 269 KB - Last synced at: 10 days ago - Pushed at: 25 days ago - Stars: 3 - Forks: 0

LATIS-DocumentAI-Team/DocumentAI-std
DocumentAI-std is a Python library designed to facilitate and standardize document analysis and processing tasks. It offers functionality for handling document elements, performing optical character recognition (OCR), and managing document datasets.
Language: Python - Size: 350 KB - Last synced at: 30 days ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

sohaib023/Truth-Py
Python module for handling XML files labelled using T-Truth tool.
Language: Python - Size: 19.5 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 2

qurator-spk/sbb_column_classifier
Get the number of columns for a document image
Language: Python - Size: 50.8 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 0

Schlafenhase/Document-Analyzer
CE-5505. Company document analysis with natural language processing for sensitive data detection. #Isaac
Language: C# - Size: 23 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 2

TTWJOE/dr-x-nlp-pipeline
A fully offline NLP pipeline for extracting, chunking, embedding, querying, summarizing, and translating research documents using local LLMs. Inspired by the fictional mystery of Dr. X, the system supports multi-format files, local RAG-based Q&A, Arabic translation, and ROUGE-based summarization — all without cloud dependencies.
Language: Python - Size: 9.92 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

bx0-0/CyberVisionAI
Cyber Vision AI is an award-winning, open-source AI assistant for cybersecurity, document analysis, and knowledge management. Built with advanced RAG, MindMap, and multi-agent AI, it empowers security professionals and researchers with unrestricted, ethical, and insightful tools.
Language: Python - Size: 11.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 0

ismailbokri/Combot
This project, Contract Compliance, was developed as part of a project for the Contract Lifecycle Management module in the Management Information Systems program at ESPRIT University. It focuses on automating the monitoring and enforcement of contractual obligations to ensure regulatory compliance and minimize operational risks.
Language: Python - Size: 6.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

ksm26/dr-x-nlp-pipeline
A fully offline NLP pipeline for extracting, chunking, embedding, querying, summarizing, and translating research documents using local LLMs. Inspired by the fictional mystery of Dr. X, the system supports multi-format files, local RAG-based Q&A, Arabic translation, and ROUGE-based summarization — all without cloud dependencies.
Language: Python - Size: 9.92 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

rithulkamesh/docproc
Opinionated and Sophisticated Document Region Analyzer.
Language: Python - Size: 219 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

miku/grobidclient
A Go (golang) client for GROBID.
Language: Go - Size: 7.52 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

arsath-eng/RAG1-NVIDIA-GENAI
A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.
Language: Python - Size: 153 MB - Last synced at: 27 days ago - Pushed at: 8 months ago - Stars: 2 - Forks: 1

AlinaBaber/Document-Analysis-Identification-with-RAG-Vector-Database-and-Mistral-LLM
This Document Analysis pipeline is a comprehensive document analysis system designed to automate the processing and analysis of documents from acquisition to consumption. It integrates advanced machine learning and AI models such as RAG (Retrieval Augmented Generation) and Mistral LLM to efficiently extract, match, enrich, and process documents
Language: Python - Size: 14.8 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

JuanCarlosMartinezSevilla/MuRET-UserTool-deprecated
The objective of this repository is to provide MuRET's users a simple way to train deep learning models allowing an efficient transcription process.
Language: Python - Size: 733 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

moured/Document-Graphics-Digitization
Official repository for the ICDAR 2023 paper "Line Graphics Digitization: A Step Towards Full Automation"
Size: 7.26 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

aquilu/muisca
Muisca: Modelo Unificado de Inteligencia Supervisada para la Computación y Aplicación (Unified Supervised Intelligence Model for Computing and Application). A Streamlit tool for summarizing and asking questions about PDF and TXT documents using state-of-the-art language models.
Language: Python - Size: 24.4 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

dev-luckymhz/AIVisionText-invoice-OCR-typescript
AIVisionText is an advanced document analysis platform that harnesses the power of artificial intelligence (AI) to revolutionize the way you manage and extract insights from documents.
Language: TypeScript - Size: 104 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 2

chulwoopack/Zone2OCR
Mapping a set of zones generated by a segmentation algorithm to the regions generated by OCR engine
Language: Python - Size: 38.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

MILE-IISc/DegradedWordsKannada
Benchmarking dataset of degraded word images (with character splits) in Kannada along with their associated ground truth Unicode text
Language: Shell - Size: 7.48 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

MILE-IISc/MergedSymbolsKannada
Benchmarking dataset of merged symbols in Kannada along with their associated ground truth Unicode text
Language: Shell - Size: 3.64 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

fredrikwahlberg/das2018
Code for the paper "Gaussian Process Classification as Metric Learning for Forensic Writer Identification", published at DAS 2018
Language: Python - Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

baharsateli/Dissertation_Supplementary_Materials
Datasets, tools and results from my doctoral dissertation
Language: Shell - Size: 305 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0

Rayyan9477/ocr-app
State-of-the-art Optical Character Recognition (OCR) with Vision Language Model (VLM) integration for enhanced accuracy and optimal document processing.
Language: TypeScript - Size: 23.9 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

SIYAKS-ARES/smart-doc-insight
An AI-powered PDF querying tool. Get instant answers from your PDF documents in natural language using Ollama, LM Studio, and APIs.
Language: HTML - Size: 33 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

rk-vashista/pitch
A modern web application that analyzes pitch decks using multi-agent AI technology. Upload your pitch deck and get comprehensive feedback on structure, content, and potential improvements!
Language: Python - Size: 6.06 MB - Last synced at: 26 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

GautamBytes/IITM_HACKATHON
An AI-powered contract management tool using NLP and LLMs, achieving 95% accuracy in document analysis. The project significantly enhanced decision-making in contract management and showcased innovative use of AI technologies.
Language: Python - Size: 2.59 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

mhdsedighi/DOC-Analyzer
Analyzing Many Documents with AI
Language: Python - Size: 324 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

acsenrafilho/cucaracha
A bureaucratic cockroach (cucaracha) assistant to help with document processing and analysis
Language: Python - Size: 6.44 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

BjornMelin/docmind-ai
DocMind AI is a powerful, open-source Streamlit application leveraging LangChain and local Large Language Models (LLMs) via Ollama for advanced document analysis. Analyze, summarize, and extract insights from a wide array of file formats—securely and privately, all offline.
Size: 32.2 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

DioCrafts/ai-book-summarizer
📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study
Language: Python - Size: 29.6 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

yasuhiroinoue/Gemini_Discordbot_VertexAI
A Discord bot powered by Google Gemini Pro, capable of text generation, image analysis, audio transcription, and more.
Language: Python - Size: 127 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

Leg0shii/smart-documents
A web application that enables users to upload documents and utilize AI techniques like semantic search and text summarization for efficient analysis. Built with Python, FastAPI, Svelte, PostgreSQL, and LangChain.
Language: Python - Size: 405 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 1

pleb631/PdfDet
PdfDet aims to simplify PDF layout detection tasks for users.
Language: Python - Size: 14.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

x1ao4/doc-merger
Merge two relatively incomplete documents into one complete document via a Python script
Language: Python - Size: 22.5 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

teohsinyee/resume-parsing
A record of the process of building a pipeline for resume parsing.
Language: Jupyter Notebook - Size: 1.18 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

billyotieno/haki-tech
This is a repository of legal tech startup activities and projects.
Size: 0 Bytes - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

ffairyttears/CyberVisionAI
Cyber Vision AI is an award-winning open-source AI assistant for cybersecurity. Explore its features on GitHub! 🌟💻
Language: Python - Size: 11.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

usrtem/ResearchAI
AI-powered document analysis tool for querying content across PDFs, Word files, Excel sheets, text files, and web URLs using Google’s Gemini API.
Language: Python - Size: 38.1 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

eleonc56/Cloud-Based-Document-Analytics-Service
Cloud-Based Document Analytics Service offers a simple way to manage your documents in the cloud. With features like drag-and-drop upload and powerful web scraping, it streamlines your document analysis. 🗂️💻
Language: TypeScript - Size: 315 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

jltk/briefgeist
Privacy-first desktop app for scanning, understanding and replying to letters.
Language: Python - Size: 34.2 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

AI-Data-Space/happymatrix-eco-assistant
AI-powered assistant for analyzing Engineering Change Orders (ECOs) using Google Gemini and RAG
Language: Jupyter Notebook - Size: 255 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Prathameshv07/AlyaAloft
A sophisticated PDF document analysis and question-answering application that leverages advanced AI models to provide detailed responses to user queries about PDF documents.
Language: Python - Size: 39.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

alexanderwedlund/isp-keyword-analyzer
A Streamlit tool for analyzing Information Security Policies by classifying keyword occurrences as "Actionable Advice" or "Other Information" to measure policy effectiveness through the "Keyword Loss of Specificity" metric.
Language: Python - Size: 1.12 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

swiss-ai-center/table-recognition-service
Table recognition service processes document-based input and utilizes a newly trained SLANet from PaddleOCR for robust table recognition.
Language: Python - Size: 16.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

veydantkatyal/doc-analysis
Automatically extracts, summarizes, and analyzes PDF documents using Large Language Models (LLMs). It generates relevant questions and answers based on the document content for smarter understanding.
Language: Jupyter Notebook - Size: 190 KB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

DiTo97/neural-deskew
toolkit for learning efficient document image skew estimation (DISE)
Language: Python - Size: 48.8 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

george-gca/asreview-top2vec (fork of asreview/semantic-clusters)
Semantic Clustering for ASReview Datasets using Top2Vec
Language: Python - Size: 18.7 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
