Topic: "pdf-processing"
dissorial/doc-chatbot
Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.
Language: TypeScript - Size: 2.54 MB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 852 - Forks: 146

allenai/papermage
library supporting NLP and CV research on scientific papers
Language: Python - Size: 48.8 MB - Last synced at: 25 days ago - Pushed at: 6 months ago - Stars: 757 - Forks: 61

ahmedkhemiri95/PDFs-TextExtract
Multiple and Large PDF Documents Text Extraction.
Language: Python - Size: 11.3 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 128 - Forks: 65

aws-samples/document-processing-pipeline-for-regulated-industries
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
Language: Python - Size: 11.4 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 50 - Forks: 12

ManasMadan/pdf-actions
An npm package built on top of pdf-lib that provides functionality such as merging, rotating, splitting, and downloading PDFs to disk, and many more.
Language: JavaScript - Size: 41 KB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 7

ManasMadan/PDFActions
Built with pdf-actions NPM package.
Language: JavaScript - Size: 2.09 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 8 - Forks: 4

Aleptonic/PdfSnipper
PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.
Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

thinhuos0913/python_useful_mini_projects
A collection of useful mini projects that I worked on while teaching myself Python programming.
Language: Python - Size: 1.01 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 1

Inc44/MaTools
An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.
Language: Python - Size: 6.24 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

Yardenrsk/PsychometryReceiverCV
A side project to easily get and annotate questions and answers for the PsychometryBot project DB using computer vision and PDF parsing
Language: Python - Size: 11.1 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

HemalDholakiya12/PDFChat
A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.
Language: JavaScript - Size: 119 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 0

rithulkamesh/docproc
Opinionated and Sophisticated Document Region Analyzer.
Language: Python - Size: 219 KB - Last synced at: 5 days ago - Pushed at: 23 days ago - Stars: 2 - Forks: 0

Farhaj499/RAG_with_Weaviate_DB
This project implements a Retrieval Augmented Generation (RAG) system that answers questions based on the PDF document. It utilizes Weaviate as a vector database for efficient retrieval of relevant information and Gemini to generate natural language responses.
Language: Jupyter Notebook - Size: 62.5 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

Al-shwaib/Book-Preparation-for-Printing
A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.
Language: Python - Size: 40 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

arsath-eng/RAG1-NVIDIA-GENAI
A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.
Language: Python - Size: 153 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 2 - Forks: 1

dsckiet/covid-tracker-android-app
A statistical data display and notifier app for the COVID-19 pandemic.
Language: Kotlin - Size: 974 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 2

Aumlo123/pdfdoom
DOOM in a PDF (as ascii art)
Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

gs-ai/PDFProfessor
PDF Professor 2.0 extracts and processes PDF text, analyzed by Ollama for summarization, data extraction, and insights. More coming soon!
Language: Python - Size: 1.95 MB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

FurqanHun/textnomnom-py
Extract text from PDFs, PPTs, & URLs (with OCR support). Converts PPT to PDF & handles files or folders. 🦍
Language: Python - Size: 46.9 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 1 - Forks: 0

omritriki/BIU-Points-Calculator
A web application for calculating credit points and GPA from PDF transcripts. Built with FastAPI and pdfplumber, this tool simplifies the process for BIU engineering students.
Language: Python - Size: 198 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

9-5/Chromium-Intelligence
A powerful Chromium extension that leverages multiple AI APIs to assist with various text operations, image analysis, and PDF processing.
Language: JavaScript - Size: 834 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1 - Forks: 1

DioCrafts/ai-book-summarizer
📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.
Language: Python - Size: 29.6 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

Remisu/GajyunETL
The goal of this project is to eliminate the need for paper by digitizing the process of handling client passport information.
Language: C# - Size: 1020 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

king04aman/PDF-Extractor-API
PDF Extractor API is a FastAPI project for extracting information from PDFs. It includes user authentication, PDF uploading, and text extraction. The API supports secure PDF uploads, keyword-based extraction, and rate limiting.
Language: Python - Size: 22.5 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 1

mohamedelareeg/ImageAutomaticCroppingWatcher
Image Automatic Cropping Watcher: A tool that automatically detects PDF files, converts them to images, corrects perspective distortion, and compiles them back into PDFs.
Language: C# - Size: 1.14 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

eddieyg/freedomfile
Freedom to use PDF, DOC and other document processing
Language: TypeScript - Size: 403 KB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Francesco-Sovrano/Swiss-G2C-User-Guide-Analysis
Extensive analysis of user guides in Swiss government-to-citizen software, correlating guide features with canton socio-economic factors.
Language: Python - Size: 3.3 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Mateusz2734/pdf-cli
CLI tool to merge, compress, extract or delete pages from PDF
Language: Python - Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

akshatpunia26/berrylit_pdf_chat
Berrylit is a simple chatbot interface that allows users to upload a PDF file and ask a question related to its contents. The chatbot uses the Berri API for processing.
Language: Python - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Dagmawi-22/qelem-web
Language: Svelte - Size: 234 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

gwyndolin75/Document-QA-System
A Streamlit-based app for asking questions directly from uploaded documents using Gemini embeddings and a language model. Supports PDF, TXT, and DOCX files. Fast, simple, and powerful document-based QA.
Language: Jupyter Notebook - Size: 1.22 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

anonymo2239/Secure-Document-Anonymization-System
An academic article review system that anonymizes submissions using NLP and computer vision to ensure fair and unbiased evaluations.
Language: Python - Size: 4.34 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

GMA-2025/Resume_Ranker_AI
An AI-powered tool that ranks resumes based on a job description using NLP and semantic similarity with Sentence Transformers.
Language: Python - Size: 65.4 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

denizdagli/QuantumComputingChatbot
Quantum Computing Chatbot is a Streamlit app that answers questions about quantum computing using a PDF document as its knowledge base. It uses Google Gemini and LangChain for intelligent, document-aware responses.
Language: Jupyter Notebook - Size: 19.8 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

Abishek7952/resume-classifier
An end-to-end machine learning web app that classifies PDF resumes into job-fit categories. Built with FastAPI, Streamlit & Docker. Deployed on Render.
Language: Python - Size: 165 MB - Last synced at: 17 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

itvincent-git/invoice_renamer
A tool to rename invoices based on their content
Language: Python - Size: 192 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

germabyte/pdf-ocr-remover
This program helps you remove the invisible text layer (also known as the OCR layer) from PDF files.
Language: Python - Size: 8.79 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Sigmakib2/express-pdf-watermark-api
A simple Express.js API for embedding watermarks on PDF files using pdf-lib and multer. This project demonstrates how to apply forensic watermarks with user details (or unique identifiers) to each page of a PDF, helping deter unauthorized distribution while maintaining user privacy.
Language: JavaScript - Size: 12.7 KB - Last synced at: 27 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

BaranDev/emurpg-backend
The 🐲EMU RPG API🐲 supports the EMU RPG Club’s events by managing game tables, players, and D&D character data. Built with FastAPI, it includes features like table/character management, real-time WebSocket updates, data validation, API monitoring, and secure access, providing an organized backend for tabletop RPG sessions.
Language: Python - Size: 1.83 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Harsha-Hemanth/ATS-Optimized-Resume-Analyzer
An ATS-optimized resume analyzer providing AI-powered insights, skill scoring, and compatibility analysis to enhance resumes for specific job descriptions
Language: Python - Size: 37.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

AdityaAdaki21/FASTapi-RAG
FASTapi-RAG is a FastAPI-based Retrieval-Augmented Generation system that lets users query PDF documents via an AI-powered chatbot. It integrates Ollama for language generation and ChromaDB for document indexing, offering features like document upload, natural language querying, and an interactive web interface.
Language: Python - Size: 32.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

nathania-rachael/Chat-with-Multiple-PDFs
An AI-powered chatbot that lets users upload multiple PDFs and ask questions based on their content. It extracts text, processes it with FAISS, and retrieves answers using Google Generative AI (Gemini Pro) through a simple Streamlit interface.
Language: Python - Size: 3.91 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

nazanin97/AI-ResumeRank
Resume Ranker is an AI-powered system that automatically analyzes and ranks resumes based on job-specific criteria. It fetches resumes from Google Drive, extracts text, scores candidates using Google Gemini API, and saves the results in a CSV file for easy review.
Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Mert-55/Remove-Brand-Logo
Automatically remove brand logos and unwanted slides from PDFs
Language: Python - Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Faerque/PDF_scraper
PDF Scraper with Automation - A CLI tool for extracting text from PDFs and storing it in an SQLite database for structured querying. Supports digitally generated PDFs and enables efficient document processing.
Language: Python - Size: 544 KB - Last synced at: 24 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

jcaperella29/Ai_LLM_set_up
AI-powered research paper summarization using local LLMs (Ollama). Extracts, processes, and summarizes PDFs with structured insights. Ideal for scientific papers & bioinformatics
Language: Python - Size: 3.91 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

hemanth090/ATS-V2
AI-powered Resume Analysis System using Groq API and Flask. Evaluates resumes against job descriptions, provides ATS compatibility scores, and offers detailed improvement recommendations. Features Docker support and ngrok integration.
Language: CSS - Size: 31.3 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

Ahmed-AI-01/Multimodal-RAG
An AI-powered chat application using text, audio, and images for context-aware responses. It integrates language models and vector databases to enhance retrieval-augmented generation (RAG) capabilities, making it a versatile tool for intelligent conversations.
Language: Python - Size: 90.8 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ADudhe01/PDFReaderChatbot
A chatbot app using Streamlit, LangChain, and OpenAI to interact with uploaded PDFs, extract text, and answer questions based on the document content.
Language: Python - Size: 4.88 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

HemantM29/Multimodal-Document-Analysis-and-Query-Retrieval
This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.
Language: Jupyter Notebook - Size: 1.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SanAfaGal/pdf-processor-for-eps-files
A tool designed to process and rename PDF files based on specific EPS configurations, utilizing exact and fuzzy matching techniques to identify file types efficiently.
Language: Python - Size: 46.9 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

hyuseinleshov/ocr-exporter
A Spring Boot-based OCR Exporter tool that extracts text from image or PDF files using the OCR Space API and exports the results to various formats such as PDF, text, Word, or a database.
Language: Java - Size: 42 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

BEKOhub/LLMGenerativeAIOpenAI
This project is a PDF-based Information Retrieval System powered by LangChain, OpenAI, and Streamlit. The application allows users to upload PDF files, process their contents, and interact with the extracted data using a conversational AI interface. It leverages FAISS for vector-based similarity searches and ChatGPT models (e.g., gpt-4-turbo)
Language: Python - Size: 10.7 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

SURAJ-K-GUPTA/PDF-WIZARD
AI-powered PDF Assistant: Upload PDFs and ask questions about the content with intelligent answers powered by FastAPI and LangChain. Option to check Better Answer for enhanced responses.
Language: JavaScript - Size: 457 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Guiss-Guiss/ScriptumAI
RAG application. ScriptumAI is an advanced Retrieval-Augmented Generation platform designed for document ingestion, semantic search, and query processing.
Language: Python - Size: 11.3 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ashainp/Combine-PDF
A Python script to combine multiple PDFs, allowing the insertion of one PDF before the last page of another. Flexible for adding additional documents. Perfect for document management tasks.
Language: Python - Size: 17.6 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

ranguy9304/LangGraphRAG
LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.
Language: Python - Size: 51.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

ts-azure-services/batch-doc-pipeline
Language: Python - Size: 52.7 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

RG-7/PDF_Merger
Merge multiple PDF files into a single PDF with ease using this simple Python PDF Merger. 🚀
Language: Python - Size: 1.49 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

clydeknox/PDFGuard-Secure-and-Sanitize-PDFs-with-Python
PDFGuard is a user-friendly Python application that helps you enhance the security of PDF files by removing potential security threats and hidden content. It does this by converting PDF pages into images and then creating new, sanitized PDFs from these images.
Language: Python - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Govind-S-B/pdf-to-text-chroma-search
Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. Also includes a script to query the Chroma DB for similarity search based on user input.
Language: Python - Size: 0 Bytes - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

baafbass/watermaker
PDF processing scripts written in Python
Language: Python - Size: 79.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

UntaintedTech/pdf-processing
PDF merger and stamper (watermark) using Python and PyPDF2, an open-source pure-Python PDF library
Language: Python - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Razwand/dealing_with_docs
Playing with PDF document processing 🧾
Language: Python - Size: 3.18 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0
