GitHub topics: pymupdf
pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Language: Python - Size: 329 MB - Last synced at: about 20 hours ago - Pushed at: about 21 hours ago - Stars: 7,127 - Forks: 601

Krasjet/pdf.tocgen
A CLI toolset to generate table of contents for PDF files automatically.
Language: Python - Size: 430 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 737 - Forks: 25

jdonohue44/NOAA-Weather-Modification-Forms-LLM-Extractor
Extract key information from 1,000s of NOAA Form 17-4 (Initial Report On Weather Modification Activities) using LLM.
Language: Python - Size: 982 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

CBIhalsen/PolyglotPDF
(eBook and PDF translation) A multilingual eBook processing tool supporting all eBook formats. Features online and offline translation while preserving original layouts. Compatible with both scanned and digital PDFs. Elegant user interface. The world's highest-performing open-source layout-preserving eBook translator.
Language: Python - Size: 104 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,989 - Forks: 275

HemalDholakiya12/PDFChat
A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.
Language: JavaScript - Size: 119 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2 - Forks: 0

dipanshudhage/Crop-and-Fertiliser-Recommendation-System
The Crop and Fertilizer Recommendation System leverages machine learning to assist farmers in selecting the best crops and fertilizers based on soil nutrient data. By analyzing soil test reports (images/PDFs), the system provides AI-driven recommendations for optimal crop growth and fertilizer use, tailored to the farmer’s specific soil conditions.
Language: Python - Size: 4.12 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

lefkovitzj/PyPdfApp
A PDF manipulation and access application developed in Python predominantly built using the PyMuPDF and CustomTkinter modules.
Language: Python - Size: 1.05 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

JoseLVillaronga/teccam_pdf
Teccam PDF is a Python/Flask web application that extracts text from PDF documents and web pages, automatically converts it to Markdown, and stores it in MongoDB. It offers a responsive interface with light/dark mode, permission management (public/private), reading-position bookmarks, and deployment as a systemd service.
Language: HTML - Size: 41 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

vikas-kashyap97/Resume-Screening
AI-Powered Research Summarizer is a web app that uses Google’s Gemini 1.5 Pro to generate tailored, clear summaries of research papers. It supports PDF uploads, multiple summary styles, and exports to DOCX or PDF.
Language: Python - Size: 109 KB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

AhmedTrb/PDF_highlight_extractor
A Python application built with PySide6 and PyMuPDF that extracts highlighted text from PDF files and categorizes it by color, allowing users to save and organize highlighted content in a Markdown file.
Language: Python - Size: 35.2 KB - Last synced at: about 21 hours ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

pawankumar94/graphscribe-table-extractor
Graphscribe is an intelligent, LLM-powered document understanding system designed to extract structured insights from complex visual content such as statistical diagrams, charts, and graphs.
Language: Python - Size: 19.6 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

code-418-dpr/SportHub-parser
Parser for the Russian Ministry of Sport's Unified Calendar Plan (ЕКП) PDF file, for the SportHub project
Language: Python - Size: 2.26 MB - Last synced at: 23 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

genieincodebottle/parsemypdf
Collection of PDF parsing libraries, such as the AI-based docling, claude, openai, llama-vision, and unstructured-io, as well as pdfminer, pymupdf, pdfplumber, etc., for efficient snapshot, text, table, and metadata extraction.
Language: Python - Size: 2.75 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 61 - Forks: 20

NaS-Research/knowledge-model
Our knowledge system systematically ingests, processes, and indexes open-access life science publications. It supports internal research by providing precise question-answering and efficient retrieval from a continuously updated repository of scientific literature
Language: Python - Size: 95.4 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

GokulGowthamS/AskDocs_GEN-AI
AskDocs Generative AI
Language: Python - Size: 2.1 MB - Last synced at: 25 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

vb64/markdown-pdf
Markdown to pdf renderer
Language: Python - Size: 539 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 78 - Forks: 6

alexandertiopan1212/SmartScan-AI
SmartScan-AI is a Streamlit app for invoice & PO extraction, matching, and AI-powered document Q&A.
Language: Python - Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

ArtifexSoftware/pdf2docx
Open source Python library for converting PDF to DOCX.
Language: Python - Size: 21.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2,867 - Forks: 406

esnanta/docu-query
This project is an early prototype of an AI-based chatbot designed to present regulation-related information.
Language: Jupyter Notebook - Size: 1.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

cloudy-sfu/TOC-to-bookmarks
Automatically create bookmarks from the "table of contents" for *.pdf books
Language: Python - Size: 609 KB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

germabyte/pdf-ocr-remover
This program helps you remove the invisible text layer (also known as the OCR layer) from PDF files.
Language: Python - Size: 8.79 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lucasrla/remarks
Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG
Language: Python - Size: 3.8 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 369 - Forks: 25

errejotaeme/diagrama
Tool for generating diagrams
Language: Python - Size: 6.36 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Ananthakrishnan12/Resume-Analyzer-Using-BERT
Resume Analyzer Using BERT
Language: Python - Size: 8.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Zain-Bin-Arshad/pdf-viewer
A pure-Python PDF viewer that provides the same functionality as other well-known PDF viewers.
Language: Python - Size: 338 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 82 - Forks: 21

shushilgirish/BigData_DataProcess_andMarkDownViewer Fork of khavnekar-y/AI-Information-Extractor
Automated Document Processing and Markdown Generation System
Language: Python - Size: 2.44 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

shefreenkaur/NLP_Query_Documents
This repository contains two implementations of an NLP document query system that processes PDF documents and ranks them based on relevance to user queries.
Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

jfriedlein/h2aFreeplane_pdf-highlightedText_to_Freeplane_synch
Freeplane script to organise highlighted text and notes from pdf files as Freeplane mindmap
Language: Tcl - Size: 113 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

boyac/pyGamgee
PyGamgee runs DeepSeek LLM with Ollama, using PyMuPDF for PDF extraction and FAISS for fast vector search. With LangChain RAG and conversation memory, it enables efficient, private document understanding—fully offline.
Language: Python - Size: 1.21 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

jfriedlein/h2a_pdf-highlightedText_to_annotation
Python tool to extract highlighted text from a pdf file and write this text into the content of each annotation
Language: Python - Size: 5.21 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

marek-jakub/siters
A simple .pdf file reader, written in Python.
Language: Python - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Lingesh81051/Similar-Template-Document-Matching-and-Fraud-Detection
An automated system for a health insurance company to streamline document processing, including template matching and fraud detection, resulting in reduction of processing time.
Language: Python - Size: 1.37 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

StarodubovAV/Python_Projects
This is a repo for various Python projects
Language: Jupyter Notebook - Size: 6.99 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

DioCrafts/ai-book-summarizer
📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.
Language: Python - Size: 29.6 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

openandclose/pdfslash
Crop PDF margins from interactive interpreter
Language: Python - Size: 1.02 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

shefreenkaur/Web-Scraping-and-Word-Frequencies
This project analyzes word frequencies in BC Legislative documents using Stanford CoreNLP and Python. The program extracts text from PDF documents, processes it using natural language processing techniques, and generates a comprehensive word frequency analysis.
Language: Python - Size: 3.11 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Al-shwaib/Book-Preparation-for-Printing
A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.
Language: Python - Size: 40 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

elias-jhsph/scienceai
An AI-powered scientific literature search engine that uses OpenAI's language models to analyze research papers. It enables users to extract data, ask complex questions, and perform ad hoc literature reviews, handling hundreds of papers simultaneously without needing metadata.
Language: Python - Size: 144 KB - Last synced at: 21 days ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

xxao/pero
Unified Python drawing API
Language: Python - Size: 5.54 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 34 - Forks: 4

BigDataIA-Spring2025-4/DAMG7245_Assignment01
A Streamlit-based app with a FastAPI backend for extracting structured data (text, images, tables) from websites and PDFs. Processed data is stored in AWS S3 and rendered in a markdown-standardized format. APIs are deployed on Google Cloud Run Service
Language: Jupyter Notebook - Size: 90.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

nature-of-eu-rules/data-preprocessing
Document preprocessing scripts for the Nature of EU Rules project
Language: Python - Size: 123 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

amirlogic/pymupdf-webapp
PyMuPDF webapp based on CherryPy
Language: Python - Size: 20.5 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Muneeb1030/FineTune-Tiny-Llama
Fine-tuning the Tiny Llama model to mimic my professor's writing style using the Llama Factory. The project involves data collection, preprocessing, preparation, fine-tuning, and evaluation.
Language: Jupyter Notebook - Size: 390 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

nsourlos/kindle_to_pdf
Transfer your Kindle highlights and notes (mobi or PDF) to PDF files
Language: Python - Size: 4.88 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Prakshal0809/RAG-Chatbot
Developed a RAG-based chatbot for seamless integration with an e-hospital platform, enhancing response accuracy by 30% through reliable, trusted medical data sources. Processed over 500+ pages of medical data, enabling real-time symptom analysis and disease suggestions.
Language: TypeScript - Size: 147 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

ks6088ts-labs/extractor-python 📦
A data extraction tool written in Python
Language: Python - Size: 159 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Guo-dalu/pdf-helper
This Python tool enables batch processing of PDFs using PyMuPDF, offering OCR text extraction and compression for handling multiple image-based PDFs efficiently.
Language: Python - Size: 26.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

philippe2023/RAG-Question-Answering-App
An AI-powered Question Answering application that uses Retrieval-Augmented Generation (RAG) to provide accurate and context-aware answers from uploaded PDF documents.
Language: Python - Size: 20.5 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Tech-C-P/ConversAI
ConversAI is an innovative conversational AI framework designed for intelligent text extraction and querying across various document formats and web content, leveraging advanced natural language processing techniques.
Language: Python - Size: 1.02 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

hase3b/SCPRAG
This repository implements a Retrieval-Augmented Generation (RAG) system for the Supreme Court of Pakistan, utilizing different LLMs, embedding models, and retrieval and generation enhancement strategies. It processes SCP judgments, applies chunking, and generates legal summaries and answers based on relevant case data.
Language: Jupyter Notebook - Size: 57.4 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

OrenGrinker/pdfLLM
The PDF Question Answering App uses Streamlit for a user-friendly interface where users can upload PDFs and ask questions. It employs LlamaIndex to index PDF content and PyMuPDF4LLM to parse files, enabling efficient, accurate answers based on the document’s text.
Language: Python - Size: 6.84 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

kezb90/PDF_To_Word
A Python-based tool that converts PDF files into editable Word documents, preserving text, images, and layout. Uses PyPDF2, PyMuPDF (fitz), python-docx, and Pillow to accurately transfer content from PDF to .docx. Ideal for transforming complex PDFs into Word format for easy editing.
Language: Python - Size: 8.79 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

venkatarangan/ProductsDigest
A Python-based web scraper that fetches details from specified product webpages, especially Amazon product pages.
Language: Python - Size: 962 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

ChristophWenk/PDFSorter
Sort and rename PDFs according to their content
Language: Python - Size: 56.6 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

olonok69/Nim_LlamaIndex
LlamaIndex integration with NVIDIA NIM
Language: Jupyter Notebook - Size: 2.42 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Ap6pack/PDF-Search-Plus
A Python application that extracts text and images from PDFs, applies OCR to images using Tesseract, and stores the results in a SQLite database. The application features a GUI for searching both text and OCR-extracted content and previewing PDF files.
Language: Python - Size: 39.1 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

vickypandey14/Convert-PDF-into-Image-By-Python
This Python script converts each page of a PDF document into separate image files. It utilizes the PyMuPDF library (fitz) to handle PDF operations and the Python Imaging Library (PIL) for image processing.
Language: Python - Size: 248 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

renan-siqueira/python-pdf-tool
This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible, allowing the choice among different text extraction libraries and supporting both single PDF file and directory containing multiple PDF files.
Language: Python - Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

Srivacthi/Acronym-List-Generator
Generates an Acronym List for your PDF quickly and locally for over 200 pages of text
Language: Python - Size: 13.7 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

benitomartin/scraping-to-sql
Open Source Contribution to Justicio Project
Language: Jupyter Notebook - Size: 6.46 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

timothy-bartlett/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Language: Python - Size: 288 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

mriffey1/vendor-hall-exhibitors
Converts a PDF map of Gen Con's exhibitors and their booth numbers to Google Sheets
Language: Python - Size: 2.76 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

zayigo/BUL-Insight
Processing and storage of data from the Banda Ultra Larga (ultra-broadband) plan
Language: Python - Size: 153 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

gustavo-bordin/fdp
FDP is a programming language created to make PDF text extraction easy
Language: Python - Size: 115 KB - Last synced at: 10 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

FlorianLD/invoice_data_extraction
POC for an automated system extracting invoice data from mail attachments using computer vision, and sending the extracted data to a Google Sheet for further analysis by business teams.
Language: Jupyter Notebook - Size: 3.19 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

petalaleite/boomer_pdf_scraping
This app scrapes specific PDF data and extracts it to a new spreadsheet using Pandas.
Language: Python - Size: 476 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

paolpal/PDFWizard
Toolkit for pdf editing.
Language: Python - Size: 45.9 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

jonalfarlinga/pdiff
A simple utility for diffing PDFs.
Language: JavaScript - Size: 289 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

chettiargautam/PDF-Utilities
A repository that contains some personal and shared code for PDF processing utilities. This is for educational purposes only; please do not redistribute.
Language: Python - Size: 12.7 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Erdos1729/automated-snapshot-of-annotated-content-in-pdfs
This repository will automate the process of saving snapshots of highlighted content within multiple pdf files.
Language: Python - Size: 2.62 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

coycs/pdf-streamlit
PDF tools, written with Python, deployed on Streamlit
Language: Python - Size: 15.6 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

s4shreya/abc-ask-me-anything
A full-stack web application where users can upload a PDF document and ask questions about its content.
Language: JavaScript - Size: 175 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

omn1vor/omni_pdf_to_png
A simple wrapper for PyMuPDF
Language: Python - Size: 1.95 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

pymupdf/PyMuPDF-Utilities
Demos, examples and utilities using PyMuPDF
Language: Jupyter Notebook - Size: 163 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 450 - Forks: 130

pymupdf/PyMuPDF-Optional-Material
Help file downloads, early ZIP binaries, wheels for retired Python 2.7, 3.5.
Size: 2.76 GB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 14 - Forks: 3

lefkovitzj/PySimplePDF
A simple PDF Viewing application written in Python using PyMuPDF, Pillow and CustomTkinter.
Language: Python - Size: 8.79 KB - Last synced at: 10 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

afeefa-qureshi/Encrypt-Decrypt-PDF-files-using-Python
This Python project provides a simple yet powerful tool to encrypt and decrypt PDF files. It utilizes the PyPDF2 and PyMuPDF libraries to perform encryption and decryption operations, making it easy to secure sensitive PDF documents or access password-protected files.
Language: Python - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

TheWatcherMultiversal/pdfgui_tools
pdfgui_tools is a user interface tool developed in Qt and Python that integrates with poppler-utils and PyPDF2 for PDF document management. It's a simple and user-friendly tool that includes various utilities.
Language: Python - Size: 3.31 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 2

Sazizi2025/PDF-Founder
Are you short on time?! Can't you search all the PDFs one by one for the content you want?! Well, PDF-Founder is here...
Language: Python - Size: 517 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

STROAD/Merge2PDF
Merge to PDF
Language: Python - Size: 142 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RomyJr/Retrocession_Detector
This application facilitates the comparison of two PDF files. Differences are presented in a table, color-coded as red (deletions), green (additions), and orange (moved text). Users can save the results in Excel format. It is designed to check whether annotations have been taken into account during the comparison process.
Language: Python - Size: 140 KB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RomyJr/PDF_TXT_Word_research
This application simplifies PDF keyword searches, allowing users to easily find specific terms in files or folders. Results are displayed clearly, and the history feature enables quick review and filtering of past searches. Users can click on document links in the history to open them directly in the default PDF viewer.
Language: Python - Size: 105 KB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

bilalhameed248/PDF-Document-Extraction
Python PDF-to-HTML Converter: Transforming PDF Documents into Structured HTML Tags. - Feb 2022 - Jun 2023
Language: Python - Size: 73.2 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

proviveknayan/document-keyword-search
Search PDF for specific keywords using Python 3. A simple Python program that searches all PDF documents in a folder for a set of keywords and lists all documents along with the keywords present in them.
Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

wh01Samyak/PDF_proofreader
Open source PDF proofreader
Language: Python - Size: 71.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

antoniotejada/srdine
Generates enhanced Dungeons and Dragons 5e SRD pdf
Language: Python - Size: 48.8 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

gautam132002/invoice-pdf-data-extraction
Automated extraction of specific information from invoices, achieving over 95% accuracy.
Language: Python - Size: 4.03 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

TeoJJss/image-playground
Language: HTML - Size: 6.84 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

aphp/edspdf-mupdf
MuPDF extension for EDS-PDF
Language: Python - Size: 1.56 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

devxzh/PDFTools
A small tool built on PyQt5 and PyMuPDF for batch-adding table-of-contents bookmarks, enhancing PDFs, and splitting and merging PDFs
Language: Python - Size: 40.1 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 39 - Forks: 6

Pranay-03/pdf-compression
This project involves accessing all the files in a Google Drive folder and compressing them without loss of quality
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

stroblme/UNote
Fills the lack of an open-source PDF Editor with the capability to draw and add notes
Language: Python - Size: 79.6 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 10 - Forks: 1

MaarkNassef/PDFGame
This project is a web application built using the Flask framework that allows users to upload a PDF file containing text and converts it into a new PDF file where each page of the original PDF is represented as an image. The application will use the PyMuPDF library to read and convert the text pages into images and also to write the new PDF file.
Language: Python - Size: 5.86 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

shayanalibhatti/Designing-a-PDF-Audiobook-using-Python
A simple implementation of a PDF-to-audio converter
Language: Python - Size: 1.36 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 19

Pancham1603/discord-pdf
View PDF files in Discord text channels without downloading
Language: Python - Size: 1.95 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

myogpatterns/layered-pdf-merge
Merges multiple PDFs into a combined PDF file while respecting layers (aka Optional Content Groups)
Language: Python - Size: 826 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

TeamAPS404/PDF_Parser
This is a PDF parser that parses PDFs into JSON format. The main objective of this project is to parse FIR PDF copies into JSON files so that officers can easily extract important data.
Language: Python - Size: 89.2 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

BlackCatDevel0per/PDF4Cat
PDFCat: a simple and powerful tool for processing PDF docs using PyMuPDF
Language: Python - Size: 108 KB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

hreikin/pdf-toolbox
Extract content from PDFs and convert or create new documents from the content in multiple output formats.
Language: Python - Size: 7.57 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

alphayama/docname_verifier
This is a small project that I worked on during my internship.
Language: Python - Size: 23.4 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0
