Topic: "pdf-extraction"
ArtifexSoftware/mupdf.js
JavaScript bindings for MuPDF
Language: TypeScript - Size: 2.41 MB - Last synced at: about 8 hours ago - Pushed at: 17 days ago - Stars: 537 - Forks: 34

pytr-org/pytr
Use TradeRepublic in terminal and mass download all documents
Language: Python - Size: 262 KB - Last synced at: 2 days ago - Pushed at: 17 days ago - Stars: 526 - Forks: 105

24eme/signaturepdf
Free open-source web software for signing PDFs (alone or with others), organizing pages, editing metadata, and compressing PDFs
Language: JavaScript - Size: 7.6 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 511 - Forks: 62

iamarunbrahma/pdf-to-markdown
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
Language: Python - Size: 69.3 KB - Last synced at: about 16 hours ago - Pushed at: 6 months ago - Stars: 76 - Forks: 7

mateogon/pdf-narrator
Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient processing for low-resource systems.
Language: Python - Size: 4.38 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 56 - Forks: 10

adobe/pdftools-extract-java-sdk-samples
This sample project provides a preview of the PDF Extract API. Using the sample project and this documentation, you will easily be able to integrate the PDF Extract API in your own server-side code.
Language: Java - Size: 604 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 6

pcschreiber1/PDF_Extraction-Translation
Translate many large PDF Reports for free using Python.
Language: Jupyter Notebook - Size: 5.61 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 3

heshiming/paddlefish (fork of os-climate/crrf-det)
A Python + C implementation for image-based PDF page layout analysis and content extraction.
Language: C++ - Size: 5.26 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

Aumlo123/pdfdoom
DOOM in a PDF (as ASCII art)
Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

LorysHamadache/pdf2txt-multipage-extractor
Fast batch tool to extract first-page text from all PDFs in a folder using Python. Optimized with multiprocessing to handle thousands of PDFs efficiently.
Language: Python - Size: 609 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

souvik03-136/TenderBot
Task
Language: Python - Size: 127 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

anyparser/anyparserjs
Anyparser TypeScript SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.
Language: TypeScript - Size: 408 KB - Last synced at: 15 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Amartya-007/Pdf-Reader
An app for easily reading and extracting information from PDFs, or chatting with them.
Language: Python - Size: 7.81 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

tracywong117/extract-info-from-pdf-paper
This Python script uses pdfminer.six, PyPDF2, and pdf2image to extract information (text, images) from PDF papers.
Language: Python - Size: 3.37 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

heijul/pdf2gtfs
A Python tool to extract schedule data from PDF timetables and output it in GTFS.
Language: Python - Size: 14.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

javaidb/personal-finance-tracker
Personal finance tracker via interpretation of bank statements from Scotiabank. Insights into spending habits, trends and long-term growth.
Language: Jupyter Notebook - Size: 420 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

JoseLVillaronga/teccam_pdf
Teccam PDF is a Python/Flask web application that extracts text from PDF documents and web pages, automatically converts it to Markdown, and stores it in MongoDB. It offers a responsive interface with light/dark mode, permission management (public/private), reading-position bookmarks, and deployment as a systemd service.
Language: HTML - Size: 41 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

BenjaminDanker/Data-AI-Prepare
A collection of Python utilities for preparing and transforming text data: PDF extraction, paragraph analysis, embedding generation, URL scraping, CSV conversion, and Astra DB uploads
Language: Python - Size: 473 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

RaghuSharma14/PDF-Reader
A PDF Reader application powered by AI, allowing users to upload PDF documents and extract meaningful information using advanced NLP models. Built with Streamlit, Transformers, and Langchain, this app provides a seamless interface for interacting with and analyzing PDF content.
Language: Python - Size: 0 Bytes - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

Atul-vaibhav/OCR-Extraction-Using-Python
Extract text from images and PDFs using Python and store it in JSON format. Store the extracted data in a MySQL database.
Language: Python - Size: 740 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ozcanmiraay/opsbot
AI-powered PDF extraction suite for structured insights from contracts, forms, and documents. Built with Streamlit, LangChain, GPT-4o, and PDFPlumber.
Language: Python - Size: 9.61 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

iodize6399/wwmai-copper-data
Historical copper price data from WWMAI circulars. Raw PDFs and structured CSV data tracking electrolytic copper wire rod prices and calculation components.
Size: 15.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

AnhDungPham2901/extract_data_from_pdf
Uses an LLM to extract unstructured data from PDF files into a structured format
Language: Jupyter Notebook - Size: 217 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ascender1729/vodafone-financial-analysis
Automated financial table extraction and standardization from Vodafone's annual report using GPT-4o-mini
Language: Rich Text Format - Size: 797 KB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

anquetos/gcp-professional-data-engineer-rag
Build a local RAG (Retrieval Augmented Generation) pipeline to generate exam questions for the Google Cloud Platform Professional Data Engineer certification.
Language: Jupyter Notebook - Size: 289 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SSAYKO/schedule_app
Efficient algorithm for generating optimized academic schedules based on subject priorities and group availability.
Language: Python - Size: 59.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

siddharth-nandagopal/billionaires-rag-query
Billionaires RAG Query uses LLMs and a RAG framework to analyze the world's billionaires list. Extracts tabular data from PDFs, converts to multiple formats, and enables precise queries about net worth, age, and more. Integrates with Poetry and asdf for easy setup and management.
Language: Python - Size: 707 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

lectrician1/extract-text-app
Web app to allow users to batch extract text from images and PDFs
Language: Svelte - Size: 536 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

rishisolanke/PDF_Query_Langchain
PDF Query LangChain is a tool that extracts and queries information from PDF documents using advanced language processing. Leveraging LangChain, OpenAI, and Cassandra, this app enables efficient, interactive querying of PDF content. Ideal for data analysis, research, and automated reporting, it simplifies detailed document analysis.
Language: Python - Size: 4.88 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

FTiniNadhirah/Text-Preprocessing
Language: Python - Size: 1.08 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0
