pdf-extraction | Topic | Ecosyste.ms: Repos

Topic: "pdf-extraction"

ArtifexSoftware/mupdf.js

JavaScript bindings for MuPDF

Language: TypeScript - Size: 2.41 MB - Last synced at: about 8 hours ago - Pushed at: 17 days ago - Stars: 537 - Forks: 34

pytr-org/pytr

Use TradeRepublic in terminal and mass download all documents

Language: Python - Size: 262 KB - Last synced at: 2 days ago - Pushed at: 17 days ago - Stars: 526 - Forks: 105

24eme/signaturepdf

Free open-source web software for signing PDFs (alone or with others), organizing pages, editing metadata, and compressing PDFs

Language: JavaScript - Size: 7.6 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 511 - Forks: 62

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

Language: Python - Size: 69.3 KB - Last synced at: about 16 hours ago - Pushed at: 6 months ago - Stars: 76 - Forks: 7

mateogon/pdf-narrator

Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient processing for low-resource systems.

Language: Python - Size: 4.38 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 56 - Forks: 10

adobe/pdftools-extract-java-sdk-samples

This sample project provides a preview of the PDF Extract API. Using the sample project and this documentation, you will easily be able to integrate the PDF Extract API in your own server-side code.

Language: Java - Size: 604 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 6

pcschreiber1/PDF_Extraction-Translation

Translate many large PDF Reports for free using Python.

Language: Jupyter Notebook - Size: 5.61 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 3

heshiming/paddlefish Fork of os-climate/crrf-det

A Python + C implementation for image-based PDF page layout analysis and content extraction.

Language: C++ - Size: 5.26 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

Aumlo123/pdfdoom

DOOM in a PDF (as ASCII art)

Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

LorysHamadache/pdf2txt-multipage-extractor

Fast batch tool to extract first-page text from all PDFs in a folder using Python. Optimized with multiprocessing to handle thousands of PDFs efficiently.

Language: Python - Size: 609 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

souvik03-136/TenderBot

Task

Language: Python - Size: 127 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

anyparser/anyparserjs

Anyparser Typescript SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.

Language: TypeScript - Size: 408 KB - Last synced at: 15 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Amartya-007/Pdf-Reader

An app for reading and extracting information from PDFs easily, or chatting with your PDFs.

Language: Python - Size: 7.81 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

tracywong117/extract-info-from-pdf-paper

This Python script uses pdfminer.six, PyPDF2, and pdf2image to extract information (text, images) from PDF papers.

Language: Python - Size: 3.37 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

heijul/pdf2gtfs

A Python tool to extract schedule data from PDF timetables and output it in GTFS format.

Language: Python - Size: 14.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

javaidb/personal-finance-tracker

Personal finance tracker via interpretation of bank statements from Scotiabank. Insights into spending habits, trends and long-term growth.

Language: Jupyter Notebook - Size: 420 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

JoseLVillaronga/teccam_pdf

Teccam PDF is a Python/Flask web application that extracts text from PDF documents and web pages, automatically converts it to Markdown, and stores it in MongoDB. It offers a responsive interface with light/dark mode, permission management (public/private), reading-position bookmarks, and deployment as a systemd service.

Language: HTML - Size: 41 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

BenjaminDanker/Data-AI-Prepare

A collection of Python utilities for preparing and transforming text data: PDF extraction, paragraph analysis, embedding generation, URL scraping, CSV conversion, and Astra DB uploads

Language: Python - Size: 473 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

RaghuSharma14/PDF-Reader

A PDF Reader application powered by AI, allowing users to upload PDF documents and extract meaningful information using advanced NLP models. Built with Streamlit, Transformers, and Langchain, this app provides a seamless interface for interacting with and analyzing PDF content.

Language: Python - Size: 0 Bytes - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

Atul-vaibhav/OCR-Extraction-Using-Python

Extract text from images and PDFs using Python and store it in JSON format. Store the extracted text in a MySQL database.

Language: Python - Size: 740 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ozcanmiraay/opsbot

AI-powered PDF extraction suite for structured insights from contracts, forms, and documents. Built with Streamlit, LangChain, GPT-4o, and PDFPlumber.

Language: Python - Size: 9.61 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

iodize6399/wwmai-copper-data

Historical copper price data from WWMAI circulars. Raw PDFs and structured CSV data tracking electrolytic copper wire rod prices and calculation components.

Size: 15.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

AnhDungPham2901/extract_data_from_pdf

Using an LLM to extract unstructured data from PDF files into a structured format

Language: Jupyter Notebook - Size: 217 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ascender1729/vodafone-financial-analysis

Automated financial table extraction and standardization from Vodafone's annual report using GPT-4o-mini

Language: Rich Text Format - Size: 797 KB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

anquetos/gcp-professional-data-engineer-rag

Build a local RAG (Retrieval Augmented Generation) to generate exam questions for the Google Cloud Platform professional Data Engineer certification.

Language: Jupyter Notebook - Size: 289 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SSAYKO/schedule_app

Efficient algorithm for generating optimized academic schedules based on subject priorities and group availability.

Language: Python - Size: 59.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

siddharth-nandagopal/billionaires-rag-query

Billionaires RAG Query uses LLMs and a RAG framework to analyze the world's billionaires list. Extracts tabular data from PDFs, converts to multiple formats, and enables precise queries about net worth, age, and more. Integrates with Poetry and asdf for easy setup and management.

Language: Python - Size: 707 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

lectrician1/extract-text-app

Web app to allow users to batch extract text from images and PDFs

Language: Svelte - Size: 536 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

rishisolanke/PDF_Query_Langchain

PDF Query LangChain is a tool that extracts and queries information from PDF documents using advanced language processing. Leveraging LangChain, OpenAI, and Cassandra, this app enables efficient, interactive querying of PDF content. Ideal for data analysis, research, and automated reporting, it simplifies detailed document analysis with ease.

Language: Python - Size: 4.88 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

FTiniNadhirah/Text-Preprocessing

Language: Python - Size: 1.08 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0
