Topic: "pdf-text-extraction"
houking-can/PDFSDK
Based on Foxit Quick PDF Library, with a Python interface
Language: Python - Size: 8.27 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 8 - Forks: 2

PrathameshDhande22/PdfTxtBot
A Telegram bot that extracts text from PDFs and also extracts images of PDF pages. Made with Python.
Language: Python - Size: 12.7 KB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 2

vijayengineer/PDFTextSpeechConverter
Converts scanned and ordinary documents into MP3 speech using Amazon Polly
Language: Python - Size: 1.18 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 1

eli64s/pdflex
CLI for merging PDF contexts.
Language: Python - Size: 465 KB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

Zeeshanahmad4/NLP-Pdf-Minning-Extracting-text-from-pdf
NLP PDF mining: extracting text from PDFs
Language: Python - Size: 2.86 MB - Last synced at: 24 days ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

rithulkamesh/docproc
Opinionated and Sophisticated Document Region Analyzer.
Language: Python - Size: 219 KB - Last synced at: 4 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

VirajMadhu/pdf_key_matcher
Highlights the key matches between a given PDF and the description text
Language: Python - Size: 19.5 KB - Last synced at: 13 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

rmottanet/unchainedtext
UnchainedText: Break free from PDFs! Easily extract raw text to .txt for preprocessing.
Language: Python - Size: 31.3 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

mamiriqbal1/rag_book_qa_prompt
A simple demonstration of how you can implement retrieval-augmented generation (RAG) for a book.
Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kushalpatel0265/Resume-Parser
A resume parser that extracts key details from PDF files using Groq's LLM
Language: Jupyter Notebook - Size: 239 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

simonpierreboucher/Crawler
A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.
Language: Python - Size: 87.9 KB - Last synced at: 26 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

towfique-elahe/pdf-to-structured-csv
A Python-based tool for extracting structured data from PDFs using OCR and regex, and exporting it to CSV. Ideal for processing invoices, logs, or scanned documents into organized, usable datasets.
Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

RealBlueSwan/BSPDFDataExtractor
Extracts data from a provided PDF, using keywords to identify relevant data points. Uses UglyToad PdfPig (great lib, btw).
Language: C# - Size: 7.8 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Spikes2012/DjangoBusPriority
This is for the Technology Application Project at Swinburne University of Technology
Language: Python - Size: 249 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0
