An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: pdf-scraping

scottgriv/python-pdf_web_scraper

Scrape a web page for PDF files and download them all locally.

Language: Python - Size: 375 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 2
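As a rough illustration of the idea in the entry above (finding PDF links on a web page), here is a minimal sketch using only the Python standard library. It is not taken from the listed repository; the names `PdfLinkParser` and `find_pdf_links` are hypothetical, and it assumes that a PDF link is any `<a href>` whose URL path ends in `.pdf`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Assumption: a PDF is identified by its path suffix, ignoring case
        # and any query string.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Return the PDF links found in an HTML string, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links
```

Fetching the page and saving each file are left out of the sketch. The suffix check will miss PDFs served from URLs without a `.pdf` extension, so a fuller tool might also inspect the `Content-Type` response header.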

edoardottt/multi-pdf-finder

Looking for a word across many PDF files? Search them all at once. ⚡

Language: Shell - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 16 - Forks: 3

casychow/pdf_scraper_extract_largest_num

Python module that scrapes information from a PDF file containing different data types (e.g., tables, graphs) and extracts the largest number it can find.

Language: Jupyter Notebook - Size: 11.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
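The "largest number" step described in the entry above can be sketched as follows, assuming the PDF text has already been extracted into a plain string by some other tool. This is not the listed repository's implementation; `largest_number` and `NUMBER_RE` are hypothetical names, and the pattern assumes English-style numbers with optional thousands separators and a decimal point.

```python
import re

# Matches numbers such as 1,234.5 (comma-grouped) or -42 or 3.14.
# Assumption: commas are thousands separators and "." is the decimal mark.
NUMBER_RE = re.compile(r"-?\d{1,3}(?:,\d{3})+(?:\.\d+)?|-?\d+(?:\.\d+)?")


def largest_number(text):
    """Return the largest number found in text as a float, or None if there is none."""
    values = [float(match.replace(",", "")) for match in NUMBER_RE.findall(text)]
    return max(values) if values else None
```

Numbers written with unit suffixes or scale words (for example "3 million") are compared by their literal digits only, so a real pipeline would need extra handling for those cases.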

gwu-libraries/uriscrape

Scrape URIs from Telegram channel transcripts stored in PDF files.

Language: Python - Size: 69.3 KB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

Spyrosigma/ResuMeme

Upload your resume and watch it get roasted.

Language: Python - Size: 57.6 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

tam0w/poverty_data

An attempt to analyse and estimate poverty indicators at the Indian district level. First-ever district-level dataset with a poverty indicator.

Language: Jupyter Notebook - Size: 184 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

prak112/esg-profile

Assessing stock-price fluctuations of companies based on their ESG profiles.

Language: Jupyter Notebook - Size: 2.11 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

ibotsuft/scripts

Scripts written by iBots team.

Language: Python - Size: 13.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

mattkerlogue/google-covid-mobility-scrape 📦

Script for scraping Google's COVID-19 Community Mobility Reports [ARCHIVED]

Language: R - Size: 17.5 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 33 - Forks: 14

kaigg96/Driving-Towards-Efficiency

Using Python and the Natural Resources Canada Fuel Consumption Ratings to view and predict vehicle efficiency.

Language: Jupyter Notebook - Size: 24.1 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

ethanpbrooks/Schwab-PDF-Scraper

PDF Statement Data Extractor and Analyzer. A Python script for extracting and analyzing financial data from PDF statements, with a focus on Schwab statements.

Language: Python - Size: 452 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

SteadyGiant/scrape-naic 📦

Scraping tables from the PDFs of NAIC Model Laws, Regulations, and Guidelines.

Language: R - Size: 1.68 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

hellpanderrr/pypdfscraper

Lightweight PDF scraper

Language: Python - Size: 782 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

iamcjt922/Funding-Analysis

A custom-built GUI application that uses Python and the PyPDF2 library to scrape, scan, and evaluate a person's funding capacity based on their PDF credit report.

Language: Python - Size: 160 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

TomasHubelbauer/pdf-scrape

Demonstrating PDF text and image extraction with correct bounds

Language: JavaScript - Size: 1.54 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

fayrose/MiddleEgyptianDataset

Parses 3 dictionaries from PDFs, reconstructs lost formatting using N-gram and visual computing methods, and serializes to a database for web display.

Language: C# - Size: 70.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

TomasHubelbauer/globus

Scrapes the Globus PDF catalogue using Puppeteer

Language: JavaScript - Size: 25.3 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

gra-vel/covid-pichincha

Visualization of reported cases of COVID-19 in Pichincha, Ecuador

Language: Python - Size: 13.8 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

chris-bbrs/pdf-merging-and-scraping

PDF merging and scraping for NLP use

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

coelicidium/marpl-project

A free-as-in-freedom, modular, flexible, customizable all-in-one suite for all your open science needs.

Size: 15.6 KB - Last synced at: 10 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

zach-hunt/PDFParsing

Data extraction from PDF tables

Language: Python - Size: 1.95 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

GGSIPUResultTracker/ggsipu_results_extractor

Python module to extract and dump results data from GGSIPU results PDFs

Language: Python - Size: 5.7 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1