An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-extraction"

getmaxun/maxun

🔥 Open Source No Code Web Data Extraction Platform ‒ Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥

Language: TypeScript - Size: 4.26 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 12,533 - Forks: 973

vi3k6i5/flashtext

Extract Keywords from sentence or Replace keywords in sentences.

Language: Python - Size: 439 KB - Last synced at: 5 days ago - Pushed at: 28 days ago - Stars: 5,648 - Forks: 603

D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

Language: Python - Size: 1.82 MB - Last synced at: 7 days ago - Pushed at: 12 days ago - Stars: 2,969 - Forks: 189

JonathanLink/PDFLayoutTextStripper

Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).

Language: Java - Size: 21.1 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 1,589 - Forks: 214

hi-primus/optimus

:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Language: Python - Size: 110 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 1,508 - Forks: 232

raznem/parsera

Lightweight library for scraping web-sites with LLMs

Language: Python - Size: 2.21 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1,069 - Forks: 64

thinh-vu/vnstock

A beginner-friendly yet powerful Python toolkit for financial analysis and automation - built to make modern investing accessible to everyone

Language: Python - Size: 56.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 798 - Forks: 175

midavr09/BCParser

Size: 15.6 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 771 - Forks: 0

polyrabbit/hacker-news-digest

:newspaper: Let ChatGPT Summarize Hacker News for You

Language: Python - Size: 4.65 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 712 - Forks: 93

adrienjoly/npm-pdfreader

🚜 Parse text and tables from PDF files.

Language: HTML - Size: 1.77 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 674 - Forks: 85

chakshu-jain/BCParser

Size: 0 Bytes - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 603 - Forks: 0

a-maliarov/amazoncaptcha

Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.

Language: Python - Size: 81 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 471 - Forks: 85

parv-mehta10/BCParser

BCParser Bitcoin-Tool Blockchain-Parser Crypto-Tool BTC-Data-Analysis Blockchain-Analysis Cryptocurrency-Parser Data-Extraction Blockchain-Tool BTC-Analysis Crypto-Parser

Size: 15.6 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 410 - Forks: 0

Almakster/BCParser

BCParser Bitcoin-Tool Blockchain-Parser Crypto-Tool BTC-Data-Analysis Blockchain-Analysis Cryptocurrency-Parser Data-Extraction Blockchain-Tool

Size: 15.6 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 353 - Forks: 0

shcherbak-ai/contextgem

ContextGem: Effortless LLM extraction from documents

Language: Python - Size: 9.73 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 313 - Forks: 26

py-pdf/benchmarks

Benchmarking PDF libraries

Language: Python - Size: 3.73 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 269 - Forks: 15

notluken/BCParser

Size: 15.6 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 220 - Forks: 0

jpjacobpadilla/Stealth-Requests

Undetected Web-Scraping & Seamless HTML Parsing in Python!

Language: Python - Size: 691 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 203 - Forks: 10

serpapi/clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

Language: Ruby - Size: 34.2 KB - Last synced at: about 4 hours ago - Pushed at: about 1 year ago - Stars: 178 - Forks: 11

molybdenum-99/infoboxer

Wikipedia information extraction library

Language: Ruby - Size: 8.17 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 175 - Forks: 13

sypht-team/sypht-python-client

A python client for the Sypht API

Language: Python - Size: 165 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 162 - Forks: 5

dilawar/PlotDigitizer

A Python utility to digitize plots.

Language: Python - Size: 2.15 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 139 - Forks: 24

173TECH/sayn

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

Language: Python - Size: 4.54 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 122 - Forks: 15

CambioML/any-parser

Accurate, private and configurable document retrieval LLM

Language: Python - Size: 22.1 MB - Last synced at: 24 days ago - Pushed at: 25 days ago - Stars: 121 - Forks: 11

nfx/go-htmltable

Structured HTML table data extraction from URLs in Go that has almost no external dependencies

Language: Go - Size: 416 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 120 - Forks: 8

johnbumgarner/newspaper3_usage_overview

This repository provides usage examples for the Python module Newspaper3k.

Language: Python - Size: 121 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 120 - Forks: 17

dream-num/univer-clipsheet

A powerful Chrome extension for web scraping

Language: TypeScript - Size: 5.72 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 119 - Forks: 17

villagecomputing/superpipe

Superpipe - optimized LLM pipelines for structured data

Language: Python - Size: 11.2 MB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 110 - Forks: 3

sshniro/line-segmentation-algorithm-to-gcp-vision

Line segmentation algorithm for Google Vision API.

Language: Kotlin - Size: 2.76 MB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 97 - Forks: 37

reincubate/ricloud

Python client for Reincubate's ricloud API. Yes, it works with iOS 14 & iPhone 12 backups!

Language: Python - Size: 220 KB - Last synced at: 22 days ago - Pushed at: about 5 years ago - Stars: 95 - Forks: 25

chenkovsky/cyac

High performance Trie and Ahocorasick automata (AC automata) Keyword Match & Replace Tool for python. Correct case insensitive implementation!

Language: Cython - Size: 1.75 MB - Last synced at: 21 days ago - Pushed at: 7 months ago - Stars: 94 - Forks: 15

hermit-crab/ScrapeMate

Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages.

Language: JavaScript - Size: 761 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 93 - Forks: 12

tech-engine/goscrapy

GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.

Language: Go - Size: 6.16 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 89 - Forks: 2

dav009/flash

Golang Keyword extraction/replacement Datastructure using Tries instead of regexes

Language: Go - Size: 7.81 KB - Last synced at: 11 days ago - Pushed at: over 7 years ago - Stars: 89 - Forks: 6

sypht-team/sypht-java-client

A Java client for the Sypht API

Language: Java - Size: 108 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 87 - Forks: 1

docwire/docwire

DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality

Language: C++ - Size: 35.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 83 - Forks: 18

danburzo/hred

Reduce HTML and XML to JSON from the command line, using an expressive query language inspired by CSS selectors.

Language: JavaScript - Size: 207 KB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 73 - Forks: 1

Zubdata/Google-Maps-Scraper

Google maps scraper with gui

Language: Python - Size: 146 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 68 - Forks: 27

WeTransfer/format_parser

file metadata parsing, done cheap

Language: Ruby - Size: 891 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 62 - Forks: 18

chrisrober011/BCParser

BCParser Bitcoin-Tool Blockchain-Parser Crypto-Tool BTC-Data-Analysis Blockchain-Analysis Cryptocurrency-Parser Data-Extraction Blockchain-Tool

Size: 15.6 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 55 - Forks: 0

html-extract/hext

Domain-specific language for extracting structured data from HTML documents

Language: C++ - Size: 2.13 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 53 - Forks: 3

scopashq/typestream 📦

⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first

Language: TypeScript - Size: 560 KB - Last synced at: 3 days ago - Pushed at: about 3 years ago - Stars: 53 - Forks: 0

uhh-lt/newsleak

Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery

Language: Java - Size: 116 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 52 - Forks: 15

rohanpillai20/Table-Extractor-From-Image

This repository contains the code that extracts a table from an image and exports it to an Excel file.

Language: Python - Size: 72.3 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 51 - Forks: 14

StabRise/spark-pdf

PDF DataSource for Apache Spark; allows reading PDF files directly into a DataFrame and running OCR on them

Language: Scala - Size: 5.72 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 49 - Forks: 3

Articdive/ArticData

Collection of data extracted from Minecraft.

Size: 7.33 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 0

VorTECHsa/refinery

Refinery is a tool to extract and transform semi-structured data from Excel spreadsheets of different layouts in a declarative way.

Language: Kotlin - Size: 374 KB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 44 - Forks: 6

serpapi/google-search-results-java

Google Search Results JAVA API via SerpApi

Language: Java - Size: 260 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 24

linw1995/jsonpath

A query expression for extracting data from JSON.

Language: Python - Size: 763 KB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 41 - Forks: 4

luminati-io/brightdata-mcp

A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.

Language: JavaScript - Size: 63.8 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 40 - Forks: 4

shriprem/FWDataViz

Fixed Width Data Visualizer plugin for Notepad++. Turns Notepad++ into Excel for fixed-width data files. Displays cursor position data. Jumps to specific fields. Folding Record Blocks. Extracts Data. Builtin dialogs to configure file-type, record-type & fields; Themes & Colors; and Folding. Handles homogenous, mixed & multi-line records.

Language: C++ - Size: 12.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 40 - Forks: 5

VictorAtPL/awesome-receipt-data-extraction 📦

A curated list (and summaries) of awesome research publications on the topic of data extraction from photos of receipts.

Language: TeX - Size: 13.6 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 5

rekloud/tinvois-parser

Extract receipt info

Language: Python - Size: 22.1 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 35 - Forks: 3

mhucka/taupe 📦

Taupe takes a downloaded Twitter archive ZIP file, extracts the URLs corresponding to tweets, retweets, replies, quote tweets, and liked tweets, and outputs the results in a comma-separated values (CSV) format that you can use with other software tools.

Language: Python - Size: 176 KB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 33 - Forks: 1

johnbumgarner/newshound

This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

Size: 28.3 KB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 33 - Forks: 3

sypht-team/sypht-golang-client

A Golang client for the Sypht API

Language: Go - Size: 73.2 KB - Last synced at: 28 days ago - Pushed at: almost 5 years ago - Stars: 33 - Forks: 0

MrHacker-X/OsintifyX

OsintifyX: Powerful Open-source OSINT tool for extracting valuable information from Instagram profiles.

Language: Python - Size: 3.83 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 2

rubydamodar/ProText-Analyzer

ProText Analyzer is a powerful tool for extracting insights from text. It conducts sentiment analysis, categorizing content as positive, negative, or neutral, while also assessing readability and linguistic complexity. Ideal for businesses and researchers, it enhances understanding of textual data.

Language: Jupyter Notebook - Size: 1.32 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 30 - Forks: 1

linw1995/data_extractor

Combine XPath, CSS Selectors and JSONPath for Web data extracting.

Language: Python - Size: 1.07 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 28 - Forks: 5

AryanVBW/Exif

ExifTool is a powerful command-line tool that can be used to extract and edit metadata in a wide range of media files, including images, audio, and video. Metadata is information that is stored within a file that describes the file’s content or other attributes.

Language: Python - Size: 8.12 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 26 - Forks: 10

gambolputty/wiktionary-de-parser

Extract data from German Wiktionary XML files.

Language: Python - Size: 488 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 26 - Forks: 8

NextKore/SmartMuv

An EVM-compatible Solidity Smart Contract Storage/Slot Analyzer and Data Extractor.

Language: Python - Size: 226 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 25 - Forks: 7

chaitanyarahalkar/Financial-Info-Extractor

Extract financial information in CSV format for companies compliant to the NSE

Language: Python - Size: 36.1 KB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 22 - Forks: 7

arkutils/Obelisk

Project Obelisk - Uploading Ark Data daily

Size: 59.5 MB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 21 - Forks: 5

ImranR98/Wealthsimpleton

A Python script that scrapes your Wealthsimple activity history and saves the data in a JSON file.

Language: Python - Size: 9.77 KB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 21 - Forks: 4

cpl/exodus

Data exfiltration using DNS

Language: Go - Size: 38.1 KB - Last synced at: 11 months ago - Pushed at: over 5 years ago - Stars: 21 - Forks: 3

pim97/scrappey-wrapper-python

An API wrapper for Scrappey.com written in Python (cloudflare, datadome bypass & solver)

Language: Python - Size: 248 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 20 - Forks: 0

biraj21/web-wanderer

A multi-threaded web crawler written in Python, utilizing ThreadPoolExecutor and Playwright to efficiently crawl dynamically rendered web pages and download them.

Language: Python - Size: 207 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 20 - Forks: 1

shdev/phpflashtext

Extract Keywords from sentence or Replace keywords in sentences. @ https://github.com/vi3k6i5/flashtext

Language: PHP - Size: 1.21 MB - Last synced at: 6 days ago - Pushed at: almost 6 years ago - Stars: 20 - Forks: 5

OwenOrcan/YiraBot-Crawler

YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.

Language: Python - Size: 221 KB - Last synced at: 13 days ago - Pushed at: 6 months ago - Stars: 19 - Forks: 0

arkutils/Purlovia

Project Purlovia - digging up Ark data

Language: Python - Size: 3.03 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 19 - Forks: 9

peterstangl/svg2data

A Python module for reading data from a plot provided as SVG file.

Language: Python - Size: 63.5 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 19 - Forks: 3

dossma/telegram-downloader

Download all content from a Telegram channel

Language: Python - Size: 56.6 KB - Last synced at: 23 days ago - Pushed at: 24 days ago - Stars: 18 - Forks: 4

Fabiopf02/ofx-data-extractor

A module written in TypeScript that provides a utility to extract data from an OFX file in Node.js and Browser

Language: TypeScript - Size: 210 KB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 18 - Forks: 9

QuantumByteStudios/GitHubUserDataExtractor

A tool that displays information and received events about any user on GitHub straight on your terminal screen

Language: Python - Size: 52.3 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 18 - Forks: 3

NikhilaThota/CapstoneProject_House_Prices_Prediction

Understand the relationships between various features in relation with the sale price of a house using exploratory data analysis and statistical analysis. Applied ML algorithms such as Multiple Linear Regression, Ridge Regression and Lasso Regression in combination with cross validation. Performed parameter tuning, compared the test scores and suggested a best model to predict the final sale price of a house. Seaborn is used to plot graphs and scikit learn package is used for statistical analysis.

Language: Jupyter Notebook - Size: 7.91 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 18 - Forks: 13

arkutils/arkutils-website

The source for the arkutils website, home of a few Ark: Survival Evolved tools.

Language: Svelte - Size: 5.04 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 16 - Forks: 3

ppatrzyk/filmweb-export

Data export from the Filmweb service

Language: Python - Size: 368 KB - Last synced at: 12 days ago - Pushed at: 10 months ago - Stars: 16 - Forks: 2

ROBROICH/SAP_AND_COMMON_DATA_MODEL_DEMO

This demo describes the basic integration between S/4HANA and the Microsoft Common Data Model (Model)

Size: 4.24 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 16 - Forks: 2

extralit/extralit

Fast and accurate systemic literature data extraction with LLM assistance

Language: Python - Size: 639 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 15 - Forks: 20

Capevace/data-wizard

Extract Structured Data from PDFs, Word Docs and Images. Embeddable directly into your application, regardless of the stack.

Language: JavaScript - Size: 134 MB - Last synced at: 1 day ago - Pushed at: 4 days ago - Stars: 14 - Forks: 3

robert-mcdermott/ollama-batch-cluster

Large Scale Batch Processing with Ollama

Language: Python - Size: 1.01 MB - Last synced at: 27 days ago - Pushed at: 6 months ago - Stars: 14 - Forks: 3

webmiddle/webmiddle

Node.js framework for modular web scraping and data extraction

Language: JavaScript - Size: 2.53 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

floriancochard/extract-data-from-paper

A tool designed to extract numerical data from scanned historical weather documents.

Language: Python - Size: 151 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 2

sypht-team/sypht-node-client

A Nodejs client for the Sypht API

Language: JavaScript - Size: 62.5 KB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 13 - Forks: 4

attogram/justrefs

Just Refs - extract just the references and related topics from any page on the English Wikipedia

Language: PHP - Size: 244 KB - Last synced at: 27 days ago - Pushed at: almost 5 years ago - Stars: 13 - Forks: 0

xingbow/SciDaEx

Data Extraction and Structuring Demo

Language: Python - Size: 1.09 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 12 - Forks: 0

MOUHASSINE-badreddine/MoroccanHousing-ETL

Moroccan housing data pipeline using Scrapy, MongoDB, Zyte and DigitalOcean cloud

Language: Python - Size: 30.3 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

Jacobvs/ML-Music-Analyzer

This repository uses deep learning to determine real-time chords, bpm, and extract other features from music audio

Language: Python - Size: 63.2 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 12 - Forks: 1

StabRise/ScaleDP

ScaleDP is an Open-Source extension of Apache Spark for Document Processing

Language: Python - Size: 7.88 MB - Last synced at: about 18 hours ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 0

webtap-ai/webtap

AI web scraping python library for efficient and reliable web scraping.

Size: 31.4 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 1

fabioms-br/azure-data-factory

Learn ETL/ELT Data Management

Size: 80.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 1

sp1thas/book-depository-dataset 📦

A large collection of books, scraped from bookdepository.com

Language: Python - Size: 99.1 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 11 - Forks: 1

sypht-team/sypht-kotlin-client

A Kotlin client for the Sypht API

Language: Kotlin - Size: 136 KB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 2

petrpatek/airbnb-scraper

Apify public actor for scraping Airbnb homes.

Language: JavaScript - Size: 761 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 6

dangvansam/detect-extract-table

Detect and Extract Table On Image (OpenCV)

Language: Python - Size: 610 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 11 - Forks: 2

crispyzingy/PDFExcelWordParser

:rocket:Parse PDFs, Word and Excel documents. Read, Create, Merge/Combine, Extract data from office documents.

Language: Python - Size: 514 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 11 - Forks: 8

rubypoddar/GitHub-User-Data-Fetcher

GitHub User Data Fetcher: A tool that extracts and analyzes comprehensive data from GitHub user profiles, including repositories, followers, and activity metrics, to provide actionable insights for recruiters, project managers, and developers.

Language: Python - Size: 17.6 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 10 - Forks: 0

fuchsia-programming/scrape 📦

When you need those jobs hypersonic 🚀 scrape 🔪

Language: JavaScript - Size: 2.79 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 10 - Forks: 3

pim97/scrappey.js

Scrappey.js: A versatile JavaScript wrapper for Scrappey API for solving Cloudflare, datadome, enabling seamless web scraping of anti-bot protected websites. Simplify data extraction with robust functionality and reliable results. Unlock valuable insights effortlessly. Get started with Scrappey

Language: JavaScript - Size: 124 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 9 - Forks: 4

Related Topics
python (234), web-scraping (146), automation (68), data-analysis (66), data-science (53), pandas (49), data-visualization (47), machine-learning (45), data-mining (45), beautifulsoup (42), python3 (39), scraper (37), selenium (37), data (34), webscraping (34), data-cleaning (32), llm (30), data-engineering (28), api (27), data-processing (26), data-transformation (26), csv (26), web-scraper (25), nlp (24), ai (23), javascript (22), scraping (22), ocr (21), data-scraping (21), sql (21), data-exploration (20), pdf (20), json (19), open-source (18), requests (18), etl (17), crawler (16), html (16), streamlit (15), nodejs (15), web-crawler (15), beautifulsoup4 (15), blockchain-analysis (14), natural-language-processing (14), excel (14), data-parser (13), data-recovery (13), extract (13), image-processing (13), digital-forensics (13), database (13), scrapy (13), btc-data-analysis (12), btc-analysis (12), web-scraping-python (12), pdf-parser (12), numpy (12), blockchain-tool (12), blockchain-security (12), bcparser (12), blockchain-parser (12), bitcoin-tool (12), invoice (12), data-preprocessing (12), btc-security (12), crypto-analysis (12), crypto-analysis-tool (12), wallet-tool (12), digital-wallet-tool (12), crypto-parser (12), crypto-tool (12), cryptocurrency-parser (12), web-crawling (11), sentiment-analysis (11), selenium-webdriver (11), cli (11), invoice-parser (11), api-client (11), web-automation (10), java (10), text-extraction (10), data-manipulation (10), etl-pipeline (10), typescript (10), text-mining (10), sypht-api (9), css-selector (9), receipt-scanning (9), html-parsing (9), jupyter-notebook (9), extract-fields (9), structured-data (9), receipt-capture (9), sypht (9), openai (9), receipt-scanner (9), r (9), receipt-reader (9), document-capture (9), postgresql (8)