GitHub topics: crawling
alejandrov44/free-games-alerts
🔔 Quick and easy way to get notified about all kinds of new free games available to claim on different platforms.
Language: TypeScript - Size: 131 KB - Last synced at: about 18 hours ago - Pushed at: about 20 hours ago - Stars: 0 - Forks: 0
commoncrawl/web-languages
Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.
Size: 2.05 MB - Last synced at: about 19 hours ago - Pushed at: about 21 hours ago - Stars: 59 - Forks: 77
StudyTab/Phantom-Crawler
🕵️‍♂️ Perform robust web security scanning and reconnaissance with PhantomCrawler, designed for researchers and pen testers to enhance application security.
Language: Python - Size: 1.31 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
amienbou121/crawl4ai-mcp-server
🕷️ Enable AI agents to scrape and crawl the web effortlessly with this lightweight Model Context Protocol server, integrating seamlessly into your workflows.
Language: Python - Size: 1.34 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0
flohoss/mittagskarte
A lightweight web application for recording and displaying daily lunch specials for restaurants and butcher shops. Built in Go and using templ with Tailwind CSS, it provides simple management and an attractive presentation of daily menus.
Language: Go - Size: 5.77 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 1
mts700/YT-Thumbnail-Downloader
📷 Download high-quality YouTube video and Short thumbnails effortlessly in multiple resolutions with this free tool.
Language: JavaScript - Size: 2.28 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
sonoraa4ever/playwright-ai-automation
Language: TypeScript - Size: 1.33 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
vishal9431/Searchin_v1
Search system
Language: TypeScript - Size: 402 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
ArchiveBox/abx-dl
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...
Language: JavaScript - Size: 185 KB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 87 - Forks: 4
MadeWithBlasted/learning-center
📚 Manage learning resources effectively with the Learning Center application, featuring modular design, multi-language support, and robust navigation.
Language: TypeScript - Size: 1.43 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
Kkkkk8S/crawler-scripts
crawler-scripts is a collection of lightweight scripts designed to automate web data extraction. These scripts support various websites and allow users to gather information efficiently without manual effort.
Language: Python - Size: 1.76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0
MarshalX/telegram-crawler
🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
Language: Python - Size: 835 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 331 - Forks: 43
pzaino/thecrowler
A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.
Language: Go - Size: 38.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 49 - Forks: 10
scrapfly/scrapfly-scrapers
Scalable Python web scraping scripts for 40+ popular domains
Language: Python - Size: 6.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 732 - Forks: 158
D4Vinci/Scrapling
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
Language: Python - Size: 3.94 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8,054 - Forks: 458
NickG1978/awesome-web-crawler
🕷️ Discover and use popular web crawlers across various programming languages to efficiently extract data from the web.
Language: HTML - Size: 1.66 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
javi-aranda/malaga-parking-data
Historical data on public car parks in Málaga (Andalusia, Spain).
Language: Python - Size: 47.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 0
justoneapi/crawl-data-api
justoneapi Data API Services. We provide APIs for: Xiaohongshu, Red, Redbook, Rednote, Taobao, JD.com, Douyin (E-commerce), Douyin (Videos), Kuaishou, Pugongying, Xingtu, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, SHEIN, Shopee, Baidu Index, Boss Zhipin, Zhaopin, Lagou, Toutiao, Facebook
Size: 5.03 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 45 - Forks: 5
ceyhuncakir/opencrawl
An open source crawler project where one can crawl the internet and use open-source LLMs to transform the information to their needs
Language: Python - Size: 1.34 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0
krtk-dev/billboard-player
🎹 Free Billboard Hot 100 M/V streaming service
Language: TypeScript - Size: 3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 27 - Forks: 9
KoreanThinker/billboard-json
🎧 Get the Billboard Hot 100 chart as JSON
Language: TypeScript - Size: 3.63 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 53 - Forks: 6
QLangstaff/qrawl
Composable web crawling tools for Rust
Language: Rust - Size: 296 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0
LillySchramm/Booklify.me
Booklify.me is an open-source platform for keeping track of everything in your bookshelf.
Language: TypeScript - Size: 35.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4 - Forks: 1
supacrawler/supacrawler
Supacrawler's ultralight engine for scraping and crawling the web. Written in Go for maximum performance and concurrency.
Language: Go - Size: 21.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 46 - Forks: 3
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
Language: Python - Size: 32.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 7,064 - Forks: 508
seantomburke/sitemapper
Parse through any sitemap in Node.js
Language: TypeScript - Size: 1.56 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 124 - Forks: 78
SimFin/pdf-crawler
SimFin's open source PDF crawler
Language: Python - Size: 40 KB - Last synced at: 5 days ago - Pushed at: about 6 years ago - Stars: 127 - Forks: 44
jens-ox/oda
Extraction, versioning and machine-readable provisioning of public data.
Language: TypeScript - Size: 27.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 12 - Forks: 0
transitive-bullshit/awesome-puppeteer
A curated list of awesome puppeteer resources.
Size: 105 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 2,523 - Forks: 160
hardkoded/puppeteer-sharp
Headless Chrome .NET API
Language: C# - Size: 8.47 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 3,782 - Forks: 478
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Language: Python - Size: 27.6 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 58,754 - Forks: 11,131
apache/nutch
Apache Nutch is an extensible and scalable web crawler
Language: Java - Size: 132 MB - Last synced at: 5 days ago - Pushed at: 18 days ago - Stars: 3,080 - Forks: 1,260
clemfromspace/scrapy-selenium
Scrapy middleware to handle javascript pages using selenium
Language: Python - Size: 29.3 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 954 - Forks: 360
cikay/sorjin_base_manual_data_collector
Recursive crawler for most popular Kurdish websites
Language: Python - Size: 64.5 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0
webrecorder/browsertrix-crawler
Run a high-fidelity browser-based web archiving crawler in a single Docker container
Language: TypeScript - Size: 53.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 894 - Forks: 118
HazemAkram/WebCrawler
AI Web Crawler is a powerful, AI-powered web crawler that extracts product information from e-commerce websites and downloads associated PDF documents. Built with modern Python technologies and featuring intelligent pagination handling, duplicate detection, and advanced PDF processing.
Language: Python - Size: 68.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2 - Forks: 1
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Language: TypeScript - Size: 144 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 20,257 - Forks: 1,054
J4GL/bt-dht
A bittorrent dht scraper
Language: Python - Size: 76.2 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0
gocolly/colly
Elegant Scraper and Crawler Framework for Golang
Language: Go - Size: 8.26 MB - Last synced at: 9 days ago - Pushed at: 12 days ago - Stars: 24,748 - Forks: 1,835
ai-robots-txt/ai.robots.txt
A list of AI agents and robots to block.
Language: Python - Size: 508 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3,177 - Forks: 129
google/corpuscrawler
Crawler for linguistic corpora
Language: Python - Size: 488 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 209 - Forks: 54
lumpinif/deepcrawl
100% free and fully open-source edge Firecrawl alternative with better link extraction for agents, which you can deploy yourself.
Language: TypeScript - Size: 5.77 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 1
zhuyingda/webster
a reliable high-level web crawling & scraping framework for Node.js.
Language: JavaScript - Size: 181 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 549 - Forks: 52
bluet/proxybroker2 Fork of constverum/ProxyBroker
The New (auto rotate) Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭
Language: Python - Size: 8.22 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 903 - Forks: 129
yujiosaka/headless-chrome-crawler
Distributed crawler powered by Headless Chrome
Language: JavaScript - Size: 1.53 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 5,622 - Forks: 409
edoardottt/cariddi
Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more
Language: Go - Size: 576 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 2,822 - Forks: 255
javapuppteernodejs/bypass-awswaf-crawl4ai
Bypass AWS WAF with Crawl4AI & CapSolver: A personal developer's guide to seamless web scraping on WAF-protected sites, featuring API and browser extension integration examples.
Language: Python - Size: 14.6 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0
lorien/awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
Language: Makefile - Size: 427 KB - Last synced at: 12 days ago - Pushed at: 20 days ago - Stars: 7,384 - Forks: 825
capjamesg/getsitemap
A Python library that retrieves all URLs in the sitemaps on a website.
Language: Python - Size: 62.5 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1
rebrowser/rebrowser-patches
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
Language: JavaScript - Size: 79.1 KB - Last synced at: 13 days ago - Pushed at: 6 months ago - Stars: 1,072 - Forks: 58
tryAGI/Firecrawl
Generated C# SDK based on official Firecrawl OpenAPI specification
Language: C# - Size: 489 KB - Last synced at: 2 days ago - Pushed at: 13 days ago - Stars: 3 - Forks: 1
milos85vasic/Catalogizer
Advanced Multi-Protocol Media Collection Management System
Language: HTML - Size: 137 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0
eliasdabbas/langchain-advertools
LangChain integration for advertools
Language: Jupyter Notebook - Size: 1.61 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0
roach-php/core
The complete web scraping toolkit for PHP.
Language: PHP - Size: 787 KB - Last synced at: 15 days ago - Pushed at: 27 days ago - Stars: 1,429 - Forks: 77
2jang/DBTI
Python-based service for finding a dog that matches your personality and analyzing pet dog temperament
Language: HTML - Size: 9.05 MB - Last synced at: 16 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0
NateScarlet/holiday-cn
📅🇨🇳 Chinese statutory holiday data, scraped automatically every day from State Council announcements
Language: Python - Size: 302 KB - Last synced at: 16 days ago - Pushed at: 18 days ago - Stars: 1,636 - Forks: 178
bitmakerla/estela-entrypoint
estela entrypoint for job runner 🕸
Language: Python - Size: 96.7 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 5 - Forks: 2
NationalLibraryOfNorway/maalfrid_toolkit
Toolkit for the Målfrid project
Language: Python - Size: 1.26 MB - Last synced at: 15 days ago - Pushed at: 18 days ago - Stars: 2 - Forks: 1
ZLotusRain/tider
A fast, simple, extensible and powerful framework for web crawling.
Language: Python - Size: 505 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 5 - Forks: 0
Gaeduck-0908/boannews-crawling-output
boannews-crawling-output
Size: 1.77 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0
mmuyakwa/Amazon_Check
An Amazon price tracker written in Python. This script was written by Webklex, but I added a MySQL database and config file to it.
Language: Python - Size: 191 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 3 - Forks: 0
DevanshRaghav75/FALL
An automated penetration testing tool
Language: Python - Size: 1.85 MB - Last synced at: 6 days ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0
apache/nutch-webapp
Apache Nutch is an extensible and scalable web crawler
Language: Java - Size: 124 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 5
adbar/courlan
Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
Language: Python - Size: 591 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 149 - Forks: 9
forkonlp/N2H4
A tool for collecting Naver News articles
Language: R - Size: 6.27 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 218 - Forks: 75
lorey/mlscraper
🤖 Scrape data from HTML websites automatically by just providing examples
Language: Python - Size: 452 KB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 1,363 - Forks: 91
XORbit01/webpalm
🕸️ Crawl in the web network
Language: Go - Size: 5.07 MB - Last synced at: about 10 hours ago - Pushed at: 8 months ago - Stars: 378 - Forks: 38
mawrkus/jason-the-miner
⛏ A versatile Web scraper for Node.js
Language: JavaScript - Size: 2.76 MB - Last synced at: 10 days ago - Pushed at: 26 days ago - Stars: 46 - Forks: 11
josephlimtech/linkedin-profile-scraper-api
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.
Language: TypeScript - Size: 10.8 MB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 699 - Forks: 172
MontFerret/ferret
Declarative web scraping
Language: Go - Size: 4.98 MB - Last synced at: 25 days ago - Pushed at: about 2 months ago - Stars: 5,874 - Forks: 309
DBeath/feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
Language: Python - Size: 856 KB - Last synced at: 5 days ago - Pushed at: 13 days ago - Stars: 81 - Forks: 12
onurkanbakirci/rsl-editor
The open content licensing editor for the AI-first Internet. Easily create, edit, and manage your RSL (Really Simple Licensing) documents.
Language: TypeScript - Size: 4.04 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 15 - Forks: 0
luminati-io/Awesome-Web-Scraping
A list of libraries, tools, and APIs for web scraping and data processing. Find everything you need for extracting, managing, and processing data from the web, from HTTP libraries to browser automation tools and proxy services.
Size: 104 KB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 7 - Forks: 2
ArchiveBox/abx-spec-behaviors
🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser environments, puppeteer, playwright, extensions, AI tools, and many other contexts with minimal adjustment.
Language: JavaScript - Size: 238 KB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 18 - Forks: 0
infinilabs/crawler
🕷️ An easy-to-use spider written in Golang (previously named GOPA).
Language: Go - Size: 54.6 MB - Last synced at: 17 days ago - Pushed at: over 4 years ago - Stars: 309 - Forks: 82
omkarcloud/botasaurus-starter
🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖
Language: TypeScript - Size: 402 KB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 27 - Forks: 9
leewr9/crawlquest
Smart crawling request utility for Python.
Language: Python - Size: 27.3 KB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0
jroakes/tech-seo-crawler
Build a small, 3-domain internet using GitHub Pages and Wikipedia, and construct a crawler to crawl, render, and index it.
Language: Python - Size: 6.18 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 73 - Forks: 11
Haimonmon/snippy
A book-scraping bot that gives you book data. Be cautious: using it may get your IP banned.
Language: Python - Size: 429 KB - Last synced at: 26 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0
Agenty/scrapingai
Build web scraping agents using AI to auto-extract the data from websites, capture screenshot, generate pdf from URL and web crawling with Agenty
Language: TypeScript - Size: 209 KB - Last synced at: 13 days ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 3
18520339/facebook-data-extraction
Experience for effectively fetching Facebook data by Querying Graph API with Account-based Token and Operating undetectable scraping Bots to extract Client/Server-side Rendered content
Language: Python - Size: 27 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 209 - Forks: 61
apify/rag-web-browser
RAG Web Browser is an Apify Actor to feed your LLM applications and RAG pipelines with up-to-date text content scraped from the web.
Language: TypeScript - Size: 875 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 62 - Forks: 10
omkarcloud/omkar-temp-mail
🚀 OMKAR TEMP MAIL HELPS YOU USE TEMPORARY EMAILS. 🤖
Language: Python - Size: 15.6 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 4
codelucas/newspaper
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
Language: HTML - Size: 17.5 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 14,790 - Forks: 2,130
go-rod/rod
A Chrome DevTools Protocol driver for web automation and scraping.
Language: Go - Size: 3.61 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6,285 - Forks: 414
RaedAddala/Scraping-IMDB
This Python script extracts comprehensive movie data from IMDB, focusing on top-grossing movies from 1920 to 2025. The scraper collects detailed information including box office performance, cast & crew, awards, and other key metrics.
Language: Jupyter Notebook - Size: 110 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 3
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.
Language: Python - Size: 1.34 MB - Last synced at: 11 days ago - Pushed at: 7 months ago - Stars: 546 - Forks: 100
charliehuynhorz/Crawl-weather-data-in-cities-and-provinces-of-Vietnam
This project provides a simple Python script that crawls current weather data from Thời tiết 24h for all 63 provinces and cities of Vietnam. The data includes temperature, humidity, UV index, and rain chance, and is automatically saved into a CSV file for further analysis or visualization.
Language: Python - Size: 8.79 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
maxcountryman/warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
Language: Rust - Size: 106 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 113 - Forks: 1
lorien/grab
Web Scraping Framework
Language: Python - Size: 9.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2,416 - Forks: 275
scrapinghub/scrapyrt
HTTP API for Scrapy spiders
Language: Python - Size: 251 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 870 - Forks: 162
OzelTam/OnionCrawler
Tool to crawl .onion websites. Console & Web UI
Language: C# - Size: 7.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0
ivan-sincek/scrapy-scraper
Web crawler and scraper based on Scrapy and Playwright's headless browser.
Language: Python - Size: 90.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 5
ApaxPhoenix/CrawlPy
An efficient web crawler in Python with customizable rules and dynamic content handling for easy data extraction.
Language: Python - Size: 322 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
delvelabs/htcap Fork of fcavallarin/htcap
htcap is a web application scanner able to crawl single-page applications (SPAs) recursively by intercepting Ajax calls and DOM changes.
Language: Python - Size: 527 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 4
ArchiveTeam/wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Language: C - Size: 28 MB - Last synced at: 13 days ago - Pushed at: 2 months ago - Stars: 130 - Forks: 17
ilovedevs/awesome-web-crawler
List of best web crawlers to extract data from the web. Find web crawling tools for different needs.
Size: 167 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
rezamobaraki/goodreads-peewee-python
simple project on command-line | goodreads.com
Language: Python - Size: 19.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
rivermont/spidy
The simple, easy to use command line web crawler.
Language: Python - Size: 81.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 352 - Forks: 69
carlosplanchon/spidercreator
Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of Playwright-based spiders with minimal manual coding. Ideal for large enterprises with recurring data extraction needs.
Language: Python - Size: 6.39 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 90 - Forks: 16