GitHub topics: crawling

Repositories

alejandrov44/free-games-alerts

🔔 Quick and easy way to get notified about all kinds of new free games available to claim on different platforms.

Language: TypeScript - Size: 131 KB - Last synced at: about 18 hours ago - Pushed at: about 20 hours ago - Stars: 0 - Forks: 0

commoncrawl/web-languages

Crowd-sourced lists of urls to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code

Size: 2.05 MB - Last synced at: about 19 hours ago - Pushed at: about 21 hours ago - Stars: 59 - Forks: 77

StudyTab/Phantom-Crawler

🕵️‍♂️ Perform robust web security scanning and reconnaissance with PhantomCrawler, designed for researchers and pen testers to enhance application security.

Language: Python - Size: 1.31 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

amienbou121/crawl4ai-mcp-server

🕷️ Enable AI agents to scrape and crawl the web effortlessly with this lightweight Model Context Protocol server, integrating seamlessly into your workflows.

Language: Python - Size: 1.34 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

A lightweight web application for recording and displaying daily lunch specials for restaurants and butcher shops. Built in Go and using templ with Tailwind CSS, it provides simple management and an attractive presentation of daily menus.

Language: Go - Size: 5.77 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 1

mts700/YT-Thumbnail-Downloader

📷 Download high-quality YouTube video and Short thumbnails effortlessly in multiple resolutions with this free tool.

Language: JavaScript - Size: 2.28 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

sonoraa4ever/playwright-ai-automation

Language: TypeScript - Size: 1.33 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

vishal9431/Searchin_v1

Search system

Language: TypeScript - Size: 402 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

ArchiveBox/abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

Language: JavaScript - Size: 185 KB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 87 - Forks: 4

MadeWithBlasted/learning-center

📚 Manage learning resources effectively with the Learning Center application, featuring modular design, multi-language support, and robust navigation.

Language: TypeScript - Size: 1.43 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Kkkkk8S/crawler-scripts

crawler-scripts are a collection of lightweight scripts designed to automate web data extraction. These scripts support various websites and allow users to gather information efficiently without manual effort.

Language: Python - Size: 1.76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

MarshalX/telegram-crawler

🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

Language: Python - Size: 835 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 331 - Forks: 43

pzaino/thecrowler

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.

Language: Go - Size: 38.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 49 - Forks: 10

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

Language: Python - Size: 6.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 732 - Forks: 158

D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

Language: Python - Size: 3.94 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8,054 - Forks: 458

NickG1978/awesome-web-crawler

🕷️ Discover and use popular web crawlers across various programming languages to efficiently extract data from the web.

Language: HTML - Size: 1.66 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

javi-aranda/malaga-parking-data

Historical data on public car parks in Málaga (Andalusia, Spain).

Language: Python - Size: 47.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 0

justoneapi/crawl-data-api

justoneapi Data API Services. We provide APIs for: Xiaohongshu, Red, Redbook, Rednote, Taobao, JD.com, Douyin (E-commerce), Douyin (Videos), Kuaishou, Pugongying, Xingtu, WeChat Official Accounts, Dianping, Bilibili, Zhihu, Weibo, Beike, Bigo, Temu, Lazada, SHEIN, Shopee, Baidu Index, Boss Zhipin, Zhaopin, Lagou, Toutiao, Facebook

Size: 5.03 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 45 - Forks: 5

ceyhuncakir/opencrawl

An open source crawler project where one can crawl the internet and use open-source LLMs to transform the information to their needs

Language: Python - Size: 1.34 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

krtk-dev/billboard-player

🎹 Free billboard hot 100 M/V streaming service

Language: TypeScript - Size: 3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 27 - Forks: 9

KoreanThinker/billboard-json

🎧 Get json type billboard hot 100 chart

Language: TypeScript - Size: 3.63 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 53 - Forks: 6

QLangstaff/qrawl

Composable web crawling tools for Rust

Language: Rust - Size: 296 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

LillySchramm/Booklify.me

Booklify.me is an open-source platform for keeping track of everything in your bookshelf.

Language: TypeScript - Size: 35.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4 - Forks: 1

supacrawler/supacrawler

Supacrawler's ultralight engine for scraping and crawling the web. Written in Go for maximum performance and concurrency.

Language: Go - Size: 21.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 46 - Forks: 3

apify/crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Language: Python - Size: 32.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 7,064 - Forks: 508

seantomburke/sitemapper

Parse through any sitemap in Node.js

Language: TypeScript - Size: 1.56 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 124 - Forks: 78

SimFin/pdf-crawler

SimFin's open source PDF crawler

Language: Python - Size: 40 KB - Last synced at: 5 days ago - Pushed at: about 6 years ago - Stars: 127 - Forks: 44

jens-ox/oda

Extraction, versioning and machine-readable provisioning of public data.

Language: TypeScript - Size: 27.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 12 - Forks: 0

transitive-bullshit/awesome-puppeteer

A curated list of awesome puppeteer resources.

Size: 105 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 2,523 - Forks: 160

hardkoded/puppeteer-sharp

Headless Chrome .NET API

Language: C# - Size: 8.47 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 3,782 - Forks: 478

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Language: Python - Size: 27.6 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 58,754 - Forks: 11,131

apache/nutch

Apache Nutch is an extensible and scalable web crawler

Language: Java - Size: 132 MB - Last synced at: 5 days ago - Pushed at: 18 days ago - Stars: 3,080 - Forks: 1,260

clemfromspace/scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

Language: Python - Size: 29.3 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 954 - Forks: 360

cikay/sorjin_base_manual_data_collector

Recursive crawler for most popular Kurdish websites

Language: Python - Size: 64.5 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

Language: TypeScript - Size: 53.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 894 - Forks: 118

HazemAkram/WebCrawler

AI Web Crawler is a powerful, AI-powered web crawler that extracts product information from e-commerce websites and downloads associated PDF documents. Built with modern Python technologies and featuring intelligent pagination handling, duplicate detection, and advanced PDF processing.

Language: Python - Size: 68.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2 - Forks: 1

apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Language: TypeScript - Size: 144 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 20,257 - Forks: 1,054

J4GL/bt-dht

A bittorrent dht scraper

Language: Python - Size: 76.2 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

Language: Go - Size: 8.26 MB - Last synced at: 9 days ago - Pushed at: 12 days ago - Stars: 24,748 - Forks: 1,835

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

Language: Python - Size: 508 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3,177 - Forks: 129

google/corpuscrawler

Crawler for linguistic corpora

Language: Python - Size: 488 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 209 - Forks: 54

lumpinif/deepcrawl

100% free and fully open-source edge Firecrawl alternative with better link extraction for agents, which you can deploy yourself.

Language: TypeScript - Size: 5.77 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 1

zhuyingda/webster

a reliable high-level web crawling & scraping framework for Node.js.

Language: JavaScript - Size: 181 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 549 - Forks: 52

bluet/proxybroker2 Fork of constverum/ProxyBroker

The New (auto rotate) Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

Language: Python - Size: 8.22 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 903 - Forks: 129

yujiosaka/headless-chrome-crawler

Distributed crawler powered by Headless Chrome

Language: JavaScript - Size: 1.53 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 5,622 - Forks: 409

edoardottt/cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more

Language: Go - Size: 576 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 2,822 - Forks: 255

javapuppteernodejs/bypass-awswaf-crawl4ai

Bypass AWS WAF with Crawl4AI & CapSolver: A personal developer's guide to seamless web scraping on WAF-protected sites, featuring API and browser extension integration examples.

Language: Python - Size: 14.6 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

Language: Makefile - Size: 427 KB - Last synced at: 12 days ago - Pushed at: 20 days ago - Stars: 7,384 - Forks: 825

capjamesg/getsitemap

A Python library that retrieves all URLs in the sitemaps on a website.

Language: Python - Size: 62.5 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

rebrowser/rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

Language: JavaScript - Size: 79.1 KB - Last synced at: 13 days ago - Pushed at: 6 months ago - Stars: 1,072 - Forks: 58

tryAGI/Firecrawl

Generated C# SDK based on official Firecrawl OpenAPI specification

Language: C# - Size: 489 KB - Last synced at: 2 days ago - Pushed at: 13 days ago - Stars: 3 - Forks: 1

milos85vasic/Catalogizer

Advanced Multi-Protocol Media Collection Management System

Language: HTML - Size: 137 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

eliasdabbas/langchain-advertools

LangChain integration for advertools

Language: Jupyter Notebook - Size: 1.61 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

roach-php/core

The complete web scraping toolkit for PHP.

Language: PHP - Size: 787 KB - Last synced at: 15 days ago - Pushed at: 27 days ago - Stars: 1,429 - Forks: 77

2jang/DBTI

Python-based service for finding a dog that matches your personality and analyzing pet dog temperament

Language: HTML - Size: 9.05 MB - Last synced at: 16 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

NateScarlet/holiday-cn

📅🇨🇳 China statutory public holiday data, automatically scraped daily from State Council announcements

Language: Python - Size: 302 KB - Last synced at: 16 days ago - Pushed at: 18 days ago - Stars: 1,636 - Forks: 178

bitmakerla/estela-entrypoint

estela entrypoint for job runner 🕸

Language: Python - Size: 96.7 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 5 - Forks: 2

NationalLibraryOfNorway/maalfrid_toolkit

Toolkit for the Målfrid project

Language: Python - Size: 1.26 MB - Last synced at: 15 days ago - Pushed at: 18 days ago - Stars: 2 - Forks: 1

ZLotusRain/tider

A fast, simple, extensible and powerful framework for web crawling.

Language: Python - Size: 505 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 5 - Forks: 0

Gaeduck-0908/boannews-crawling-output

boannews-crawling-output

Size: 1.77 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

mmuyakwa/Amazon_Check

An Amazon price tracker written in Python. This script was written by Webklex, but I added a MySQL database and config file to it.

Language: Python - Size: 191 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 3 - Forks: 0

DevanshRaghav75/FALL

An automated penetration testing tool

Language: Python - Size: 1.85 MB - Last synced at: 6 days ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0

apache/nutch-webapp

Apache Nutch is an extensible and scalable web crawler

Language: Java - Size: 124 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 5

adbar/courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

Language: Python - Size: 591 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 149 - Forks: 9

forkonlp/N2H4

A tool for collecting Naver News articles

Language: R - Size: 6.27 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 218 - Forks: 75

lorey/mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

Language: Python - Size: 452 KB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 1,363 - Forks: 91

XORbit01/webpalm

🕸️ Crawl in the web network

Language: Go - Size: 5.07 MB - Last synced at: about 10 hours ago - Pushed at: 8 months ago - Stars: 378 - Forks: 38

mawrkus/jason-the-miner

⛏ A versatile Web scraper for Node.js

Language: JavaScript - Size: 2.76 MB - Last synced at: 10 days ago - Pushed at: 26 days ago - Stars: 46 - Forks: 11

josephlimtech/linkedin-profile-scraper-api

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

Language: TypeScript - Size: 10.8 MB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 699 - Forks: 172

MontFerret/ferret

Declarative web scraping

Language: Go - Size: 4.98 MB - Last synced at: 25 days ago - Pushed at: about 2 months ago - Stars: 5,874 - Forks: 309

DBeath/feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

Language: Python - Size: 856 KB - Last synced at: 5 days ago - Pushed at: 13 days ago - Stars: 81 - Forks: 12

onurkanbakirci/rsl-editor

The open content licensing editor for the AI-first Internet. Easily create, edit, and manage your RSL (Really Simple Licensing) documents.

Language: TypeScript - Size: 4.04 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 15 - Forks: 0

luminati-io/Awesome-Web-Scraping

A list of libraries, tools, and APIs for web scraping and data processing. Find everything you need for extracting, managing, and processing data from the web, from HTTP libraries to browser automation tools and proxy services.

Size: 104 KB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 7 - Forks: 2

ArchiveBox/abx-spec-behaviors

🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser environments, puppeteer, playwright, extensions, AI tools, and many other contexts with minimal adjustment.

Language: JavaScript - Size: 238 KB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 18 - Forks: 0

infinilabs/crawler

🕷️ An easy-to-use spider written in Golang (previously named GOPA).

Language: Go - Size: 54.6 MB - Last synced at: 17 days ago - Pushed at: over 4 years ago - Stars: 309 - Forks: 82

omkarcloud/botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

Language: TypeScript - Size: 402 KB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 27 - Forks: 9

leewr9/crawlquest

Smart crawling request utility for Python.

Language: Python - Size: 27.3 KB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

jroakes/tech-seo-crawler

Build a small, 3 domain internet using Github pages and Wikipedia and construct a crawler to crawl, render, and index.

Language: Python - Size: 6.18 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 73 - Forks: 11

Haimonmon/snippy

A book-scraping bot that gives you book data. Be cautious: using it may get your IP banned.

Language: Python - Size: 429 KB - Last synced at: 26 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

Agenty/scrapingai

Build web scraping agents using AI to auto-extract the data from websites, capture screenshot, generate pdf from URL and web crawling with Agenty

Language: TypeScript - Size: 209 KB - Last synced at: 13 days ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 3

18520339/facebook-data-extraction

Experience for effectively fetching Facebook data by Querying Graph API with Account-based Token and Operating undetectable scraping Bots to extract Client/Server-side Rendered content

Language: Python - Size: 27 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 209 - Forks: 61

apify/rag-web-browser

RAG Web Browser is an Apify Actor to feed your LLM applications and RAG pipelines with up-to-date text content scraped from the web.

Language: TypeScript - Size: 875 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 62 - Forks: 10

omkarcloud/omkar-temp-mail

🚀 OMKAR TEMP MAIL HELPS YOU USE TEMPORARY EMAILS. 🤖

Language: Python - Size: 15.6 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 4

codelucas/newspaper

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:

Language: HTML - Size: 17.5 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 14,790 - Forks: 2,130

go-rod/rod

A Chrome DevTools Protocol driver for web automation and scraping.

Language: Go - Size: 3.61 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6,285 - Forks: 414

RaedAddala/Scraping-IMDB

This Python script extracts comprehensive movie data from IMDB, focusing on top-grossing movies from 1920 to 2025. The scraper collects detailed information including box office performance, cast & crew, awards, and other key metrics.

Language: Jupyter Notebook - Size: 110 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 3

scrapinghub/spidermon

Scrapy Extension for monitoring spiders execution.

Language: Python - Size: 1.34 MB - Last synced at: 11 days ago - Pushed at: 7 months ago - Stars: 546 - Forks: 100

charliehuynhorz/Crawl-weather-data-in-cities-and-provinces-of-Vietnam

This project provides a simple Python script that crawls current weather data from Thời tiết 24h for all 63 provinces and cities of Vietnam. The data includes temperature, humidity, UV index, and rain chance, and is automatically saved into a CSV file for further analysis or visualization.

Language: Python - Size: 8.79 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

maxcountryman/warc-parquet

🗄️ A simple CLI for converting WARC to Parquet.

Language: Rust - Size: 106 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 113 - Forks: 1

lorien/grab

Web Scraping Framework

Language: Python - Size: 9.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2,416 - Forks: 275

scrapinghub/scrapyrt

HTTP API for Scrapy spiders

Language: Python - Size: 251 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 870 - Forks: 162

OzelTam/OnionCrawler

Tool to crawl .onion websites. Console & Web UI

Language: C# - Size: 7.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

ivan-sincek/scrapy-scraper

Web crawler and scraper based on Scrapy and Playwright's headless browser.

Language: Python - Size: 90.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 5

ApaxPhoenix/CrawlPy

An efficient web crawler in Python with customizable rules and dynamic content handling for easy data extraction.

Language: Python - Size: 322 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

delvelabs/htcap Fork of fcavallarin/htcap

htcap is a web application scanner able to crawl single-page applications (SPAs) recursively by intercepting Ajax calls and DOM changes.

Language: Python - Size: 527 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 4

Related Keywords

crawling 1,139 crawler 361 python 324 scraping 263 scraper 143 python3 85 scrapy 78 web-scraping 77 selenium 72 spider 60 nodejs 57 web 52 scraping-websites 49 web-crawler 48 javascript 47 automation 47 crawling-python 46 web-crawling 43 beautifulsoup 42 puppeteer 41 java 41 webscraping 40 crawlers 36 crawl 35 golang 34 requests 32 php 32 data 31 data-mining 31 api 29 machine-learning 27 beautifulsoup4 27 playwright 26 search-engine 26 typescript 24 nlp 24 flask 23 parsing 22 twitter 22 data-science 22 webcrawler 21 django 20 hacktoberfest 20 bot 20 docker 19 proxy 19 mongodb 19 ai 19 scraping-python 19 scrapping 18 crawling-framework 18 framework 18 information-retrieval 17 webcrawling 17 security 17 pandas 17 web-scraper 17 headless-chrome 17 data-analysis 16 headless 16 seo 16 go 15 crawling-sites 15 react 15 selenium-python 15 crawling-tool 14 indexing 14 elasticsearch 14 news 14 llm 14 html 14 youtube 13 scraping-framework 13 deep-learning 13 crawler-python 13 database 13 node 13 jsoup 12 mysql 12 scrapy-crawler 12 downloader 12 parser 12 scrape 11 proxy-server 11 json 11 dataset 11 tor 11 express 11 bs4 11 wikipedia 11 osint 11 selenium-webdriver 11 proxies 10 asyncio 10 cli 10 chrome 10 clustering 10 scrapers 10 natural-language-processing 10 twitter-api 9