GitHub topics: web-crawler | Ecosyste.ms: Repos

xianhu/PSpider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

Language: Python - Size: 814 KB - Last synced at: 30 days ago - Pushed at: about 3 years ago - Stars: 1,837 - Forks: 501

AmadeusITGroup/CrawlerBox

CrawlerBox is an automated analysis framework designed for parsing emails and crawling embedded web resources.

Language: Python - Size: 146 KB - Last synced at: 25 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

BIN-PDT/WEBAPP_INSTAGRAM

AN INSTAGRAM CLONE WEB APPLICATION

Language: HTML - Size: 2.91 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

supadata-ai/js

Official TypeScript/JavaScript SDK for the Supadata API.

Language: TypeScript - Size: 224 KB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 6

QARTER-FR/Tiktok-full-api

Unofficial TikTok Full API for developers and researchers – explore trending videos, user profiles, and hashtags using Python

Language: Python - Size: 5.86 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

duyet/awesome-web-scraper

A collection of awesome web scrapers and crawlers.

Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 273 - Forks: 46

Silvmike/simple-rag

Simple RAG: provides services to ingest text data, and search it later using RAG-pipeline

Language: Kotlin - Size: 440 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

brendonboshell/supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

Language: JavaScript - Size: 664 KB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 381 - Forks: 61

postmodern/spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Language: Ruby - Size: 685 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 818 - Forks: 107

crackalamoo/web-nlp-scraper

A command line tool to quickly run natural language processing (NLP) algorithms on any website. Ideal for understanding the language trends of a blog, or comparing two blogs.

Language: Python - Size: 43 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

BruceDone/awesome-crawler

A collection of awesome web crawlers and spiders in different languages

Size: 74.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 6,744 - Forks: 716

commoncrawl/nutch Fork of Aloisius/nutch

Common Crawl fork of Apache Nutch

Language: Java - Size: 132 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 33 - Forks: 2

omkarcloud/botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

Language: TypeScript - Size: 397 KB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 25 - Forks: 9

idkidkidkidkidkidkidkidk/gics-sentry-bot

Sentry bot for the PaGamO preliminary round of the GiCS (尋找資安女婕思) cybersecurity competition

Language: Python - Size: 3.03 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 1

VIDA-NYU/ache

ACHE is a web crawler for domain-specific search.

Language: Java - Size: 66.6 MB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 468 - Forks: 134

alanindra/news-enricher

Extracts news article metadata (title, content, date, journalist, entities) from provided URLs.

Language: Python - Size: 31.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Welcome to this repository! 🎉 Here, you will find a collection of 10 free scrapers for extracting data from various websites. This project aims to help developers, researchers, and web scraping enthusiasts.

Language: Jupyter Notebook - Size: 109 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

heleusbrands/InSite

A lightning fast tool for crawling websites and compiling PDFs of their pages

Language: Python - Size: 24.4 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

vientorepublic/pal-crawl

Crawler for in-progress legislative notices on the National Assembly Legislative Notice site (pal.assembly.go.kr)

Language: TypeScript - Size: 170 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

vientorepublic/melona

Scraping API for the Melon music streaming service

Language: TypeScript - Size: 274 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

RozhakXD/WA-VerifyAPI

🛡️ WhatsApp Shield - Advanced API for validation and forensic analysis of WhatsApp group links ⚡

Language: Python - Size: 152 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

ibrahimsql/aether

🛡️ Aether: Revolutionary XSS toolkit combining scanning, smart WAF bypasses, and advanced payload generation. Perfect for modern pentesting and bug bounty hunting.

Language: C# - Size: 198 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 1

arvid-berndtsson/robots-txt-analyzer

Modern robots.txt analyzer with instant analysis, security recommendations, and export capabilities. Built with Qwik and deployed on Cloudflare Pages.

Language: TypeScript - Size: 707 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

NYXMatik/Web-Search-and-Information-Retrieval-in-the-Internet-Seminar

Technical seminar exploring the architecture and algorithms behind modern web search engines, including BM25, DPR, and hybrid retrieval models.

Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

besnoi/pyApps

Some Small yet Useful Python GUI Apps

Language: Python - Size: 10.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 1

ss0715jj/NewsCrawler

Crawls ZDNet Korea news headlines and serves them through a REST API (Flask-based).

Language: Python - Size: 6.84 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

cyclone-github/spider

Spider - web crawler and local wordlist processor to generate frequency sorted wordlist / ngrams

Language: Go - Size: 99.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 16 - Forks: 1

ScrapingAnt/zoominfo_scraper

Zoominfo scraper using rotating proxies and headless Chrome from ScrapingAnt

Language: Python - Size: 7.81 KB - Last synced at: 10 days ago - Pushed at: about 4 years ago - Stars: 33 - Forks: 9

omkarcloud/selenium-2captcha-recaptcha-solver-demo

🚀 FINAL CODE FOR TUTORIAL ON HOW TO SOLVE CAPTCHA IN SELENIUM USING 2CAPTCHA 🤖

Language: Python - Size: 5.86 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 2

MaxValue/Terpene-Profile-Parser-for-Cannabis-Strains

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Language: Python - Size: 21.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 118 - Forks: 18

FlowerEatsFish/books-com-tw-crawler

books.com.tw crawler (data crawler for 「博客來」)

Language: TypeScript - Size: 1.34 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 17 - Forks: 6

LucaAhumada/broken-link-checker

A robust Node.js tool that crawls websites and detects and reports broken links. Customize your crawl using a JSON file; the tool generates a detailed HTML report when it finishes.

Language: JavaScript - Size: 9.77 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

Language: Python - Size: 79.1 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 608 - Forks: 56

thesp0nge/nightcrawler-mitm

A python program that crawls a website and tries to stress it, polluting forms with bogus data

Language: Python - Size: 247 KB - Last synced at: 20 days ago - Pushed at: 2 months ago - Stars: 13 - Forks: 1

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

Language: Python - Size: 63 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 365 - Forks: 100

renbkna/mikumikucrawler

A real-time web crawler powered by Puppeteer, Cheerio, and Socket.io, featuring a dynamic UI with live stats, animations, and a Miku-inspired theme.

Language: TypeScript - Size: 6.95 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1 - Forks: 2

eehwan/courtAuctionCrawler

A Python-based crawler that automatically collects auction listing information from the Supreme Court of Korea's real estate auction system.

Language: Python - Size: 5.86 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

sammwyy/SpearCopy

A universal and local phishing toolkit for audit purposes

Language: Python - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 18 - Forks: 1

MCStreetguy/Crawler

An advanced web-crawler written in PHP.

Language: PHP - Size: 224 KB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 5 - Forks: 0

krishna-aditi/nlp-sentiment-analysis-on-stock-news-and-price-monitoring

WebApp to bring together Text Summarization and Sentiment Analysis of the stock related news to better understand the stock price trends.

Language: Jupyter Notebook - Size: 3.07 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 15 - Forks: 6

t-z-scott/cybersecurity-projects

self-study outside of classes / work for practice :)

Language: Jupyter Notebook - Size: 174 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Jatin-Mehra119/CRAWLGPT

A powerful web content crawler with LLM-powered RAG (Retrieval Augmented Generation) capabilities. CrawlGPT extracts content from URLs, processes it through intelligent summarization, and enables natural language interactions using modern LLM technology.

Language: Python - Size: 120 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

louisguitton/disqus-crawler

Crawl DISQUS comments from a blog into a local MongoDB database

Language: Python - Size: 38.1 KB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 1

ahmedshahriar/youtube-comment-scraper

This script dumps YouTube video comments to a CSV file from YouTube video links. Video links can be placed in a variable, a list, or a CSV file.

Language: Jupyter Notebook - Size: 256 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 42 - Forks: 15

anlaki-py/web-crawler

Web Crawler and GitHub Documentation Crawler

Language: Python - Size: 1.25 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Madi-S/Lead-Generation

A Python script that empowers people with no programming background to generate robust leads at scale. This repo will collect various versatile techniques used in lead generation.

Language: Python - Size: 9.67 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 153 - Forks: 38

Viveckh/LilHomie

A Machine Learning Project implemented from scratch which involves web scraping, data engineering, exploratory data analysis and machine learning to predict housing prices in New York Tri-State Area.

Language: Jupyter Notebook - Size: 10.5 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 92 - Forks: 19

Novant8/priv-accept-topics Fork of marty90/priv-accept

A web crawler to detect the usages of Google's Topics API

Language: Jupyter Notebook - Size: 11.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

oxylabs/web-crawler

Web Crawler is a tool used to discover target URLs, select the relevant content, and have it delivered in bulk. It crawls websites in real-time and at scale to quickly deliver all content or only the data you need based on your chosen criteria.

Language: Python - Size: 45.9 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 2

Pythoript/email-scraper

Scrape emails from a website using recursive crawling, the best anti-obfuscation techniques, and validate all addresses before saving to a file.

Language: Go - Size: 22.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 1

FlowerEatsFish/eslite-com-crawler 📦

eslite.com crawler (data crawler for 「誠品線上」)

Language: TypeScript - Size: 1.34 MB - Last synced at: 29 days ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1

MaxHou-infinity/MD_knowledge_great_again

Web crawler and Markdown cleaning tool

Language: Python - Size: 16.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

calebwin/frequent

A utility for crawling websites and building frequency lists of words

Language: Python - Size: 9.77 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 12

TimTosi/mcrawler

[Go] - Web Crawler with composable pipeline.

Language: Go - Size: 1.25 MB - Last synced at: 2 days ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 0

mmycin/npminfo

A CLI tool to check the downloads of an NPM package written in Go

Language: Go - Size: 6.13 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

roshanlam/Spider

Web Crawler built using asynchronous Python and distributed task management that extracts and saves web data for analysis.

Language: Python - Size: 340 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 22 - Forks: 5

Integralist/go-web-crawler

A web crawler built in the Go programming language

Language: Go - Size: 402 KB - Last synced at: 2 months ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 1

ZetoOfficial/domain-scraper

App for parsing and processing internal links on web pages.

Language: Go - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

miltiadiss/CEID_NE4338-Multidimensional-Data-Structures

This project implements multi-dimensional indices (k-d trees, quad trees, range trees, R-trees) for querying computer scientists' data by surname, awards, and publications, with education similarity measured using LSH, comparing the methods experimentally.

Language: Python - Size: 3.29 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

MrCaptain27/LianJiaScraper

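A Lianjia housing-listing data crawler system built on the Spring Boot framework. The project aims to give users a convenient, efficient way to collect listing data. By automatically crawling listings from the Lianjia website, the system retrieves listing details for each city in real time, including key information such as price, location, floor area, and layout.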

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

fxrhan/LinkGuardian

A powerful, asynchronous website crawler and link checker that helps you identify broken links, orphaned pages, and analyze your website's link structure. Built with Python and designed for cross-platform compatibility.

Language: Python - Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

yields/ant

A web crawler for Go

Language: Go - Size: 168 KB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 278 - Forks: 17

elliotxx/zhihu-crawler-people

A simple distributed crawler for Zhihu, with data analysis

Language: Python - Size: 183 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 192 - Forks: 89

avilum/smart-url-fuzzer

Explore URLs of domains fast and efficiently using fuzzing techniques

Language: Python - Size: 338 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 56 - Forks: 18

supergillis/crawler-ts

Crawler written in TypeScript using ES6 generators.

Language: TypeScript - Size: 60.5 KB - Last synced at: 5 days ago - Pushed at: about 4 years ago - Stars: 12 - Forks: 1

redcode-labs/UnChain

A tool to find redirection chains in multiple URLs

Language: Go - Size: 3.3 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 80 - Forks: 13

SchBenedikt/datamining

Heise (https://heise.de) News Crawler

Language: Python - Size: 3.92 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

sayyid5416/Links-Extractor

Extract links from any file or the website.

Language: Python - Size: 197 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 11 - Forks: 5

SpoofIMEI/GoldDigger

GolDigger web crawler

Language: Go - Size: 44.9 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

minseok0809/robotic-process-automation

File Management, School Automation, Text Automation, Web Crawler, Web Automation, Data Preprocessing, Dataframe Editor

Language: Jupyter Notebook - Size: 14.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 1

omkarcloud/omkar-temp-mail

🚀 OMKAR TEMP MAIL HELPS YOU USE TEMPORARY EMAILS. 🤖

Language: Python - Size: 15.6 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 4

simonpierreboucher0/Duproprio_crawler_R

This repository contains R scripts for scraping and processing housing data from the DuProprio website. The project includes scripts for extracting property URLs, scraping detailed property information, and performing feature engineering on the gathered data.

Language: R - Size: 12.7 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

anyparser/anyparserjs

Anyparser Typescript SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.

Language: TypeScript - Size: 408 KB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

FelipeTr00/fipe-python

Web Crawler FIPE

Language: Python - Size: 2.63 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

AIAfterDark/AI-URL-Read

A Python-based documentation assistant that uses local LLMs to crawl websites, process content, and provide intelligent Q&A capabilities with source citations.

Language: Python - Size: 29.3 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

apify/actor-legacy-phantomjs-crawler

The actor implements the legacy Apify Crawler product. It uses PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of JavaScript code.

Language: JavaScript - Size: 1020 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 4

Molizanee/uber-tech-blog-automation

Uber Tech Blog Automation is a web crawler designed to retrieve and process data from Uber's Engineering Blog

Language: TypeScript - Size: 4.87 MB - Last synced at: 13 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

crawlab-team/crawlab-lite

Lite version of Crawlab, the crawler management platform.

Language: Vue - Size: 2.36 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 224 - Forks: 75

spk/validate-website

Web crawler for checking the validity of your documents.

Language: HTML - Size: 934 KB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 39 - Forks: 9

Cheng-Lin-Li/Market-Trend-Prediction

A project for a course on building knowledge graphs. It leverages historical stock prices and integrates social media listening from customers to predict the market trend of the Dow Jones Industrial Average (DJIA).

Language: Julia - Size: 143 MB - Last synced at: about 2 months ago - Pushed at: about 7 years ago - Stars: 58 - Forks: 28

mzubairtahir/Latest-Twitter-Scraper

A Python scraper for the latest Twitter website structure that scrapes tweets from a Twitter account.

Language: Python - Size: 28.3 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

abo123456789/leek

Distributed task redisqueue (the simplest Python distributed function scheduling framework)

Language: Python - Size: 412 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 63 - Forks: 19

spider-rs/spider-cloud-live-code-viewer

Visual web crawler

Language: TypeScript - Size: 559 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 1

apache/nutch-webapp

Apache Nutch is an extensible and scalable web crawler

Language: Java - Size: 124 KB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 5

shaikhsajid1111/manga-down

manga_down is a tool to download manga from mangareader and mangapanda

Language: Python - Size: 7.31 MB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

heinrichb/scrapey-cli

Scrapey CLI is a lightweight, modular command-line tool built in Go for web crawling and scraping. It allows users to collect and parse HTML data based on customizable configuration files or command-line flags, with plans to support multiple storage options such as JSON, XML, and various databases.

Language: Go - Size: 1.48 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

cy-zheng/pyCreeper

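An information-gathering (crawler) framework for quickly extracting web page content, with support for dynamically loading and controlling pages.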

Language: Python - Size: 383 KB - Last synced at: 12 days ago - Pushed at: about 8 years ago - Stars: 30 - Forks: 6

Nuraj250/sitemap-crawler

A simple web-based sitemap crawler built with Node.js, Express, and SimpleCrawler. This tool allows users to input a URL and crawl the website, generating lists of successful and failed URLs. The results can be downloaded for further analysis.

Language: JavaScript - Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

elky84/lol-crawler

Notifications when a LoL friend's game starts and ends.

Language: C# - Size: 1.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

infinilabs/crawler

🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)

Language: Go - Size: 54.6 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 308 - Forks: 82

algorime/AlgoCrawl

AlgoCrawl: Dual-depth web crawler for static & dynamic websites built for penetration testing.

Language: TypeScript - Size: 44.9 KB - Last synced at: 26 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

osamikoyo/Geass

A web crawler with some API functions and configuration options

Language: Go - Size: 32.8 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

spk/maman

Rust Web Crawler saving pages on Redis

Language: Rust - Size: 203 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 44 - Forks: 5

mazzzystar/Proxy

A simple tool for fetching usable proxies from several websites.

Language: Python - Size: 64.5 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 127 - Forks: 68

matthewspangler/selenium-buying-bot

A Selenium bot that makes purchases for you when an item becomes available on an online store, such as Walmart or Amazon.

Language: Python - Size: 25.4 KB - Last synced at: 2 months ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 1

SciFrozen-Git/website-scraper

A powerful and easy-to-use tool built with Scrapy and Node.js that allows you to scrape and download the entire source code and assets of any website. Perfect for developers, researchers, and web enthusiasts who need offline access to websites or want to analyze their structure.

Language: Python - Size: 14.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

shenfe/puppeteer-service

🎠 Run headless Chrome (aka Puppeteer) as a service.

Language: JavaScript - Size: 133 KB - Last synced at: 8 days ago - Pushed at: over 7 years ago - Stars: 49 - Forks: 6

kvdomingo/douglas-crawler 📦

Simple script & web app for crawling product pages on douglas.de

Language: Python - Size: 759 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

GoncaloMark/CobWeb-lnx

CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.

Language: Python - Size: 7.75 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 38 - Forks: 2

zpz5HAU-tgc3fgw2xwr/bootdotdev_web-crawler-go

🌐 A CLI-based web crawler written in Go, designed to explore concurrency and efficient link traversal.

Language: Go - Size: 8.79 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0