Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: web-crawler
internetarchive/Zeno
State-of-the-art web crawler 🔱
Language: Go - Size: 661 KB - Last synced: about 4 hours ago - Pushed: about 10 hours ago - Stars: 36 - Forks: 2
luiswirth/crawler
An asynchronous web crawler.
Language: Rust - Size: 442 KB - Last synced: about 11 hours ago - Pushed: 12 months ago - Stars: 3 - Forks: 0
bAndie91/tools
all-in collection of productivity scripts, CLI tools, utility libraries, fuse filesystems, and also some stuff
Language: Shell - Size: 1.02 MB - Last synced: about 15 hours ago - Pushed: about 15 hours ago - Stars: 16 - Forks: 1
mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown
Language: TypeScript - Size: 1.08 MB - Last synced: about 15 hours ago - Pushed: about 20 hours ago - Stars: 2,852 - Forks: 236
webrecorder/browsertrix-crawler
Run a high-fidelity browser-based crawler in a single Docker container
Language: TypeScript - Size: 52.5 MB - Last synced: about 17 hours ago - Pushed: about 17 hours ago - Stars: 552 - Forks: 71
BruceDone/awesome-crawler
A collection of awesome web crawlers and spiders in different languages
Size: 74.2 KB - Last synced: 1 day ago - Pushed: about 1 month ago - Stars: 6,140 - Forks: 675
omkarcloud/botasaurus-starter
🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖
Language: TypeScript - Size: 385 KB - Last synced: about 18 hours ago - Pushed: 1 day ago - Stars: 13 - Forks: 4
industrialsociety/Panoptes.py
A Python web crawler that efficiently manages URLs with SQLite, supports real-time progress updates via keypress, and robustly handles various parsing and request errors.
Language: Python - Size: 19.5 KB - Last synced: 1 day ago - Pushed: 28 days ago - Stars: 0 - Forks: 0
apache/nutch
Apache Nutch is an extensible and scalable web crawler
Language: Java - Size: 131 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 2,820 - Forks: 1,249
duyet/awesome-web-scraper
A collection of awesome web scrapers and crawlers.
Size: 48.8 KB - Last synced: about 8 hours ago - Pushed: about 1 month ago - Stars: 238 - Forks: 46
William-Fernandes252/astel
An asynchronous web crawling library for Python
Language: Python - Size: 1.02 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0
lewisdonovan/google-news-scraper
Lightweight scraper for Google News
Language: JavaScript - Size: 285 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 181 - Forks: 54
kevinmarquesp/web_crawler_ex
Tool that extracts all anchor link URLs from websites, stores the list in a SQLite3 database, and repeats the process recursively for each link using multiple subprocesses in parallel
Language: Elixir - Size: 4.53 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0
tobiasodion/RAGBOT
A CLI chatbot that uses a RAG architecture to improve and adapt an LLM to a specific context. Users can ask questions and get responses either directly from open-source LLMs (OpenAI, MistralAI, etc.) or from information on a website provided as context through the RAG architecture.
Language: JavaScript - Size: 788 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 2 - Forks: 0
JcxAu/ReptilePy
This is a web crawler written in Python.
Language: Python - Size: 217 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 2 - Forks: 0
gpizzimenti/BookByNav
A (basic) utility to create an EPUB from online documentation
Language: Java - Size: 26.3 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 0 - Forks: 0
Jimut123/WEB-CRAWLLER
A web crawler which crawls through the whole internet
Language: Python - Size: 12.7 MB - Last synced: 5 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0
leoguitar2006/bet-crawler
A web-crawler that selects games to bet on from a site according to certain criteria
Language: Go - Size: 15.6 KB - Last synced: 5 days ago - Pushed: 6 days ago - Stars: 1 - Forks: 0
infinilabs/crawler
🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)
Language: Go - Size: 54.6 MB - Last synced: 4 days ago - Pushed: almost 3 years ago - Stars: 300 - Forks: 82
hyunwoongko/kochat
Opensource Korean chatbot framework
Language: Python - Size: 310 MB - Last synced: 4 days ago - Pushed: 12 months ago - Stars: 444 - Forks: 181
ScrapingAnt/amazon_scraper
Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt
Language: JavaScript - Size: 52.7 KB - Last synced: 7 days ago - Pushed: 2 months ago - Stars: 76 - Forks: 18
cloudy-sfu/Web-crawler-weibo
The toolbox to collect posts from https://weibo.com
Language: Python - Size: 34.2 KB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 4 - Forks: 2
cloudy-sfu/Web-crawler-novels
Download and compile books from online literature websites
Language: Python - Size: 34.2 KB - Last synced: 8 days ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
cloudy-sfu/BOC-exchange-rate-vis
Visualization and downloading tool for BOC (Bank Of China) exchange rate
Language: Python - Size: 17.6 KB - Last synced: 8 days ago - Pushed: 17 days ago - Stars: 0 - Forks: 0
ToulisDev/updateSchoolCloud
A Python script that logs in to a student’s account by sending a POST request to the school’s login system. After a successful login, the script searches for registered subjects by finding all table-children HTML tags and provides a download link for every subject.
Language: Python - Size: 5.86 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0
commoncrawl/news-crawl
News crawling with StormCrawler - stores content as WARC
Language: Java - Size: 231 KB - Last synced: 9 days ago - Pushed: 5 months ago - Stars: 251 - Forks: 31
commoncrawl/nutch Fork of Aloisius/nutch
Common Crawl fork of Apache Nutch
Language: Java - Size: 130 MB - Last synced: 9 days ago - Pushed: 10 days ago - Stars: 24 - Forks: 2
TurnerSoftware/InfinityCrawler
A simple but powerful web crawler library for .NET
Language: C# - Size: 326 KB - Last synced: 10 days ago - Pushed: 5 months ago - Stars: 239 - Forks: 35
omkarcloud/omkar-temp-mail
🚀 OMKAR TEMP MAIL HELPS YOU USE TEMPORARY EMAILS. 🤖
Language: Python - Size: 15.6 KB - Last synced: 9 days ago - Pushed: 2 months ago - Stars: 11 - Forks: 4
gabriel-batistuta/amazon-tech-best-sellers
A simple search, extraction, and ingestion system for getting the best-selling tech products on Amazon
Language: Python - Size: 23.5 MB - Last synced: 10 days ago - Pushed: 12 days ago - Stars: 0 - Forks: 0
HCB2-NPT/Amazon-Web-Crawler
Seminar: Data Mining - Web Crawling
Language: Python - Size: 21.5 KB - Last synced: 11 days ago - Pushed: over 7 years ago - Stars: 1 - Forks: 2
spider2048/WebCrawler
A fast, asynchronous web crawler, indexer and a search engine
Language: JavaScript - Size: 1.1 MB - Last synced: 11 days ago - Pushed: 12 days ago - Stars: 1 - Forks: 0
krisluczka/OSSE
Open Source Search Engine with built-in web/document crawler and an indexing method.
Language: C++ - Size: 58.6 KB - Last synced: 11 days ago - Pushed: 12 days ago - Stars: 1 - Forks: 0
datagram-db/LeSSI-python
Crawling Web News and storing them in JSON Format
Language: Python - Size: 1.85 MB - Last synced: about 16 hours ago - Pushed: about 17 hours ago - Stars: 0 - Forks: 0
spencerlepine/open-source-crawler
Web crawler finding open source GitHub repositories, parsing README files, and scanning for typo/security issues.
Language: JavaScript - Size: 2.99 MB - Last synced: 12 days ago - Pushed: 5 months ago - Stars: 5 - Forks: 1
Norconex/crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
Language: Java - Size: 10 MB - Last synced: about 15 hours ago - Pushed: 1 day ago - Stars: 172 - Forks: 65
armiro/crawlers
A bunch of crawlers for extracting data from various sites (site name is mentioned for each one)
Language: Python - Size: 163 KB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 9 - Forks: 18
redcode-labs/UnChain
A tool to find redirection chains in multiple URLs
Language: Go - Size: 3.3 MB - Last synced: 14 days ago - Pushed: almost 3 years ago - Stars: 79 - Forks: 12
Curovearth/Fake-News-Text-Classification-NLP
Natural Language Processing over news for text classification.
Language: Jupyter Notebook - Size: 4.31 MB - Last synced: 14 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
sh377c0d3/web_crawler
Spider and Crawl websites for you
Language: Python - Size: 9.77 KB - Last synced: 14 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 2
fuchsia-programming/scrape 📦
When you need those jobs hypersonic 🚀 scrape 🔪
Language: JavaScript - Size: 2.79 MB - Last synced: 14 days ago - Pushed: over 4 years ago - Stars: 10 - Forks: 3
briandfoy/webreaper 📦
Language: Perl - Size: 213 KB - Last synced: 14 days ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0
melroy89/bitcoin-core-web-scraper
Web Spider for mirroring Bitcoin Core bin folder. Live: https://bitcoin.melroy.org/bin/ (mirror of bitcoincore.org/bin)
Language: Python - Size: 17.6 KB - Last synced: 14 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
apache/nutch-webapp
Apache Nutch is an extensible and scalable web crawler
Language: Java - Size: 124 KB - Last synced: 6 days ago - Pushed: 10 months ago - Stars: 6 - Forks: 4
ewpratten/simplesearch
A simplistic search engine and web crawler built for learning purposes
Language: Python - Size: 49.8 KB - Last synced: 15 days ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
oAGoulart/roedor
A modular web crawler Go module.
Language: Go - Size: 66.4 KB - Last synced: 15 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0
elky84/lol-crawler
Notification from LOL friend game start & end.
Language: C# - Size: 1.6 MB - Last synced: 15 days ago - Pushed: 2 months ago - Stars: 2 - Forks: 0
gugarosa/politico_honesto
💸 An RPA-based assistant that extracts information about a candidate over TSE.
Language: Python - Size: 38.1 KB - Last synced: 15 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
Soypete/Example-Web-Crawler
Intro to Go example worked through in the Women Who Go Utah Workshop
Language: Go - Size: 975 KB - Last synced: 15 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0
gosom/scrapemate
Golang Crawling and scraping framework
Language: Go - Size: 107 KB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 58 - Forks: 5
bkeepers/spiderman
your friendly neighborhood web crawler
Language: Ruby - Size: 39.1 KB - Last synced: 13 days ago - Pushed: almost 2 years ago - Stars: 18 - Forks: 4
tech-engine/goscrapy
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
Language: Go - Size: 348 KB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 42 - Forks: 0
Sieep-Coding/web-crawler
A simple web crawler implemented in Go.
Language: Go - Size: 10.1 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 3 - Forks: 0
MCStreetguy/Crawler
An advanced web-crawler written in PHP.
Language: PHP - Size: 224 KB - Last synced: 16 days ago - Pushed: about 5 years ago - Stars: 3 - Forks: 0
biraj21/web-wanderer
A multi-threaded web crawler written in Python, utilizing ThreadPoolExecutor and Playwright to efficiently crawl dynamically rendered web pages and download them.
Language: Python - Size: 202 KB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 10 - Forks: 1
lucasmonteiro001/ufmg_facebook-user-friendship-crawler
Python web crawler to gather user relationships' info from Facebook
Language: Python - Size: 566 KB - Last synced: 17 days ago - Pushed: about 7 years ago - Stars: 1 - Forks: 1
LSmyrnaios/PublicationsRetriever
A Java program that retrieves full texts or datasets from publication web pages.
Language: Java - Size: 7.18 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 1 - Forks: 0
Madi-S/Lead-Generation
Python script, which empowers people with no programming background to generate robust leads on a mass scale. This repo will be compiled of various versatile techniques used in lead generation.
Language: Python - Size: 9.67 MB - Last synced: 14 days ago - Pushed: 4 months ago - Stars: 72 - Forks: 27
miroshnikov/scrapyteer
Web crawling & scraping framework for Node.js on top of headless Chrome browser
Language: TypeScript - Size: 384 KB - Last synced: 12 days ago - Pushed: 2 months ago - Stars: 18 - Forks: 0
Ryan-M-Smith/WebCrawler
A basic web crawler that logs emails and URLs on webpages. CS-240 Final Project.
Language: Java - Size: 4.51 MB - Last synced: 14 days ago - Pushed: 17 days ago - Stars: 0 - Forks: 0
jmb-ops/Spyder
A web crawler named Spyder: a command-line tool, similar to ZAP (Zed Attack Proxy), for spidering and crawling web pages. Built using only the Python standard library, so it has no dependencies. For Windows.
Language: Python - Size: 7.81 KB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 1 - Forks: 0
s0rg/crawley
The unix-way web crawler
Language: Go - Size: 225 KB - Last synced: 14 days ago - Pushed: about 1 month ago - Stars: 231 - Forks: 13
gicornachini/bolsa
A Python library that aims to make it easier to access data on your stock market investments (B3/CEI) through the CEI Portal.
Language: Python - Size: 125 KB - Last synced: 6 days ago - Pushed: almost 3 years ago - Stars: 59 - Forks: 17
antchfx/antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Language: Go - Size: 56.6 KB - Last synced: 16 days ago - Pushed: almost 4 years ago - Stars: 256 - Forks: 43
SystemStack/web-crawler-image-downloader
It's a web crawler extension
Language: Python - Size: 217 KB - Last synced: 19 days ago - Pushed: 10 months ago - Stars: 1 - Forks: 1
creuserr/pinterscrape
📌 A simple and lightweight pinterest web scraper
Size: 19.5 KB - Last synced: 20 days ago - Pushed: 21 days ago - Stars: 2 - Forks: 0
sjdirect/abot
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Language: C# - Size: 6.93 MB - Last synced: 19 days ago - Pushed: 12 months ago - Stars: 2,205 - Forks: 554
inblack67/Web-Crawler
GoLang
Language: Go - Size: 0 Bytes - Last synced: 22 days ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0
Relex12/Wikipedia-Translate-Crawler
A Wikipedia crawler that gives the worst-translated page around an English starting page by following hypertext links
Language: Shell - Size: 6.84 KB - Last synced: 22 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
BHM-Bob/BA_PY
some helpful python scripts. (Basic for All in Python)
Language: Python - Size: 1.21 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 1 - Forks: 1
brendonboshell/supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
Language: JavaScript - Size: 664 KB - Last synced: 17 days ago - Pushed: over 1 year ago - Stars: 368 - Forks: 63
lucky521/pyspider
My Web Spider
Language: Python - Size: 263 KB - Last synced: 23 days ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 0
platonai/PulsarRPAPro
Professional edition; AI for auto extraction; Web UI; examples.
Language: Kotlin - Size: 14.4 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 76 - Forks: 17
lesterrry/campfire
Shock-drop watching utility
Language: Python - Size: 146 KB - Last synced: 26 days ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
mawippel/drizly-crawler
Beers' characteristics crawler for drizly.com. Written in Python.
Language: Python - Size: 26.4 KB - Last synced: 26 days ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 1
Algebra-FUN/WeReadScan
A crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs
Language: Python - Size: 520 KB - Last synced: 26 days ago - Pushed: 8 months ago - Stars: 552 - Forks: 126
apache/incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
Language: HTML - Size: 6.41 MB - Last synced: 27 days ago - Pushed: 30 days ago - Stars: 856 - Forks: 252
crawlab-team/crawlab
Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Language: Go - Size: 23.6 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 10,782 - Forks: 1,721
crwlrsoft/crawler
Library for Rapid (Web) Crawler and Scraper Development
Language: PHP - Size: 845 KB - Last synced: 26 days ago - Pushed: about 2 months ago - Stars: 299 - Forks: 11
let4be/crusty-core
A small library for building fast and highly customizable web crawlers
Language: Rust - Size: 436 KB - Last synced: 1 day ago - Pushed: over 1 year ago - Stars: 15 - Forks: 1
dotandimet/Mojo-UserAgent-Role-Queued
A role for Mojo::UserAgent that processes non-blocking requests in a rate-limiting queue.
Language: Perl - Size: 65.4 KB - Last synced: 28 days ago - Pushed: almost 5 years ago - Stars: 3 - Forks: 2
AmeyRuikar/webCrawler
A multi-threaded web crawler using crawler4j.
Language: Java - Size: 13.7 KB - Last synced: 29 days ago - Pushed: over 7 years ago - Stars: 0 - Forks: 0
gr1d99/scripts
Random python scripts
Language: Python - Size: 684 KB - Last synced: about 1 month ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0
remram44/crawler-structure 📦
Basic Twisted structure for web crawling (doesn't actually crawl right now)
Language: Python - Size: 121 KB - Last synced: about 1 month ago - Pushed: about 9 years ago - Stars: 0 - Forks: 0
guvenonur/airtime
IMDB airtime crawler and notifier for TV shows
Language: Python - Size: 13.9 MB - Last synced: about 1 month ago - Pushed: 10 months ago - Stars: 2 - Forks: 1
marcofavorito/simple-web-crawler
A very simple web crawler.
Language: Python - Size: 7.81 KB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 0 - Forks: 2
spider-rs/spider-py
Spider ported to Python
Language: Rust - Size: 1.21 MB - Last synced: 9 days ago - Pushed: 29 days ago - Stars: 14 - Forks: 0
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Language: TypeScript - Size: 117 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 11,973 - Forks: 501
danielzlatanov/estate-fetch
Real estate search engine
Language: HTML - Size: 1.6 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 1
qfcy/Python
This repository contains Python source code for more than 40 Python projects spanning many fields, including web crawlers, algorithms, OpenGL, tkinter, and object-oriented programming.
Language: Python - Size: 26.7 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3 - Forks: 4
rzo1/crawler4j Fork of yasserg/crawler4j
Open Source Web Crawler for Java - A maintained fork of yasserg/crawler4j
Language: Java - Size: 1.9 MB - Last synced: 14 days ago - Pushed: about 1 month ago - Stars: 22 - Forks: 4
kan01234/ur-web-spider
Web spider that scans for available UR rooms and outputs the results as CSV
Language: Python - Size: 37 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 4 - Forks: 1
Cirice/Krawler 📦
A complete multi-threaded web-crawler in Python3
Language: Python - Size: 58.6 KB - Last synced: 2 months ago - Pushed: over 5 years ago - Stars: 12 - Forks: 5
Misterhex/WebCrawler
Just a simple web crawler that returns crawled links as an IObservable, using Reactive Extensions and async/await.
Language: C# - Size: 7.13 MB - Last synced: about 1 month ago - Pushed: almost 5 years ago - Stars: 60 - Forks: 33
Amrita-TIFAC-Cyber-Blockchain/MeRiT
Hyperledger Challenge 2022 : Media Tracking Platform to Tackle Online Piracy
Language: PHP - Size: 9.68 MB - Last synced: about 1 month ago - Pushed: 10 months ago - Stars: 1 - Forks: 2
rivermont/spidy
The simple, easy to use command line web crawler.
Language: Python - Size: 81.8 MB - Last synced: 29 days ago - Pushed: 7 months ago - Stars: 323 - Forks: 67
devopsgroup-io/siteshooter
📷 Automate full website screenshots and PDF generation with multiple viewport support.
Language: JavaScript - Size: 496 KB - Last synced: about 1 month ago - Pushed: about 5 years ago - Stars: 67 - Forks: 13
postmodern/spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Language: Ruby - Size: 677 KB - Last synced: 13 days ago - Pushed: 4 months ago - Stars: 792 - Forks: 111
wyu-du/StockForecast
🎯 Predict the price trend of individual stocks using deep learning and natural language processing
Language: Python - Size: 29.4 MB - Last synced: 28 days ago - Pushed: over 6 years ago - Stars: 74 - Forks: 34
idkidkidkidkidkidkidkidk/gics-sentry-bot
PaGamO sentry bot for the preliminary round of the 尋找資安女婕思 (Girls in CyberSecurity) contest
Language: Python - Size: 2.62 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 5 - Forks: 0