Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: web-crawler

internetarchive/Zeno

State-of-the-art web crawler 🔱

Language: Go - Size: 661 KB - Last synced: about 4 hours ago - Pushed: about 10 hours ago - Stars: 36 - Forks: 2

luiswirth/crawler

An asynchronous web crawler.

Language: Rust - Size: 442 KB - Last synced: about 11 hours ago - Pushed: 12 months ago - Stars: 3 - Forks: 0

bAndie91/tools

all-in collection of productivity scripts, CLI tools, utility libraries, fuse filesystems, and also some stuff

Language: Shell - Size: 1.02 MB - Last synced: about 15 hours ago - Pushed: about 15 hours ago - Stars: 16 - Forks: 1

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown

Language: TypeScript - Size: 1.08 MB - Last synced: about 15 hours ago - Pushed: about 20 hours ago - Stars: 2,852 - Forks: 236

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container

Language: TypeScript - Size: 52.5 MB - Last synced: about 17 hours ago - Pushed: about 17 hours ago - Stars: 552 - Forks: 71

BruceDone/awesome-crawler

A collection of awesome web crawlers and spiders in different languages

Size: 74.2 KB - Last synced: 1 day ago - Pushed: about 1 month ago - Stars: 6,140 - Forks: 675

omkarcloud/botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

Language: TypeScript - Size: 385 KB - Last synced: about 18 hours ago - Pushed: 1 day ago - Stars: 13 - Forks: 4

industrialsociety/Panoptes.py

A Python web crawler that efficiently manages URLs with SQLite, supports real-time progress updates via keypress, and robustly handles various parsing and request errors.

Language: Python - Size: 19.5 KB - Last synced: 1 day ago - Pushed: 28 days ago - Stars: 0 - Forks: 0

apache/nutch

Apache Nutch is an extensible and scalable web crawler

Language: Java - Size: 131 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 2,820 - Forks: 1,249

duyet/awesome-web-scraper

A collection of awesome web scrapers and crawlers.

Size: 48.8 KB - Last synced: about 8 hours ago - Pushed: about 1 month ago - Stars: 238 - Forks: 46

William-Fernandes252/astel

An asynchronous web crawling library for Python

Language: Python - Size: 1.02 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0

lewisdonovan/google-news-scraper

Lightweight scraper for Google News

Language: JavaScript - Size: 285 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 181 - Forks: 54

kevinmarquesp/web_crawler_ex

Tool that extracts all anchor link URLs from websites, stores the list of URLs in a SQLite3 database, and repeats the process recursively for each link using multiple subprocesses in parallel

Language: Elixir - Size: 4.53 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0

tobiasodion/RAGBOT

A CLI chatbot that uses a RAG architecture to improve and adapt an LLM to a specific context. It lets users ask questions and get responses directly from open-source LLMs (OpenAI, MistralAI, etc.) or from the information on a website provided as context through the RAG architecture.

Language: JavaScript - Size: 788 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 2 - Forks: 0

JcxAu/ReptilePy

This is a web crawler written in Python.

Language: Python - Size: 217 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 2 - Forks: 0

gpizzimenti/BookByNav

A (basic) utility to create an EPUB from online documentation

Language: Java - Size: 26.3 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 0 - Forks: 0

Jimut123/WEB-CRAWLLER

A web crawler which crawls through the whole internet

Language: Python - Size: 12.7 MB - Last synced: 5 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0

leoguitar2006/bet-crawler

A web-crawler that selects games to bet on from a site according to certain criteria

Language: Go - Size: 15.6 KB - Last synced: 5 days ago - Pushed: 6 days ago - Stars: 1 - Forks: 0

infinilabs/crawler

🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)

Language: Go - Size: 54.6 MB - Last synced: 4 days ago - Pushed: almost 3 years ago - Stars: 300 - Forks: 82

hyunwoongko/kochat

Open-source Korean chatbot framework

Language: Python - Size: 310 MB - Last synced: 4 days ago - Pushed: 12 months ago - Stars: 444 - Forks: 181

ScrapingAnt/amazon_scraper

Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt

Language: JavaScript - Size: 52.7 KB - Last synced: 7 days ago - Pushed: 2 months ago - Stars: 76 - Forks: 18

cloudy-sfu/Web-crawler-weibo

The toolbox to collect posts from https://weibo.com

Language: Python - Size: 34.2 KB - Last synced: 8 days ago - Pushed: 2 months ago - Stars: 4 - Forks: 2

cloudy-sfu/Web-crawler-novels

Download and compile books from online literature websites

Language: Python - Size: 34.2 KB - Last synced: 8 days ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

cloudy-sfu/BOC-exchange-rate-vis

Visualization and downloading tool for BOC (Bank Of China) exchange rate

Language: Python - Size: 17.6 KB - Last synced: 8 days ago - Pushed: 17 days ago - Stars: 0 - Forks: 0

ToulisDev/updateSchoolCloud

A Python script that logs in to a student’s account using a POST request to the school’s login system. After a successful login, the script searches for registered subjects by finding all table-children HTML tags and provides a download link for every subject.

Language: Python - Size: 5.86 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0

commoncrawl/news-crawl

News crawling with StormCrawler - stores content as WARC

Language: Java - Size: 231 KB - Last synced: 9 days ago - Pushed: 5 months ago - Stars: 251 - Forks: 31

commoncrawl/nutch Fork of Aloisius/nutch

Common Crawl fork of Apache Nutch

Language: Java - Size: 130 MB - Last synced: 9 days ago - Pushed: 10 days ago - Stars: 24 - Forks: 2

TurnerSoftware/InfinityCrawler

A simple but powerful web crawler library for .NET

Language: C# - Size: 326 KB - Last synced: 10 days ago - Pushed: 5 months ago - Stars: 239 - Forks: 35

omkarcloud/omkar-temp-mail

🚀 OMKAR TEMP MAIL HELPS YOU USE TEMPORARY EMAILS. 🤖

Language: Python - Size: 15.6 KB - Last synced: 9 days ago - Pushed: 2 months ago - Stars: 11 - Forks: 4

gabriel-batistuta/amazon-tech-best-sellers

A simple search, extraction, and ingestion system for getting the best-selling tech products on Amazon

Language: Python - Size: 23.5 MB - Last synced: 10 days ago - Pushed: 12 days ago - Stars: 0 - Forks: 0

HCB2-NPT/Amazon-Web-Crawler

Seminar: Data Mining - Web Crawling

Language: Python - Size: 21.5 KB - Last synced: 11 days ago - Pushed: over 7 years ago - Stars: 1 - Forks: 2

spider2048/WebCrawler

A fast, asynchronous web crawler, indexer and a search engine

Language: JavaScript - Size: 1.1 MB - Last synced: 11 days ago - Pushed: 12 days ago - Stars: 1 - Forks: 0

krisluczka/OSSE

Open Source Search Engine with built-in web/document crawler and an indexing method.

Language: C++ - Size: 58.6 KB - Last synced: 11 days ago - Pushed: 12 days ago - Stars: 1 - Forks: 0

datagram-db/LeSSI-python

Crawling Web News and storing them in JSON Format

Language: Python - Size: 1.85 MB - Last synced: about 16 hours ago - Pushed: about 17 hours ago - Stars: 0 - Forks: 0

spencerlepine/open-source-crawler

Web crawler finding open source GitHub repositories, parsing README files, and scanning for typo/security issues.

Language: JavaScript - Size: 2.99 MB - Last synced: 12 days ago - Pushed: 5 months ago - Stars: 5 - Forks: 1

Norconex/crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.

Language: Java - Size: 10 MB - Last synced: about 15 hours ago - Pushed: 1 day ago - Stars: 172 - Forks: 65

armiro/crawlers

A bunch of crawlers for extracting data from various sites (site name is mentioned for each one)

Language: Python - Size: 163 KB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 9 - Forks: 18

redcode-labs/UnChain

A tool to find redirection chains in multiple URLs

Language: Go - Size: 3.3 MB - Last synced: 14 days ago - Pushed: almost 3 years ago - Stars: 79 - Forks: 12

Curovearth/Fake-News-Text-Classification-NLP

Natural Language Processing over news for text classification.

Language: Jupyter Notebook - Size: 4.31 MB - Last synced: 14 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

sh377c0d3/web_crawler

Spider and Crawl websites for you

Language: Python - Size: 9.77 KB - Last synced: 14 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 2

fuchsia-programming/scrape 📦

When you need those jobs hypersonic 🚀 scrape 🔪

Language: JavaScript - Size: 2.79 MB - Last synced: 14 days ago - Pushed: over 4 years ago - Stars: 10 - Forks: 3

briandfoy/webreaper 📦

Language: Perl - Size: 213 KB - Last synced: 14 days ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0

melroy89/bitcoin-core-web-scraper

Web Spider for mirroring Bitcoin Core bin folder. Live: https://bitcoin.melroy.org/bin/ (mirror of bitcoincore.org/bin)

Language: Python - Size: 17.6 KB - Last synced: 14 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

apache/nutch-webapp

Apache Nutch is an extensible and scalable web crawler

Language: Java - Size: 124 KB - Last synced: 6 days ago - Pushed: 10 months ago - Stars: 6 - Forks: 4

ewpratten/simplesearch

A simplistic search engine and web crawler built for learning purposes

Language: Python - Size: 49.8 KB - Last synced: 15 days ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

oAGoulart/roedor

A modular web crawler Go module.

Language: Go - Size: 66.4 KB - Last synced: 15 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

elky84/lol-crawler

Notifications when a LOL friend's game starts and ends.

Language: C# - Size: 1.6 MB - Last synced: 15 days ago - Pushed: 2 months ago - Stars: 2 - Forks: 0

gugarosa/politico_honesto

💸 An RPA-based assistant that extracts information about a candidate over TSE.

Language: Python - Size: 38.1 KB - Last synced: 15 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

Soypete/Example-Web-Crawler

Intro to Go example worked through in the Women Who Go Utah Workshop

Language: Go - Size: 975 KB - Last synced: 15 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0

gosom/scrapemate

Golang Crawling and scraping framework

Language: Go - Size: 107 KB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 58 - Forks: 5

bkeepers/spiderman

your friendly neighborhood web crawler

Language: Ruby - Size: 39.1 KB - Last synced: 13 days ago - Pushed: almost 2 years ago - Stars: 18 - Forks: 4

tech-engine/goscrapy

GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.

Language: Go - Size: 348 KB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 42 - Forks: 0

Sieep-Coding/web-crawler

A simple web crawler implemented in Go.

Language: Go - Size: 10.1 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 3 - Forks: 0

MCStreetguy/Crawler

An advanced web-crawler written in PHP.

Language: PHP - Size: 224 KB - Last synced: 16 days ago - Pushed: about 5 years ago - Stars: 3 - Forks: 0

biraj21/web-wanderer

A multi-threaded web crawler written in Python, utilizing ThreadPoolExecutor and Playwright to efficiently crawl dynamically rendered web pages and download them.

Language: Python - Size: 202 KB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 10 - Forks: 1

lucasmonteiro001/ufmg_facebook-user-friendship-crawler

Python web crawler to gather user relationships' info from Facebook

Language: Python - Size: 566 KB - Last synced: 17 days ago - Pushed: about 7 years ago - Stars: 1 - Forks: 1

LSmyrnaios/PublicationsRetriever

A Java program that retrieves full texts or datasets from publication web pages.

Language: Java - Size: 7.18 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 1 - Forks: 0

Madi-S/Lead-Generation

A Python script that empowers people with no programming background to generate robust leads at scale. This repo compiles various versatile techniques used in lead generation.

Language: Python - Size: 9.67 MB - Last synced: 14 days ago - Pushed: 4 months ago - Stars: 72 - Forks: 27

miroshnikov/scrapyteer

Web crawling & scraping framework for Node.js on top of headless Chrome browser

Language: TypeScript - Size: 384 KB - Last synced: 12 days ago - Pushed: 2 months ago - Stars: 18 - Forks: 0

Ryan-M-Smith/WebCrawler

A basic web crawler that logs emails and URLs on webpages. CS-240 Final Project.

Language: Java - Size: 4.51 MB - Last synced: 14 days ago - Pushed: 17 days ago - Stars: 0 - Forks: 0

jmb-ops/Spyder

A web crawler named Spyder: a command-line tool, similar to Zed Attack Proxy (ZAP), for spidering and crawling web pages. Built using only the Python standard library, so it has no dependencies. For Windows.

Language: Python - Size: 7.81 KB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 1 - Forks: 0

s0rg/crawley

The unix-way web crawler

Language: Go - Size: 225 KB - Last synced: 14 days ago - Pushed: about 1 month ago - Stars: 231 - Forks: 13

gicornachini/bolsa

A Python library that aims to simplify access to data on your stock exchange investments (B3/CEI) through the Portal CEI.

Language: Python - Size: 125 KB - Last synced: 6 days ago - Pushed: almost 3 years ago - Stars: 59 - Forks: 17

antchfx/antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Language: Go - Size: 56.6 KB - Last synced: 16 days ago - Pushed: almost 4 years ago - Stars: 256 - Forks: 43

SystemStack/web-crawler-image-downloader

It's a web crawler extension

Language: Python - Size: 217 KB - Last synced: 19 days ago - Pushed: 10 months ago - Stars: 1 - Forks: 1

creuserr/pinterscrape

📌 A simple and lightweight pinterest web scraper

Size: 19.5 KB - Last synced: 20 days ago - Pushed: 21 days ago - Stars: 2 - Forks: 0

sjdirect/abot

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

Language: C# - Size: 6.93 MB - Last synced: 19 days ago - Pushed: 12 months ago - Stars: 2,205 - Forks: 554

inblack67/Web-Crawler

GoLang

Language: Go - Size: 0 Bytes - Last synced: 22 days ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0

Relex12/Wikipedia-Translate-Crawler

A Wikipedia crawler that finds the worst-translated page near an English starting page by following hypertext links

Language: Shell - Size: 6.84 KB - Last synced: 22 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

BHM-Bob/BA_PY

Some helpful Python scripts. (Basic for All in Python)

Language: Python - Size: 1.21 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 1 - Forks: 1

brendonboshell/supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

Language: JavaScript - Size: 664 KB - Last synced: 17 days ago - Pushed: over 1 year ago - Stars: 368 - Forks: 63

lucky521/pyspider

My Web Spider

Language: Python - Size: 263 KB - Last synced: 23 days ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 0

platonai/PulsarRPAPro

Professional edition; AI for auto extraction; Web UI; examples.

Language: Kotlin - Size: 14.4 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 76 - Forks: 17

lesterrry/campfire

Shock-drop watching utility

Language: Python - Size: 146 KB - Last synced: 26 days ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

mawippel/drizly-crawler

Beer characteristics crawler for drizly.com. Written in Python.

Language: Python - Size: 26.4 KB - Last synced: 26 days ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 1

Algebra-FUN/WeReadScan

A crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs

Language: Python - Size: 520 KB - Last synced: 26 days ago - Pushed: 8 months ago - Stars: 552 - Forks: 126

apache/incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm

Language: HTML - Size: 6.41 MB - Last synced: 27 days ago - Pushed: 30 days ago - Stars: 856 - Forks: 252

crawlab-team/crawlab

Distributed web crawler admin platform for spider management, regardless of language or framework.

Language: Go - Size: 23.6 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 10,782 - Forks: 1,721

crwlrsoft/crawler

Library for Rapid (Web) Crawler and Scraper Development

Language: PHP - Size: 845 KB - Last synced: 26 days ago - Pushed: about 2 months ago - Stars: 299 - Forks: 11

let4be/crusty-core

A small library for building fast and highly customizable web crawlers

Language: Rust - Size: 436 KB - Last synced: 1 day ago - Pushed: over 1 year ago - Stars: 15 - Forks: 1

dotandimet/Mojo-UserAgent-Role-Queued

A role for Mojo::UserAgent that processes non-blocking requests in a rate-limiting queue.

Language: Perl - Size: 65.4 KB - Last synced: 28 days ago - Pushed: almost 5 years ago - Stars: 3 - Forks: 2

AmeyRuikar/webCrawler

A multi-threaded web crawler using crawler4j.

Language: Java - Size: 13.7 KB - Last synced: 29 days ago - Pushed: over 7 years ago - Stars: 0 - Forks: 0

gr1d99/scripts

Random python scripts

Language: Python - Size: 684 KB - Last synced: about 1 month ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0

remram44/crawler-structure 📦

Basic Twisted structure for web crawling (doesn't actually crawl right now)

Language: Python - Size: 121 KB - Last synced: about 1 month ago - Pushed: about 9 years ago - Stars: 0 - Forks: 0

guvenonur/airtime

IMDB airtime crawler and notifier for TV shows

Language: Python - Size: 13.9 MB - Last synced: about 1 month ago - Pushed: 10 months ago - Stars: 2 - Forks: 1

marcofavorito/simple-web-crawler

A very simple web crawler.

Language: Python - Size: 7.81 KB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 0 - Forks: 2

spider-rs/spider-py

Spider ported to Python

Language: Rust - Size: 1.21 MB - Last synced: 9 days ago - Pushed: 29 days ago - Stars: 14 - Forks: 0

apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Language: TypeScript - Size: 117 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 11,973 - Forks: 501

danielzlatanov/estate-fetch

Real estate search engine

Language: HTML - Size: 1.6 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 1

qfcy/Python

This repository stores Python source code for more than 40 Python projects spanning many fields, including web crawlers, algorithms, OpenGL, tkinter, and object-oriented programming.

Language: Python - Size: 26.7 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3 - Forks: 4

rzo1/crawler4j Fork of yasserg/crawler4j

Open Source Web Crawler for Java - A maintained fork of yasserg/crawler4j

Language: Java - Size: 1.9 MB - Last synced: 14 days ago - Pushed: about 1 month ago - Stars: 22 - Forks: 4

kan01234/ur-web-spider

Web spider that scans available UR rooms and outputs them as CSV

Language: Python - Size: 37 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 4 - Forks: 1

Cirice/Krawler 📦

A complete multi-threaded web-crawler in Python3

Language: Python - Size: 58.6 KB - Last synced: 2 months ago - Pushed: over 5 years ago - Stars: 12 - Forks: 5

Misterhex/WebCrawler

Just a simple web crawler that returns crawled links as an IObservable using Reactive Extensions and async/await.

Language: C# - Size: 7.13 MB - Last synced: about 1 month ago - Pushed: almost 5 years ago - Stars: 60 - Forks: 33

Amrita-TIFAC-Cyber-Blockchain/MeRiT

Hyperledger Challenge 2022 : Media Tracking Platform to Tackle Online Piracy

Language: PHP - Size: 9.68 MB - Last synced: about 1 month ago - Pushed: 10 months ago - Stars: 1 - Forks: 2

rivermont/spidy

The simple, easy to use command line web crawler.

Language: Python - Size: 81.8 MB - Last synced: 29 days ago - Pushed: 7 months ago - Stars: 323 - Forks: 67

devopsgroup-io/siteshooter

📷 Automate full website screenshots and PDF generation with multiple viewport support.

Language: JavaScript - Size: 496 KB - Last synced: about 1 month ago - Pushed: about 5 years ago - Stars: 67 - Forks: 13

postmodern/spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Language: Ruby - Size: 677 KB - Last synced: 13 days ago - Pushed: 4 months ago - Stars: 792 - Forks: 111

wyu-du/StockForecast

🎯 Predict the price trend of individual stocks using deep learning and natural language processing

Language: Python - Size: 29.4 MB - Last synced: 28 days ago - Pushed: over 6 years ago - Stars: 74 - Forks: 34

idkidkidkidkidkidkidkidk/gics-sentry-bot

A PaGamO sentry bot for the preliminary round of the GiCS (尋找資安女婕思) competition

Language: Python - Size: 2.62 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 5 - Forks: 0