GitHub topics: scrapy | Ecosyste.ms: Repos

george-gca/ai_papers_scrapper

Download paper PDFs and other info from the main AI conferences

Language: Python - Size: 210 KB - Last synced at: about 1 hour ago - Pushed at: about 4 hours ago - Stars: 29 - Forks: 6

crawlab-team/crawlab

Distributed web crawler admin platform for spider management, regardless of languages and frameworks.

Language: Go - Size: 23.8 MB - Last synced at: about 24 hours ago - Pushed at: 1 day ago - Stars: 11,829 - Forks: 1,845

SpiderClub/haipproxy

💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis

Language: Python - Size: 1.16 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 5,489 - Forks: 911

salimt/Transfermarkt-ETL-and-LIVE-Scores

asyncio, GitHub Actions, GCP, dbt, Terraform, Docker

Language: Python - Size: 114 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3 - Forks: 0

povilasb/scrapy-html-storage

Scrapy downloader middleware that stores response HTMLs to disk.

Language: Python - Size: 19.5 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 18 - Forks: 2

DmitryAlexandrov91/scrapy_parser_example

Example parser built on the Scrapy library

Language: Python - Size: 23.4 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

fuunshi/ShareSansarDataScrape

Daily automatic scraping of share prices from Share Sansar

Language: Python - Size: 6.99 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 0

ArthurdxvMods/laptop_price_prediction

Predict laptop prices using machine learning with a Random Forest model. Explore data, build a pipeline, and interact through a Streamlit app. 🖥️📊

Language: Jupyter Notebook - Size: 3.89 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

my8100/scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.

Language: Python - Size: 3.05 MB - Last synced at: about 17 hours ago - Pushed at: 5 months ago - Stars: 3,317 - Forks: 579

honzajavorek/czap

Scraping czap.cz data so you can filter available psychotherapists by any criteria you wish

Language: Python - Size: 2.84 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

A powerful Python script that allows you to scrape messages and media from Telegram channels using the Telethon library. Features include real-time continuous scraping, media downloading, and data export capabilities.

Language: Python - Size: 31.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 301 - Forks: 69

buildwithtract/planning-applications

Scrape planning applications from local planning authorities in the UK

Language: Python - Size: 2.46 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 13 - Forks: 7

iaoongin/GachaClock

Gacha banner countdown. Supports viewing banner information for Honkai: Star Rail, Zenless Zone Zero, and Wuthering Waves.

Language: Python - Size: 59.5 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

scrapy/flake8-scrapy

A Flake8 plugin to catch common issues in Scrapy projects.

Language: Python - Size: 328 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 19 - Forks: 4

Luqman-Ud-Din/random_user_agent

A package to get a list of user agents based on filters such as operating system, software name, etc.

Language: Python - Size: 12.9 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 99 - Forks: 12

kkuvam/web-scrape

Web Scraping Technology Evaluation - Evaluation of different web scraping technologies in Python, with a focus on Requests, BeautifulSoup, and Scrapy. Benchmarked each technology for ease of use, performance, scalability, and maintainability

Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

TikHub/TikHub-API-Python-SDK

High-performance asynchronous API for Douyin, TikTok, Xiaohongshu, Kuaishou, Weibo, Instagram, YouTube, Twitter (X), Captcha Solver, and Temp Mail.

Language: Python - Size: 2.05 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 480 - Forks: 55

honzajavorek/czech-political-parties

Tracking changes in Czech political parties

Language: Python - Size: 1.54 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 5 - Forks: 1

casangi/casadocs

Common Astronomy Software Applications Documentation

Language: Python - Size: 47.4 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 12 - Forks: 11

joaopauloaramuni/python

Repo Python

Language: HTML - Size: 164 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 45 - Forks: 1

alltheplaces/alltheplaces

A set of spiders and scrapers to extract location information from places that post their location on the internet.

Language: Python - Size: 32.3 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 700 - Forks: 233

rmax/scrapy-redis

Redis-based components for Scrapy.

Language: Python - Size: 228 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 5,622 - Forks: 1,586

jonbakerfish/TweetScraper

TweetScraper is a simple crawler/spider for Twitter Search that does not use the API

Language: Python - Size: 58.6 KB - Last synced at: 3 days ago - Pushed at: over 4 years ago - Stars: 1,045 - Forks: 314

NguyenDa18/Portland-Jail-Data-Crawler

Scraper used for recording changes to Portland jail database

Language: Jupyter Notebook - Size: 41.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 5 - Forks: 0

pjosalgado/collectors-channel-scraper

GitLab mirror repository.

Language: Python - Size: 90.8 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 1

halifox/dict_spider

Crawler for Sogou, Baidu, and QQ word dictionaries

Language: Python - Size: 19.5 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

schmwong/APAC-McDelivery-Menu-Logger

Automatically scrapes McDelivery menu data and records it for future visualisation projects

Language: Python - Size: 42 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 2

Erkmik/best-python-html-parsers

The top Python HTML parsers for web scraping, including Beautiful Soup, lxml, PyQuery, Scrapy, and more.

Size: 6.84 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

apify/actor-scrapy-executor

Apify actor to run web spiders written in Python with the Scrapy library

Language: Python - Size: 316 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 5

apify/actor-scrapy-books-example

Example of a Python Scrapy project. It scrapes book data from https://books.toscrape.com/.

Language: Python - Size: 90.8 KB - Last synced at: 6 days ago - Pushed at: 15 days ago - Stars: 3 - Forks: 1

QARTER-FR/Web-Scraping-Pro

Advanced Web Data Extraction & Automation Platform | Anti-Detection | Proxy Rotation | Multiple Formats | Commercial License Available

Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

irineulucas/SentimenTA

SentimenTA is a sentiment analysis tool designed to analyze and interpret emotions and opinions from textual data. It utilizes natural language processing techniques to provide insights into the sentiment behind the text.

Size: 1000 Bytes - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

public-law/open-gov-crawlers

Parse government documents into well-formed JSON

Language: Python - Size: 18.7 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 70 - Forks: 8

kidult00/scrapy-jdxl

Uses Scrapy to scrape counselor profiles from the Jiandanxinli (简单心理) website

Language: Jupyter Notebook - Size: 1.08 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

Gerapy/Gerapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Language: Python - Size: 36.6 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 3,466 - Forks: 647

shyrz0824/Checkee

Frontend developer passionate about user-friendly interfaces. Skilled in React, TypeScript, and responsive design. Explore my projects on GitHub! 🌟💻

Size: 1.95 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

wenyalintw/Google-Patents-Scraper

Automatically download all PDF files of search results and their patent families found on Google Patents.

Language: Python - Size: 54.9 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 70 - Forks: 22

Harishwarrior/movie_scrap_backend

Python Scrapy script to scrape magnet links and titles from the Tamilrockers site.

Language: Python - Size: 44.9 KB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

q-m/scrapyd-k8s

Scrapyd on container infrastructure

Language: Python - Size: 101 KB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 17 - Forks: 6

edony-ink/anth

Customized AI system for an efficient daily life

Language: Python - Size: 13.4 MB - Last synced at: 9 days ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

zehengl/scrapy-darksky-api

A Scrapy app to crawl weather data from the Dark Sky API

Language: Python - Size: 6.86 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 7 - Forks: 0

zehengl/scrapy-indeed-company-reviews

A Scrapy app to crawl company reviews from Indeed

Language: Python - Size: 63.2 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 4 - Forks: 1

hellock/icrawler

A multi-threaded crawler framework with many built-in image crawlers.

Language: Python - Size: 280 KB - Last synced at: 4 days ago - Pushed at: 24 days ago - Stars: 891 - Forks: 180

chama-45426/hub-api

Aggregated management of AI model APIs

Language: Go - Size: 31.3 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 1

chebishev/bb-team-info-by-ingredient

Nutrition info scraper (Scrapy > Excel)

Language: Python - Size: 12.2 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

youmeat6678/Instagram-Hashtag-Scraper

A Python project for scraping Instagram posts based on specific hashtags. This tool uses Selenium for browser automation and BeautifulSoup for scraping content from Instagram. It's designed to fetch posts by hashtag, with customizable options to automate login, scrape content, and interact with Instagram posts.

Language: Python - Size: 35.2 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 1

ScrapingAnt/scrapingant-client-python

ScrapingAnt API client for Python.

Language: Python - Size: 60.5 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 42 - Forks: 5

Solihatun1/AI-Cursor-Scraping-Assistant

A powerful tool that leverages Cursor AI and MCP (Model Context Protocol) to easily generate web scrapers for various types of websites.

Language: Python - Size: 16.6 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 6 - Forks: 0

sazzadhossainmilon/got-scraping-client

got-scraping-client is a lightweight and efficient tool for web scraping tasks using the popular Got HTTP client. It simplifies data extraction from web pages by providing a straightforward API and built-in support for handling common challenges like pagination and rate limiting.

Language: TypeScript - Size: 24.4 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

Huolix/web-scraping-beautifulsoup

A Python project that scrapes product data using BeautifulSoup

Language: Jupyter Notebook - Size: 9.77 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

rheaacharya77/foodmandu-scraper

Foodmandu Scraper extracts restaurant information such as restaurant URLs, images, names, addresses, and cuisines.

Language: Python - Size: 577 KB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

wuzhy1ng/BlockchainSpider

A toolkit for blockchain data collection

Language: Python - Size: 13.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 152 - Forks: 30

amelia05-spec/crowdfunding-real-estate-scrapy

This project is a powerful and extensible scrapy-based crawler designed to extract and aggregate data from multiple real estate crowdfunding platforms. Ideal for investors, analysts and researchers interested in tracking investment opportunities, platform performance and market trends

Language: Python - Size: 31.3 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

murtaja89/public-proxies

🌐 Public Proxy List (Updated Every 2 Hours)

Language: HTML - Size: 1.12 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

tb0hdan/domains

World’s single largest Internet domains dataset

Language: HTML - Size: 1.68 GB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 779 - Forks: 129

Yash-Kavaiya/telegram-url-scraper

You can use this tool to export your Telegram user, group, or chat history in JSON format, extract text messages, and it can help you extract all available URLs in Telegram and generate a CSV file for further analysis.

Language: Jupyter Notebook - Size: 569 KB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 13 - Forks: 6

Maestro-111/search-engine

Search engine with Django, Scrapy, MongoDB and Elasticsearch

Language: Python - Size: 5.41 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

akshaysharmajs/Price-Monitoring-Tool

Tool for retrieving the cheapest price for any product by comparing across e-commerce platforms such as Amazon, Flipkart, Snapdeal, and eBay

Language: Python - Size: 1.51 MB - Last synced at: 1 day ago - Pushed at: almost 5 years ago - Stars: 11 - Forks: 3

NeroHin/millions-crawler

Homework III of the NCKU course Web Resource Discovery and Exploitation: I used a distributed crawler to crawl over a million web pages.

Language: Python - Size: 681 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

eracle/linkedin

LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker and Scrapy

Language: Python - Size: 688 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 919 - Forks: 142

eliasdabbas/advertools

advertools - online marketing productivity and analysis tools

Language: Python - Size: 23 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1,248 - Forks: 230

QueraTeam/dataanalysis_bootcamp_crawler

Web scraper implementations for a variety of websites.

Language: HTML - Size: 117 MB - Last synced at: 15 days ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 34

scrapy-plugins/scrapy-playwright

🎭 Playwright integration for Scrapy

Language: Python - Size: 982 KB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 1,212 - Forks: 140

mouday/spider-admin-pro

spider-admin-pro: a visual management tool for viewing Scrapy + Scrapyd crawler projects and scheduling timed crawler tasks; an upgraded version of SpiderAdmin

Language: Python - Size: 2.82 MB - Last synced at: 11 days ago - Pushed at: 9 months ago - Stars: 601 - Forks: 87

ayoubfaouzi/workspider 📦

Automate job application

Language: Python - Size: 14.6 KB - Last synced at: 5 days ago - Pushed at: over 8 years ago - Stars: 12 - Forks: 7

GRDRarda/search-a-sorted

Search A Sorted offers a quick way to find elements in a sorted list, whether ascending or descending. With functions like `search_a_sorted_ascending()` and `search_a_sorted_descending()`, it efficiently locates your target in no time! 🐙💻

Language: C - Size: 4.88 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

ttonys/Scrapy-CVE-CNVD 📦

Vulnerability monitoring based on Scrapy and scrapy-redis: fetches the latest CVE and CNVD vulnerabilities daily and sends email notifications

Language: Python - Size: 260 KB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 207 - Forks: 38

VyacheslavGizov/scrapy_parser_pep

Parser for PEP documents

Language: Python - Size: 63.5 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

juancarlospaco/faster-than-requests

Faster requests on Python 3

Language: Nim - Size: 19.9 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1,121 - Forks: 91

Telegram-Scraper2025/Telegram-Scraper-Member-Adder-Forwarder-And-More

A powerful Python script, the best marketing tool in 2025, that allows you to scrape messages and media from Telegram channels and groups. Features include real-time scraping, media download and data export capabilities.

Language: Python - Size: 18.9 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

Erik172/stylos-scrapers

Advanced Fashion Scraper | Stylos Ecosystem. Intelligent extraction of products, prices, and images from fashion retailers like Zara and Mango. Part of the Stylos ecosystem, an AI-powered platform for trend analysis and personalized style recommendations. Built with Scrapy, Selenium, and MongoDB.

Language: Python - Size: 21.5 MB - Last synced at: 9 days ago - Pushed at: 18 days ago - Stars: 2 - Forks: 0

shengchenyang/AyugeSpiderTools

Lets Scrapy developers skip writing modules for common scenarios such as item, pipeline, and middleware, freeing up the developer's hands.

Language: Python - Size: 26 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 93 - Forks: 15

jasonren0403/app_crawler

Scrapy-based app store crawler covering the app information itself and its reviews

Language: Python - Size: 141 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 4 - Forks: 0

EspiraMarvin/scraping-techniques

Code for a Cloud Sigma tutorial where I demonstrate various ways to scrape web data with different Python libraries

Language: Python - Size: 24.2 MB - Last synced at: 17 days ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 1

ivan-sincek/scrapy-scraper

Web crawler and scraper based on Scrapy and Playwright's headless browser.

Language: Python - Size: 86.9 KB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 16 - Forks: 4

666zyb/Real-time_stock_analysis

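A Python-based stock analysis and automated trading system that I developed with the help of AI. It integrates real-time data collection, technical indicator analysis, news sentiment analysis, Kelly criterion position sizing, and automated trading. The system uses Django as its web framework, provides an intuitive user interface, and supports real-time monitoring of multiple stocks and automated trading decisions.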

Language: Python - Size: 28.2 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 23 - Forks: 6

andros21/pgrank

pgrank - C++ app for computing PageRank

Language: C++ - Size: 668 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

xpetz/Netflix-Clone

This project replicates the Netflix homepage using only HTML and CSS, focusing on responsive design. Future updates will include JavaScript features to enhance interactivity. 🛠️🌐

Language: HTML - Size: 5.42 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

internetarchive/scrapy-warcio

Support for writing WARC files with Scrapy

Language: Python - Size: 31.3 KB - Last synced at: 9 days ago - Pushed at: over 5 years ago - Stars: 23 - Forks: 6

shadilytn/Autonomy

This Google Colab notebook demonstrates a complete workflow for extracting information from a website, processing it, and preparing it for use in applications like question answering or semantic search.

Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

TeamHG-Memex/scrapy-crawl-once

Scrapy middleware that allows crawling only new content

Language: Python - Size: 14.6 KB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 80 - Forks: 23

nit-in/pib

Download articles by the Press Information Bureau, India. Follow the instructions or download by month from the releases section.

Language: Python - Size: 176 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 5 - Forks: 2

nit-in/download_ncert_books

Download NCERT books using Scrapy

Language: Python - Size: 18.1 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 7 - Forks: 3

gaalcaras/mailingListScraper

A Python web scraper for public email lists.

Language: Python - Size: 140 KB - Last synced at: 10 days ago - Pushed at: about 7 years ago - Stars: 34 - Forks: 13

AccordBox/awesome-scrapy

A curated list of awesome packages, articles, and other cool resources from the Scrapy community.

Size: 51.8 KB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 551 - Forks: 64

KenMwaura1/nse-stock-scraper

This is a web scraper utilizing the Scrapy framework, MongoDB and AfricasTalking to get stock prices for companies listed on the Nairobi Stock Exchange. The project stores the ticker name and price and sends notifications via SMS once properly set up via AfricasTalking.

Language: Python - Size: 368 KB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 2