An open API service providing repository metadata for many open source software ecosystems.

Topic: "web-crawling"

apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

Language: TypeScript - Size: 140 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 17,594 - Forks: 810

apify/crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Language: Python - Size: 27.4 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 5,616 - Forks: 379

omkarcloud/botasaurus

The All in One Framework to Build Undefeatable Scrapers

Language: Python - Size: 62.5 MB - Last synced at: about 1 hour ago - Pushed at: about 17 hours ago - Stars: 1,877 - Forks: 164

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

Language: Python - Size: 79.1 KB - Last synced at: 30 days ago - Pushed at: 3 months ago - Stars: 608 - Forks: 56

crwlrsoft/crawler

Library for Rapid (Web) Crawler and Scraper Development

Language: PHP - Size: 1.17 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 360 - Forks: 13

scrapehero-code/amazon-scraper

A simple web scraper to extract Product Data and Pricing from Amazon

Language: Python - Size: 16.6 KB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 307 - Forks: 156

jrbadiabo/Bet-on-Sibyl

Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

Language: Jupyter Notebook - Size: 17 MB - Last synced at: 29 days ago - Pushed at: about 8 years ago - Stars: 263 - Forks: 94

TurnerSoftware/InfinityCrawler

A simple but powerful web crawler library for .NET

Language: C# - Size: 326 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 251 - Forks: 37

godkingjay/selenium-twitter-scraper

This is a Twitter Scraper which uses Selenium for scraping tweets. It is capable of scraping tweets from home, user profile, hashtag, query or search, and advanced searches.

Language: Jupyter Notebook - Size: 160 KB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 246 - Forks: 63

spyboy-productions/omnisci3nt

Omnisci3nt – See What They’ve Tried to Hide. Extract deep intelligence from any domain. From subdomains to SSL certs, archived secrets to exposed ports, Omnisci3nt gives you the full picture in seconds.

Language: Python - Size: 8.21 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 235 - Forks: 38

ayakashi-io/ayakashi

:zap: Ayakashi.io - The next generation web scraping framework

Language: TypeScript - Size: 1.24 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 213 - Forks: 8

serpapi/clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

Language: Ruby - Size: 34.2 KB - Last synced at: about 17 hours ago - Pushed at: about 1 year ago - Stars: 178 - Forks: 11

scrapinghub/scrapy-training

Scrapy Training companion code

Language: Python - Size: 103 KB - Last synced at: 17 days ago - Pushed at: over 6 years ago - Stars: 174 - Forks: 45

brianmadden/krawler

A web crawling framework written in Kotlin

Language: Kotlin - Size: 403 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 130 - Forks: 16

my8100/scrapyd-cluster-on-heroku

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.

Language: Python - Size: 236 KB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 122 - Forks: 88

fintech-hub/bancocentralbrasil

💵 💰 :brazil: Information on official daily rates for inflation, Selic, savings (Poupança), dollar, PTAX dollar, euro, and PTAX euro from the Banco Central do Brasil website

Language: Python - Size: 182 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 120 - Forks: 34

MaxValue/Terpene-Profile-Parser-for-Cannabis-Strains

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Language: Python - Size: 21.4 MB - Last synced at: 19 days ago - Pushed at: about 2 years ago - Stars: 118 - Forks: 18

maxmindlin/scout-lang

A web crawling programming language

Language: Rust - Size: 54.4 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 112 - Forks: 6

jonasjacek/robots.txt

Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allow) legitimate user-agents. Useful for all websites.

Size: 135 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 87 - Forks: 38

alyakhtar/Katastrophe

Command Line Tool to download torrents

Language: Python - Size: 322 KB - Last synced at: 6 months ago - Pushed at: over 8 years ago - Stars: 85 - Forks: 12

ScrapingAnt/amazon_scraper

Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt

Language: JavaScript - Size: 52.7 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 81 - Forks: 19

GoTrained/Scrapy-Craigslist

Web Scraping Craigslist's Engineering Jobs in NY with Scrapy

Language: Python - Size: 195 KB - Last synced at: 4 months ago - Pushed at: almost 8 years ago - Stars: 66 - Forks: 37

spyboy-productions/PhantomCrawler

Boost website hits by generating requests from multiple proxy IPs.

Language: Python - Size: 1.48 MB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 64 - Forks: 9

dongweiming/daenerys

Scraping and Web Crawling Framework For Zhihu Live

Language: Python - Size: 13.7 KB - Last synced at: 21 days ago - Pushed at: over 7 years ago - Stars: 63 - Forks: 30

jgujerry/python-frameworks

Another curated list of Python frameworks

Language: Python - Size: 13 MB - Last synced at: 24 days ago - Pushed at: 25 days ago - Stars: 61 - Forks: 4

MohamedHmini/tweetsOLAPing

Implementing an end-to-end tweet ETL/analysis pipeline.

Language: Python - Size: 5.99 MB - Last synced at: 22 days ago - Pushed at: over 2 years ago - Stars: 57 - Forks: 6

ScaleUnlimited/flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Language: Java - Size: 1.38 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 52 - Forks: 18

SoheilKhodayari/JAW

JAW: A Graph-based Security Analysis Framework for Client-side JavaScript

Language: JavaScript - Size: 43.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 47 - Forks: 7

sushantPatrikar/Amazon-Flipkart-Price-Comparison-Engine

Compares the price of a user-entered product across the e-commerce sites Amazon and Flipkart :moneybag: :bar_chart:

Language: Python - Size: 7.77 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 30

mike-gee/webtranspose

Web scraping API for building AI applications.

Language: Python - Size: 1.43 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 41 - Forks: 2

luminati-io/brightdata-mcp

A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.

Language: JavaScript - Size: 63.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 40 - Forks: 4

chrislicodes/Udacity-Data-Analyst-Nanodegree

Repository for the projects needed to complete the Data Analyst Nanodegree.

Language: Jupyter Notebook - Size: 93.1 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 34 - Forks: 22

Cheng-Lin-Li/KnowledgeGraph

This repository is for web crawling, information extraction, and knowledge graph construction.

Language: Julia - Size: 5.19 MB - Last synced at: 8 days ago - Pushed at: about 7 years ago - Stars: 33 - Forks: 4

ScrapingAnt/zoominfo_scraper

ZoomInfo scraper using rotating proxies and headless Chrome from ScrapingAnt

Language: Python - Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 32 - Forks: 9

zytedata/spidyquotes

Example site for web scraping tutorials

Language: Julia - Size: 229 KB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 31 - Forks: 18

HRN-Projects/amazon-captcha-solver

A TensorFlow (Deep Learning - CNN) based solution for tackling captcha when collecting data from Amazon.

Language: Python - Size: 35.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 29 - Forks: 13

kapilkchaurasia/Data-mining-python-script

Contains various scripts for web crawling and data mining of the social web (RSS, Facebook, Twitter, LinkedIn)

Language: Python - Size: 121 KB - Last synced at: over 1 year ago - Pushed at: over 10 years ago - Stars: 26 - Forks: 19

omkarcloud/botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

Language: TypeScript - Size: 397 KB - Last synced at: about 1 hour ago - Pushed at: 8 days ago - Stars: 25 - Forks: 8

yuis-ice/jseval

Evaluate JavaScript on a URL through headless Chrome browser.

Language: JavaScript - Size: 2.93 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 25 - Forks: 1

miroshnikov/scrapyteer

Web crawling & scraping framework for Node.js on top of headless Chrome browser

Language: TypeScript - Size: 384 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 19 - Forks: 0

rohitthapliyal2000/Amazon-Mobile-Sentiment-Analysis

Opinion mining of Mobile reviews on Amazon platform

Language: Python - Size: 146 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 19 - Forks: 2

leopardslab/CrawlerX

CrawlerX - An extensible, distributed, scalable crawler system: a web platform that can be used to crawl URLs over different kinds of protocols in a distributed way.

Language: SCSS - Size: 11.8 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 13

tal95shah/OLX_Scraper

:radio: An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them to MongoDB (NoSQL).

Language: Python - Size: 127 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 17 - Forks: 7

ScrapingAnt/alibaba_scraper

Alibaba scraper using rotating proxies and headless Chrome from ScrapingAnt

Language: Python - Size: 152 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 16 - Forks: 3

SuperBruceJia/dynamic-web-crawlering-python

This repo is mainly for dynamic web (Ajax Tech) crawling using Python, taking China's NSTL websites as an example.

Language: Python - Size: 12.8 MB - Last synced at: 21 days ago - Pushed at: almost 2 years ago - Stars: 16 - Forks: 3

dchrostowski/autoproxy

Public proxy farm that automatically records and queues suitable proxy servers for web crawling

Language: Python - Size: 401 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 5

KadekM/scrawler

Scala web crawling and scraping using fs2 streams

Language: HTML - Size: 92.8 KB - Last synced at: 5 days ago - Pushed at: over 7 years ago - Stars: 16 - Forks: 3

mirkomantovani/web-search-engine-UIC

CS 582 Information Retrieval at University of Illinois at Chicago. Multithreaded crawling of UIC domain, inverted index, page rank, SEO with Context Pseudo-Relevance Feedback

Language: Python - Size: 104 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 14 - Forks: 4

thesp0nge/nightcrawler-mitm

A Python program that crawls a website and tries to stress it, polluting forms with bogus data

Language: Python - Size: 247 KB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 1

omkarcloud/omkar-temp-mail

🚀 OMKAR TEMP MAIL HELPS YOU USE TEMPORARY EMAILS. 🤖

Language: Python - Size: 15.6 KB - Last synced at: about 1 hour ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 4

INNOVINATI/microwler

A micro-framework for asynchronous deep crawls and web scraping with Python

Language: Python - Size: 1.49 MB - Last synced at: 26 days ago - Pushed at: almost 2 years ago - Stars: 13 - Forks: 1

HuberTRoy/Seen

A lightweight crawling/spider framework for everyone (supports JavaScript!). :sparkles:

Language: Python - Size: 82 KB - Last synced at: 22 days ago - Pushed at: almost 7 years ago - Stars: 13 - Forks: 3

supergillis/crawler-ts

Crawler written in TypeScript using ES6 generators.

Language: TypeScript - Size: 60.5 KB - Last synced at: 9 days ago - Pushed at: about 4 years ago - Stars: 12 - Forks: 1

jonathandunn/common_crawl_corpus

Scripts for building a geo-located web corpus using Common Crawl data

Language: Python - Size: 323 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 11 - Forks: 0

leolle/deep_learning

Projects about NLP knowledge graphs, web crawling, word embeddings, and entity and relation extraction.

Language: Jupyter Notebook - Size: 299 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 8

crwlrsoft/robots-txt

Robots Exclusion Standard/Protocol Parser for Web Crawling/Scraping

Language: PHP - Size: 32.2 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 10 - Forks: 2

dstark5/gnews-scraper

GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making it convenient to access and use the scraped information

Language: TypeScript - Size: 153 KB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 3

IBMDeveloperUK/Golang-Web-Scraping

Learn how to scrape web content from HTML and see how web scraping differs from web crawling

Language: Go - Size: 22.4 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 10 - Forks: 19

my8100/scrapyd-cluster-on-heroku-scrapyd-app

How to set up Scrapyd cluster on Heroku

Language: Python - Size: 26.4 KB - Last synced at: 29 days ago - Pushed at: about 3 years ago - Stars: 9 - Forks: 29

Boomslet/Web_Crawler

Open-source web crawler

Language: Python - Size: 34.2 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 9 - Forks: 6

arthur3486/born2crawl

A highly performant and versatile crawling engine, designed with scalability and extensibility in mind.

Language: Kotlin - Size: 624 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

talaatmagdyx/socials_regex

🪡 Social account detection and extraction in ruby, e.g. for crawling/scraping.

Language: Ruby - Size: 45.9 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 1

andredarcie/best-games-of-all-time-data-based

🏆 Definitive best games of all time, based on data from multiple sources

Language: Python - Size: 10.2 MB - Last synced at: 14 days ago - Pushed at: about 3 years ago - Stars: 8 - Forks: 0

Changwanseo/GenMine

GenBank Record downloader for taxonomists

Language: Python - Size: 439 KB - Last synced at: about 20 hours ago - Pushed at: 6 months ago - Stars: 7 - Forks: 0

PPS-22-Scooby/PPS-22-Scooby

Scala application that crawls and scrapes web pages given as input, using special rules passed to it through a DSL.

Language: Scala - Size: 4.3 MB - Last synced at: 22 days ago - Pushed at: 9 months ago - Stars: 7 - Forks: 1

joe-stifler/crawler

Crawler is a Python package that crawls web pages and converts their content into Markdown format, making it easy to create documentation, notes, or other text-based representations. It features domain restrictions, flexible output options, and graph visualization.

Language: Python - Size: 283 KB - Last synced at: 22 days ago - Pushed at: 11 months ago - Stars: 7 - Forks: 1

0MeMo07/Web-Crawler

Web Crawler with Python

Language: Python - Size: 8.79 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

omkarcloud/web-scraping-template

🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING WEB SCRAPING BOTS. 🤖

Language: Python - Size: 104 KB - Last synced at: about 1 hour ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 3

lewisakura/spiderboi

A web crawling library written in TypeScript.

Language: TypeScript - Size: 376 KB - Last synced at: 30 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

michaelradu/web-crawler

A Web Crawler developed in Python.

Language: Python - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 2

sunil-sandhu/scrawly

Package wrapper around Node.js and Puppeteer for web crawling/scraping. Originally put together to accompany an article that can be found here: https://sunilsandhu.com/posts/how-to-scrape-data-from-a-website-with-javascript

Language: JavaScript - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 5

wangjksjtu/Data-Mining-51Job

Data-mining on 51Job website

Language: Jupyter Notebook - Size: 7.76 MB - Last synced at: 3 days ago - Pushed at: almost 7 years ago - Stars: 7 - Forks: 1

lekhmanrus/real-shot-pdf

RealShotPDF is a Chrome extension designed to simplify the process of creating PDF documents from web content. The extension allows users to navigate through selected webpages, parse and display links in a tree view, and generate PDFs for the chosen pages. It operates locally without sending any data to external servers.

Language: TypeScript - Size: 406 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 1

omkarcloud/puppeteer-captcha-solving-tutorial

🚀 LEARN HOW TO SOLVE CAPTCHA IN PUPPETEER USING CAPSOLVER 🤖

Language: Python - Size: 2.38 MB - Last synced at: about 1 hour ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 1

omkarcloud/selenium-2captcha-recaptcha-solver-demo

🚀 FINAL CODE FOR TUTORIAL ON HOW TO SOLVE CAPTCHA IN SELENIUM USING 2CAPTCHA 🤖

Language: Python - Size: 5.86 KB - Last synced at: about 1 hour ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 2

ahmed-alnassif/net-spider

Net-Spider is a web scraping tool designed to retrieve the source code for a web page, including front-end elements such as JavaScript, CSS, images, and fonts. It allows you to crawl and download the source code from a target website.

Language: Python - Size: 2.65 MB - Last synced at: 28 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 1

Bazserpz/nstbrowser-automation-library

NSTBrowser is an advanced browser for web scraping and automation, offering proxy management and anti-detect features. Compatible with Puppeteer, Playwright, and Selenium, it excels in multi-accounting and bypassing web protections.

Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

samedog/PHPmvs

PHPmvs is an old tool I wrote to test common web app and server vulnerabilities

Language: PHP - Size: 39.1 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 5 - Forks: 3

JizhiziLi/all-kinds-crawling-tools

This repository provides all kinds of crawling tools, e.g. image-crawler, paper-crawler

Language: Python - Size: 7.81 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 1

Abhishek1103/EtherSamaj

A Java-based decentralised desktop app (Dapp) for community work, funding, and medical funding. This application currently works on the Infura test network, which mimics the original Ethereum blockchain network.

Language: Java - Size: 2.56 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 5 - Forks: 2

prakharchoudhary/fun_with_python

My adventures with python!!

Language: Jupyter Notebook - Size: 2.67 MB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 5 - Forks: 1

vladimanaev/web-spider

Web crawler that supports fully rendered page crawling using HtmlUnit

Language: Java - Size: 48.8 KB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 5 - Forks: 0

Fern-Aerell/Web-Crawling-To-TXT

A simple web crawling application that can browse URLs, extract text content, and save the results in TXT format.

Language: Python - Size: 315 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

Solrikk/DataDigger

DataDigger is a powerful and intuitive web application designed to extract and analyze data from web pages.

Language: Go - Size: 38.1 KB - Last synced at: 26 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

breck7/crawlers

Crawlers for extracting measurements from the web for Scroll datasets

Language: JavaScript - Size: 140 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

crwlrsoft/laravel-crawler

Laravel adapter for the crwlr/crawler package.

Language: PHP - Size: 8.79 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

haritonch/instagrammer

Yet Another Instagram Bot.

Language: Python - Size: 6.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

HRN-Projects/common_crawl_with_scrapy

Parsing Huge Web Archive files from Common Crawl data index to fetch any required domain's data concurrently with Python and Scrapy.

Language: Python - Size: 23.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 5

sgalal/lshk-word-list-crawler 📦

Crawler for Cantonese pronunciation data on LSHK Jyutping Word List (香港語言學學會粵拼詞表)

Language: Python - Size: 240 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 2

ansegura7/WebScraping_Covid19

Web Scraping project to obtain data on confirmed cases and deaths of Covid-19, in order to analyze them.

Language: HTML - Size: 23.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 1

cs-fedy/reddit-crawler

I'm crawling the Reddit website, and I want to store the results in a database (PostgreSQL, maybe).

Language: Python - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0

asanakoy/web-crawler

Web crawler for simple.wikipedia.org in C++

Language: C++ - Size: 24.6 MB - Last synced at: over 1 year ago - Pushed at: about 11 years ago - Stars: 4 - Forks: 1

afrontend/dongnelibrary

A utility that checks whether library books are available to borrow

Language: JavaScript - Size: 438 KB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 3 - Forks: 0

ElektroStudios/FHM-Crawler-freehardmusic.com

Crawls download URLs of albums from the freehardmusic.com website

Language: Visual Basic .NET - Size: 10.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 2

Mirtia/Inappropriate-YouTube 📦

This repository contains the scripts used to obtain YouTube channel features and analyze potentially disturbing channels for the publication "YouTubers Not madeForKids: Detecting Channels Sharing Inappropriate Videos Targeting Children".

Language: Python - Size: 9.04 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

Cy8erEgo/2ch-fap-finder

2ch-fap-finder lets you find fap threads on 2ch.hk

Language: Python - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

omar-elmaria/python_scrapy_airflow_pipeline

This repo contains a full-fledged Python-based script that scrapes a JavaScript-rendered website, cleans the data, and pushes the results to a cloud-based database. The workflow is orchestrated on Airflow to run automatically

Language: Python - Size: 179 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

yanbin43/Crawler-Selenium

Web crawler demo with Python Selenium

Language: Jupyter Notebook - Size: 314 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

it21208/Text-Processing-ETL-and-Machine-Learning-for-Newslines

The purpose of the code in this repository is for demonstration only, the scripts by themselves do not do anything, because purposely many other folders and data files are not included in order to not violate any IPO or private data of an organisation. However, the code is still useful for text processing ideas and open to the public, to assist and motivate anyone interested in text processing.

Language: Python - Size: 21.2 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

18520339/web-scraping-with-scrapy

Python web scraping with Scrapy

Language: Python - Size: 479 KB - Last synced at: 3 days ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0