GitHub topics: crawling-python

Repositories

cnick0337/download-instagram-photos

📸 Download your Instagram photos quickly and easily, including posts and optional Reels thumbnails, with this lightweight automation tool.

Size: 2.96 MB - Last synced at: about 9 hours ago - Pushed at: about 11 hours ago - Stars: 0 - Forks: 0

watercrawl/WaterCrawl

Transform Web Content into LLM-Ready Data

Language: TypeScript - Size: 4.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,522 - Forks: 163

PREAKP90/Python_Wallpaper_Crawler

Wallpaper Crawler is an advanced web scraping tool designed to crawl websites and download high-resolution wallpapers.

Size: 1000 Bytes - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

Language: Python - Size: 4.01 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8,137 - Forks: 465

NickG1978/awesome-web-crawler

🕷️ Discover and use popular web crawlers across various programming languages to efficiently extract data from the web.

Language: HTML - Size: 1.66 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

MarshalX/telegram-crawler

🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

Language: Python - Size: 836 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 330 - Forks: 43

zhouyi207/WeiBoCrawler

Weibo data collection, Weibo crawler, Weibo web page parsing, complete code (main post content + comment content)

Language: Python - Size: 20.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 90 - Forks: 11

lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

Language: Makefile - Size: 427 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 7,423 - Forks: 827

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

Language: Python - Size: 6.61 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 744 - Forks: 161

thewebscraping/tls-requests

TLS Requests is a powerful Python library for secure HTTP requests, offering browser-like TLS client, fingerprinting, anti-bot page bypass, and high performance.

Language: Python - Size: 3.71 MB - Last synced at: 23 days ago - Pushed at: about 1 month ago - Stars: 90 - Forks: 7

Musubi-ai/Musubi

Musubi: A convenient crawling tool for collecting web text data in Python.

Language: Python - Size: 1.04 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 1 - Forks: 0

TechmoNoway/hotel-price-prediction-analysis

This project analyzes and predicts hotel prices using real-world data from Agoda and Machine Learning techniques.

Language: Jupyter Notebook - Size: 1.91 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

omkarcloud/botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

Language: TypeScript - Size: 402 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 27 - Forks: 9

Haimonmon/snippy

A book-scraping bot that gives you book data; be cautious, as using it may get your IP banned.

Language: Python - Size: 429 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

NewsCrap is a Command Line Interface (CLI) based Google News scraping tool designed for research, investigation, and OSINT data collection. With advanced features such as proxy rotation, automatic scheduling, and multi-format export, it makes news data collection efficient and reliable.

Language: Python - Size: 35.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 53 - Forks: 12

ilovedevs/awesome-web-crawler

List of best web crawlers to extract data from the web. Find web crawling tools for different needs.

Size: 167 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

WwwwwyDev/crawlist

A universal solution for crawling web page lists.

Language: Python - Size: 1.12 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 110 - Forks: 1

WwwwwyDev/crawlipt

Scripts for Selenium in Python. Make automated testing easier! Drive Selenium with JSON scripts.

Language: Python - Size: 343 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 155 - Forks: 2

spicyparrot/kafka_scrapy_connect

A custom library that integrates Scrapy with Kafka.

Language: Python - Size: 59.6 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 1

helviojunior/filecrawler

File Crawler indexes files and searches for hard-coded credentials

Language: Python - Size: 26.4 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 34 - Forks: 10

hesamz3090/Moss

Moss is a lightweight, efficient, and modular web crawler designed to explore, analyze, and extract data from the vast landscape of the internet.

Language: Python - Size: 23.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

wael-sudo2/facebook-page-info-scraper

Free Facebook pages MetaData Scraping Library - Unlimited Calls

Language: Python - Size: 106 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 40 - Forks: 8

omkarcloud/web-scraping-template

🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING WEB SCRAPING BOTS. 🤖

Language: Python - Size: 104 KB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 4

hhtrieu0108/Ohitv_End_To_End_Project

End-to-End ETL Pipeline for Film Data Crawling from Ohitv

Language: Python - Size: 5.57 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

Galarzaa90/tibia.py

API to parse tibia.com content into python objects.

Language: Python - Size: 7.65 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 40 - Forks: 12

thaoshibe/crawl-original-google-images

Python scripts for crawling original images from Google Images

Language: Python - Size: 15.6 KB - Last synced at: 18 days ago - Pushed at: over 3 years ago - Stars: 23 - Forks: 3

adedex/Linkedin-Custom

Enhance your LinkedIn job search with the Linkedin-Custom Chrome extension. Highlight job listings and descriptions using your own keywords. 🌐✨

Language: JavaScript - Size: 36.1 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

xishandong/Android_reverse

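This project shares practical Android reverse-engineering case studies and study notes. It is suitable for beginners, and as the author gradually becomes an expert, the repository will become suitable for experts too~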

Language: Python - Size: 17.6 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 57 - Forks: 18

tori1624/property-crawling

Real estate data crawling project

Language: Python - Size: 114 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

baokhanh546123/Gemini-ChatBot-SaleBot

Language: Python - Size: 124 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

erhangundogan/spider

Spider crawling the web

Language: Python - Size: 183 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

shaohua0116/ICLR2019-OpenReviewData

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

Language: Jupyter Notebook - Size: 54.4 MB - Last synced at: 4 months ago - Pushed at: almost 6 years ago - Stars: 387 - Forks: 30

M-Taghizadeh/Dollar_Rial_Price_Dataset

In this dataset, the price of the dollar to the Iranian rial in the years 2011 to 2023 has been collected by our crawler.

Language: Python - Size: 53.7 KB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 25 - Forks: 1

proxymesh/scrapy-proxy-headers

Handle custom proxy headers when making HTTPS requests through proxies in scrapy

Language: Python - Size: 10.7 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

fernandod1/Instagram-downloader

Instagram user's photos and videos downloader. Download all media files from any username. Working 2022!

Language: Python - Size: 14.6 KB - Last synced at: 7 months ago - Pushed at: over 3 years ago - Stars: 72 - Forks: 16

jayeshthk/ArachnoScan-Framework

Visual Web Pathfinder with Security Analysis Pipeline...Further pen-test agent.

Language: Python - Size: 3.37 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

golh30/seat-availability-predictor

Real-time tracking and predictive analytics for bus seat availability using Streamlit and PostgreSQL.

Language: Python - Size: 948 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

ptthanh02/VietNam-News-Crawler

Python-based web scraping tool for extracting articles from VietNamNet

Language: Jupyter Notebook - Size: 229 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

muhfalihr/PyXDTeleBot

PyXDTeleBot is a Telegram bot created using the Python programming language, specifically designed to facilitate the seamless sharing of media such as photos and videos from Twitter user posts.

Language: Python - Size: 57.6 KB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

patrik-fredon/Python_Wallpaper_Crawler

Wallpaper Crawler is an advanced web scraping tool designed to crawl websites and download high-resolution wallpapers.

Language: Python - Size: 44.9 KB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Esequiel378/proxy_randomizer

This library helps you safely crawl APIs and web pages

Language: HTML - Size: 85 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

mike-gee/webtranspose

Web scraping API for building AI applications.

Language: Python - Size: 1.43 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 41 - Forks: 2

TufayelLUS/zefix.ch-sogc-web-scraper-in-python

This Python script scrapes data from https://zefix.ch/en/search/shab/welcome into an Excel file, for collecting LinkedIn profiles in the future

Language: Python - Size: 0 Bytes - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

SolarStormLab/GFZCrawler

Language: Python - Size: 26.8 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

ls-saurabh/webcrawl

Webcrawl is a Python web crawler that recursively follows links from a starting URL to extract and print unique HTTP links. Using 'requests' and 'BeautifulSoup', it avoids revisits, handles errors, and supports configurable crawling depth. Ideal for gathering and analyzing web links.

Language: Python - Size: 17.6 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

kisoo95/Naver-cafe-crawling-ver240115

Naver cafe crawling code focused on keyword search

Language: Python - Size: 30.3 KB - Last synced at: 7 months ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 1

GabrielMazzotta/web-scrapping-real-state

Web scraping repo for the real estate business, including Cloudflare handling.

Language: Jupyter Notebook - Size: 832 KB - Last synced at: 9 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

pppiyo/box_office_analyzer

Simple web crawler in python to analyze box office

Language: Python - Size: 113 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

EesunMoon/Emotion-Recognization Fork of SejongAIQ/Emotion-Recognization

[AIQ] Quantitative modeling using alternative data (articles), building a sentiment dictionary

Language: Jupyter Notebook - Size: 4.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

pnguyen215/instagram-crawler

Instagram Crawler is a Python script to download posts from a specified Instagram account.

Language: Python - Size: 21.5 KB - Last synced at: 9 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

0MeMo07/Web-Crawler

Web Crawler with Python

Language: Python - Size: 8.79 KB - Last synced at: 7 months ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 0

deepmancer/advanced-recommender-system

Advanced information retrieval system that combines advanced indexing, machine learning, and personalized search to enhance academic research and document discovery.

Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

omkarcloud/dentalkart-scraper

🚀 SCRAPE 1000'S OF PRODUCTS FROM DENTALKART 🤖

Language: Python - Size: 908 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 2

JaShakouri/time.ir-crawling

API for getting Iran's holidays by year or month

Language: Python - Size: 431 KB - Last synced at: 17 days ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 2

morningkim/open_job_search

Crawling the job list on hibrain.net

Language: Python - Size: 38.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Moe131/webcrawler

Python web crawler designed to scrape websites

Language: Python - Size: 3.52 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

MLArtist/WebScraper

Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to outsmart website bots and prevent blocking.

Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 61 - Forks: 14

YaChuuuuu/Crawler

Language: Jupyter Notebook - Size: 6.45 MB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

soomin-kevin-sung/crossfit-com-wod

Notifies you of the daily crossfit.com WOD by opening an issue

Language: Python - Size: 29.3 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

realcoding2003/k-building-data-index

Code that queries building register information and indexes lot-number address information nationwide

Language: Python - Size: 1.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

teedihuni/crawler

crawling code with selenium

Language: Python - Size: 20.4 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AlanJumeaucourt/tca-net

Discord bot that sends a notification 10 minutes before classes as a reminder of the classroom and the professor

Language: Python - Size: 98.6 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

daudputra/Sekolah-Data-Kemdikbud

Get detailed school data in all districts and provinces in Indonesia

Language: Python - Size: 230 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

xinhuang0716/Customized_Skyscanner

A customized Skyscanner for Northeast Asia flight tickets

Language: Python - Size: 536 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

hpcao299/McQueen

A Collection Of Information-Gathering Tools 🌐

Language: Python - Size: 72.3 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

serpwings/data-science-for-digital-marketers

Jupyter Notebooks for Lecture Series on Data Science for Digital Marketers

Language: Jupyter Notebook - Size: 694 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 8

bayuik/tweepy_tutorial_crawling

Crawling data from twitter using Tweepy

Language: Jupyter Notebook - Size: 40 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

pyladies-brazil/crawler-tutorial

Data scraping tutorial produced in partnership with JusBrasil

Language: HTML - Size: 4.84 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 22 - Forks: 6

alfanrosid/crawling-text-and-image-twitter

Crawling Twitter text and image data using Jupyter Notebook and the tweepy library

Language: Jupyter Notebook - Size: 276 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

HarryZhangHH/Bacholar_Thesis

An E-commerce Website Migration Software Tool for Small Businesses

Language: Jupyter Notebook - Size: 34.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

CodingLeeSeungHoon/gazuaProject

⚡ crawl announcement data from upbit, trade cryptocurrency through upbit OpenAPI

Language: Python - Size: 27.3 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 3

muhammadkhairiisufyaan/Analisis-Sentimen-Berita-Pilpres-pada-Platform-Twitter-

This project represents my team's contribution to the semi-final of Gelar Rasa 2023, a competition organized by HIMASADA UPN "Veteran" East Java. With enthusiasm and dedication, our team managed to secure the 2nd place in the competition.

Language: Jupyter Notebook - Size: 5.18 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

helloMinji/WebCrawling-smartchoice

Web crawling of Smart Choice domestic quality evaluation results (main topics: option value, fetching tables)

Language: Python - Size: 34.2 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

helloMinji/WebCrawling-sendEmail-AUTO

Compares a local file with a website's content and sends an email notification if anything has changed (main topics: crawling, sending mail with Outlook)

Language: Python - Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

CrawLing72/OnlinePostTitleAnalyzing

A simple analysis of the characteristics of the titles of popular posts in online communities

Language: Jupyter Notebook - Size: 774 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

bsq-studio/crawler

Example Crawler

Language: Python - Size: 1000 Bytes - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

scrapehero-code/google_maps_scraper

A simple web scraper to extract basic details like title, phone number, review count, rating, and address from Google Maps.

Language: Python - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 3

leechannie/Hadoop_analysis

"Recommended Seoul districts for each keyword", analyzed using public data and crawled Google and Naver search volumes

Language: Jupyter Notebook - Size: 1.34 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

qwe8496516/fcu_course

Feng Chia University course-grabbing system (GUI for Feng Chia University Course Rob)

Language: HTML - Size: 199 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

DolXeGoD/instagram-trend-wordcloud-generator

A Python script that, given a tag, crawls recent data for that tag from Instagram and generates a word cloud from it.

Language: Python - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

Anzo52/osintbeast

Combining (mostly) Python OSINT tools into a single framework with support for sqlite3 database, currently working on mysql support.

Language: Python - Size: 30.3 KB - Last synced at: 7 months ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 1

Mr0Wido/commoncrawl.py

This Python script is a multi-threaded tool for retrieving data from the CommonCrawl index. It allows you to specify a domain or a list of domains, and it will retrieve all URLs associated with those domains that are indexed by CommonCrawl.

Language: Python - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Mr0Wido/urlcrawler.py

urlcrawler.py is a Python script that performs a web crawl for a specific domain or list of domains. This script finds all URLs under the domains.

Language: Python - Size: 5.86 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ahmadara/Instagram-Crawler

Instagram crawler in Python that cleans the data and visualizes it as a word cloud

Language: Jupyter Notebook - Size: 1.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

surister/scrupy

Python library to create web Crawlers which aims to be powerful yet simple.

Language: Python - Size: 271 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ShhRey/WebCrawler

Explore the evolution of web scraping in Python, from basic data extraction using BeautifulSoup to advanced web crawling and automation with Selenium. Store data in MongoDB and create a versatile web crawler. Learn to automate social media account creation. Includes instructions for adding Chrome WebDriver for Selenium.

Language: Python - Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0