GitHub topics: ai-scraping
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
Language: Python - Size: 15.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 20,057 - Forks: 1,707

devflowinc/firecrawl-simple (fork of mendableai/firecrawl)
Stripped down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are completely removed. Crawl and convert any website into LLM-ready markdown.
Language: TypeScript - Size: 40 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 477 - Forks: 37

L1shed/Turbo
Fastest and cheapest distributed residential proxy network.
Language: Go - Size: 13.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 1

Chakszzz/NB-Scraper
All Scrapers Resource Available Here! Give Us Stars
Language: TypeScript - Size: 2.65 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

mendableai/firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Language: TypeScript - Size: 57.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 40,225 - Forks: 3,756

D4Vinci/Scrapling
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
Language: Python - Size: 1.9 MB - Last synced at: 4 days ago - Pushed at: 8 days ago - Stars: 5,424 - Forks: 302

mendableai/firecrawl-app-examples
🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.
Language: Jupyter Notebook - Size: 13.6 MB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 416 - Forks: 111

itsOwen/CyberScraper-2077
A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama
Language: Python - Size: 355 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 1,715 - Forks: 154

WeebDataHoarder/go-away
[Mirror] Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.
Language: Go - Size: 956 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 50 - Forks: 3

kaymen99/google-maps-lead-generator
Extract Google Maps business leads and enrich contact details using AI & web scraping
Language: Python - Size: 43 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 1

OpenData4Sciece/Disneyland-Resorts-Hotels
Disneyland Resorts Hotels Investigation Study
Language: Python - Size: 46.9 KB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

any4ai/AnyCrawl
AnyCrawl: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.
Language: TypeScript - Size: 771 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 14 - Forks: 1

vonuyvicoo/crava
AI-powered web scraper using JavaScript/TypeScript.
Language: TypeScript - Size: 66.4 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

ArchiveBox/abx-dl
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...
Language: JavaScript - Size: 177 KB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 75 - Forks: 4

spider-rs/spider-clients
Python, JavaScript, and Rust libraries for the Spider Cloud API.
Language: Python - Size: 1.39 MB - Last synced at: 6 days ago - Pushed at: 15 days ago - Stars: 16 - Forks: 6

spider-rs/web-crawling-guides
How-to guides on web crawling or scraping
Size: 7.85 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 20 - Forks: 4

raznem/parsera
Lightweight library for scraping web-sites with LLMs
Language: Python - Size: 2.24 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 1,093 - Forks: 63

kaymen99/ai-web-scraper
AI web scraper built with Crawl4AI for extracting structured leads data from websites.
Language: Python - Size: 19.5 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 14 - Forks: 1

nathabonfim59/md-fetch
A CLI tool and REST API that converts web content to clean Markdown, bypassing anti-scraping measures using headless browsers. Perfect for AI/LLM applications
Language: Go - Size: 1.58 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

jenslys/skrape-js
TypeScript/Node.js SDK to easily interact with the skrape.ai API
Language: TypeScript - Size: 71.3 KB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0
