An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: ai-scraping

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

Language: Python - Size: 15.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 20,057 - Forks: 1,707

devflowinc/firecrawl-simple (fork of mendableai/firecrawl)

➖ Stripped down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are completely removed. Crawl and convert any website into LLM-ready markdown.

Language: TypeScript - Size: 40 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 477 - Forks: 37

L1shed/Turbo

Fastest and cheapest distributed residential proxy network.

Language: Go - Size: 13.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 1

Chakszzz/NB-Scraper

All Scrapers Resource Available Here! Give Us Stars🌟

Language: TypeScript - Size: 2.65 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

Language: TypeScript - Size: 57.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 40,225 - Forks: 3,756

D4Vinci/Scrapling

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

Language: Python - Size: 1.9 MB - Last synced at: 4 days ago - Pushed at: 8 days ago - Stars: 5,424 - Forks: 302

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

Language: Jupyter Notebook - Size: 13.6 MB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 416 - Forks: 111

itsOwen/CyberScraper-2077

A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

Language: Python - Size: 355 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 1,715 - Forks: 154

WeebDataHoarder/go-away

[Mirror] Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.

Language: Go - Size: 956 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 50 - Forks: 3

kaymen99/google-maps-lead-generator

Extract Google Maps business leads and enrich contact details using AI & web scraping

Language: Python - Size: 43 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 1

OpenData4Sciece/Disneyland-Resorts-Hotels

Disneyland Resorts Hotels Investigation Study

Language: Python - Size: 46.9 KB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

any4ai/AnyCrawl

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

Language: TypeScript - Size: 771 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 14 - Forks: 1

vonuyvicoo/crava

AI-powered web scraper using JavaScript/TypeScript.

Language: TypeScript - Size: 66.4 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

ArchiveBox/abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

Language: JavaScript - Size: 177 KB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 75 - Forks: 4

spider-rs/spider-clients

Python, JavaScript, and Rust libraries for the Spider Cloud API.

Language: Python - Size: 1.39 MB - Last synced at: 6 days ago - Pushed at: 15 days ago - Stars: 16 - Forks: 6

spider-rs/web-crawling-guides

How-to guides on web crawling and scraping

Size: 7.85 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 20 - Forks: 4

raznem/parsera

Lightweight library for scraping websites with LLMs

Language: Python - Size: 2.24 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 1,093 - Forks: 63

kaymen99/ai-web-scraper

AI web scraper built with Crawl4AI for extracting structured leads data from websites.

Language: Python - Size: 19.5 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 14 - Forks: 1

nathabonfim59/md-fetch

A CLI tool and REST API that converts web content to clean Markdown, bypassing anti-scraping measures using headless browsers. Perfect for AI/LLM applications

Language: Go - Size: 1.58 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

jenslys/skrape-js

TypeScript/Node.js SDK to easily interact with the skrape.ai API

Language: TypeScript - Size: 71.3 KB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0