GitHub / apify / crawlee-python
Crawlee: a web scraping and browser automation library for Python for building reliable crawlers. It extracts data for AI, LLMs, RAG, or GPTs and downloads HTML, PDF, JPG, PNG, and other files from websites. It works with BeautifulSoup, Playwright, and raw HTTP, in both headful and headless modes, with proxy rotation.
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Fcrawlee-python
PURL: pkg:github/apify/crawlee-python
Stars: 5,749
Forks: 393
Open issues: 95
License: Apache-2.0
Language: Python
Size: 28.8 MB
Dependencies parsed at: Pending
Created at: over 1 year ago
Updated at: 3 days ago
Pushed at: 3 days ago
Last synced at: 3 days ago
Commit Stats
Commits: 595
Authors: 25
Mean commits per author: 23.8
Development Distribution Score: 0.689
More commit stats: https://commits.ecosyste.ms/hosts/GitHub/repositories/apify/crawlee-python
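The commit figures above can be checked with a little arithmetic. The sketch below recomputes the mean commits per author from the listed totals (595 commits, 25 authors). It also shows one common definition of a Development Distribution Score: 1 minus the top author's share of all commits. That definition is an assumption here, since this page does not state how the 0.689 value was derived. The function names and the per-author counts `[6, 3, 1]` are hypothetical and used only for illustration.

```python
def mean_commits_per_author(total_commits: int, authors: int) -> float:
    """Average number of commits per author, rounded to one decimal place."""
    return round(total_commits / authors, 1)


def development_distribution_score(commits_by_author: list[int]) -> float:
    """Assumed definition: 1 - (commits by the top author / total commits).

    A value near 0 means one author made almost every commit;
    a value near 1 means the work is spread across many authors.
    """
    total = sum(commits_by_author)
    if total == 0:
        return 0.0
    return round(1 - max(commits_by_author) / total, 3)


# Totals taken from the listing above.
mean = mean_commits_per_author(595, 25)

# Hypothetical per-author counts, NOT data from this repository.
example_dds = development_distribution_score([6, 3, 1])

print(mean)         # 23.8
print(example_dds)  # 0.4
```

Under this assumed definition, a score of 0.689 would mean the most active author accounts for roughly 31% of commits.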
Topics: apify, automation, beautifulsoup, crawler, crawling, hacktoberfest, headless, headless-chrome, pip, playwright, python, scraper, scraping, web-crawler, web-crawling, web-scraping
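The JSON API and PURL lines above both encode the same `owner/name` pair in different ways. As a minimal sketch, the helpers below rebuild both strings from `apify/crawlee-python`. The function names are hypothetical, the URL pattern is simply copied from the JSON API line on this page, and lowercasing the PURL components is an assumption based on the package URL convention for the `github` type.

```python
from urllib.parse import quote


def ecosystems_repo_api_url(host: str, full_name: str) -> str:
    """Build the repository API URL, percent-encoding the slash in owner/name."""
    # safe="" forces "/" to be encoded as %2F so the full name stays one path segment.
    encoded = quote(full_name, safe="")
    return f"http://repos.ecosyste.ms/api/v1/hosts/{host}/repositories/{encoded}"


def github_purl(full_name: str) -> str:
    """Build a package URL of the form pkg:github/<owner>/<name>."""
    owner, name = full_name.split("/", 1)
    # Assumption: namespace and name are lowercased for the github type.
    return f"pkg:github/{owner.lower()}/{name.lower()}"


api_url = ecosystems_repo_api_url("GitHub", "apify/crawlee-python")
purl = github_purl("apify/crawlee-python")

print(api_url)
print(purl)
```

Encoding the slash matters because an unencoded `apify/crawlee-python` would be read as two separate path segments.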