crawlee-python

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP, in both headful and headless mode, with proxy rotation.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Fcrawlee-python
PURL: pkg:github/apify/crawlee-python
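
The JSON API address above appears to be built from the host name and the percent-encoded repository name (the "/" in apify/crawlee-python becomes %2F). A minimal sketch of that construction follows, using only the Python standard library. The helper names repo_api_url and purl_to_full_name are hypothetical, and the URL pattern is inferred from the single address shown on this page, not from the API documentation.

```python
from urllib.parse import quote

# Base path copied from the JSON API line on this page.
API_BASE = "http://repos.ecosyste.ms/api/v1"


def purl_to_full_name(purl: str) -> str:
    """Extract "owner/name" from a pkg:github/... PURL (hypothetical helper).

    Only the simple form shown on this page is handled; any version,
    qualifiers, or subpath after "@", "?", or "#" are dropped.
    """
    prefix = "pkg:github/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a GitHub PURL: {purl!r}")
    rest = purl[len(prefix):]
    for sep in ("#", "?", "@"):
        rest = rest.split(sep, 1)[0]
    return rest


def repo_api_url(host: str, full_name: str) -> str:
    """Build the repository URL, assuming the pattern seen above."""
    # safe="" makes quote() encode "/" as %2F instead of leaving it as is.
    return (
        f"{API_BASE}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}"
    )


if __name__ == "__main__":
    name = purl_to_full_name("pkg:github/apify/crawlee-python")
    print(repo_api_url("GitHub", name))
```

The sketch only builds the address; fetching it needs an HTTP client and network access, which are left out here.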

Stars: 5,947
Forks: 404
Open issues: 73

License: apache-2.0
Language: Python
Size: 29.9 MB
Dependencies parsed at: Pending

Created at: over 1 year ago
Updated at: about 17 hours ago
Pushed at: about 17 hours ago
Last synced at: about 17 hours ago

Commit Stats

Commits: 595
Authors: 25
Mean commits per author: 23.8
Development Distribution Score: 0.689
More commit stats: https://commits.ecosyste.ms/hosts/GitHub/repositories/apify/crawlee-python
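
The two derived figures above can be checked with simple arithmetic. The sketch below assumes that the Development Distribution Score is defined as 1 minus the share of commits made by the most active author; that definition is an assumption not stated on this page. The top-author count of 185 is a hypothetical value chosen to be consistent with the reported score, not a figure from the source.

```python
def mean_commits_per_author(commits: int, authors: int) -> float:
    """Average number of commits per distinct author."""
    return commits / authors


def development_distribution_score(top_author_commits: int, total_commits: int) -> float:
    """Assumed definition: 1 - (top author's commits / total commits).

    0.0 means a single author made every commit; values closer to 1.0
    mean the work is spread across more contributors.
    """
    return 1 - top_author_commits / total_commits


if __name__ == "__main__":
    # Figures from the Commit Stats block above.
    print(round(mean_commits_per_author(595, 25), 1))

    # 185 is a hypothetical top-author commit count (not from the source).
    print(round(development_distribution_score(185, 595), 3))
```

Under that assumed definition, a score of 0.689 would mean the most active author accounts for roughly 31% of the 595 commits.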

Topics: apify, automation, beautifulsoup, crawler, crawling, hacktoberfest, headless, headless-chrome, pip, playwright, python, scraper, scraping, web-crawler, web-crawling, web-scraping
