An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: web-robots

jimsmart/progszy

Progszy is a hard-caching HTTP(S) proxy server for web robots.

Language: Go - Size: 20.8 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

jonasjacek/robots.txt

Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allow) legitimate user-agents. Useful for all websites.

Size: 135 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 87 - Forks: 38

acuciureanu/spidertrap-rs

A simple trap for web crawlers.

Language: Rust - Size: 7.81 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

din0s/ml-for-bot-detection

A Python notebook showcasing the use of Machine Learning for the task of bot detection, with an emphasis on e-commerce sites.

Language: Jupyter Notebook - Size: 1.27 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 3