GitHub topics: web-robots
jimsmart/progszy
Progszy is a hard-caching HTTP(S) proxy server for web robots.
Language: Go - Size: 20.8 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
jonasjacek/robots.txt
Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allow) legitimate user agents. Useful for all websites.
Size: 135 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 87 - Forks: 38
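To illustrate the allow/disallow pattern this description refers to, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `BadBot` agent name, and the `example.com` URL are made up for illustration and are not taken from the repository's template.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: whitelist one crawler, disallow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The whitelisted agent may fetch the page; any other agent may not.
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/page")
badbot_allowed = parser.can_fetch("BadBot", "https://example.com/page")
print(googlebot_allowed, badbot_allowed)
```

Note that robots.txt is advisory: it only keeps out crawlers that choose to honor it.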
acuciureanu/spidertrap-rs
A simple trap for web crawlers.
Language: Rust - Size: 7.81 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0
din0s/ml-for-bot-detection
A Python notebook showcasing machine learning for bot detection, with an emphasis on e-commerce sites.
Language: Jupyter Notebook - Size: 1.27 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 3