An open API service providing repository metadata for many open source software ecosystems.

Topic: "url-normalization"

sindresorhus/normalize-url

Normalize a URL

Language: JavaScript - Size: 124 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 847 - Forks: 123

patternhelloworld/url-knife

Extract and decompose (fuzzy) URLs (including emails, which are conceptually a part of URLs) in texts with Area-Pattern-based modularity

Language: TypeScript - Size: 958 KB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 352 - Forks: 15

adbar/courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

Language: Python - Size: 547 KB - Last synced at: 14 days ago - Pushed at: 5 months ago - Stars: 137 - Forks: 9

xojoc/cleanurl

Remove clutter from URLs and return a canonicalized version

Language: Python - Size: 77.1 KB - Last synced at: 9 months ago - Pushed at: 12 months ago - Stars: 17 - Forks: 3

hanover-computing/canonicize-url

Get a stable, canonical version of any URL, with DNS and HTTPS checks, redirects, tracker stripping, and canonical link extraction!

Language: JavaScript - Size: 1.57 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 12 - Forks: 0

vladkens/url-normalize

🔗🧹 Normalize URLs to a standardized form. HTTPS by default, flexible configuration, custom protocols, domain extraction, URL humanization, and punycode support. Both CJS & ESM modules available.

Language: TypeScript - Size: 159 KB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 9 - Forks: 2

seroperson/urlopt4s

Allows you to remove ad/tracking query params from a given URL in Scala

Language: Scala - Size: 77.1 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 8 - Forks: 0

toimik/UrlNormalization

URL normalizer to canonicalize (standardize) the text representation of a URL to determine if differently-formatted URLs are identical

Language: C# - Size: 57.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

chipslays/php-url-fingerprint

🔗 Pathor is a PHP library for normalizing, analyzing, and comparing URLs.

Language: PHP - Size: 21.5 KB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

simonpierreboucher/Crawler

A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.

Language: Python - Size: 87.9 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Manu-sh/http_normalizer

http url normalization for web crawlers

Language: C++ - Size: 481 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Atanukumardey/NLP-NoteBooks

Natural Language Processing related notebooks (Machine Learning)

Language: Jupyter Notebook - Size: 301 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0