An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: minwise-hashing

oertl/treeminhash

TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation

Language: C++ - Size: 2.62 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 3

dynatrace-research/set-sketch-paper

SetSketch: Filling the Gap between MinHash and HyperLogLog

Language: C++ - Size: 23.7 MB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 46 - Forks: 5

oertl/bagminhash

BagMinHash: Minwise Hashing Algorithm for Weighted Sets

Language: C++ - Size: 1.02 MB - Last synced at: 10 days ago - Pushed at: over 4 years ago - Stars: 26 - Forks: 6

santurini/Search-Engine-Evaluation-and-Near-Duplicate-Detection

Search engine evaluation and near-duplicate detection on several datasets, using the PyTerrier library.

Language: Jupyter Notebook - Size: 267 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0