An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: minwise-hashing

oertl/treeminhash

TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation

Language: C++ - Size: 2.62 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 3

dynatrace-research/set-sketch-paper

SetSketch: Filling the Gap between MinHash and HyperLogLog

Language: C++ - Size: 23.7 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 46 - Forks: 5

oertl/bagminhash

BagMinHash - Minwise Hashing Algorithm for Weighted Sets

Language: C++ - Size: 1.02 MB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 26 - Forks: 6

santurini/Search-Engine-Evaluation-and-Near-Duplicate-Detection

Exploiting the PyTerrier library to perform Search Engine Evaluation and Near Duplicate Detection on different datasets.

Language: Jupyter Notebook - Size: 267 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0