GitHub topics: data-sketches
ekzhu/datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Language: Python - Size: 5.68 MB - Last synced at: 4 days ago - Pushed at: 12 months ago - Stars: 2,694 - Forks: 298

dynatrace-oss/hash4j
Dynatrace hash library for Java
Language: Java - Size: 37.1 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 105 - Forks: 11

oertl/hyperloglog-sketch-estimation-paper
Paper about the estimation of cardinalities from HyperLogLog sketches
Language: TeX - Size: 51.6 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 62 - Forks: 6

Btsan/ApproximateSketch
Approximate Sketches for Join Size Estimation (SIGMOD'24)
Language: Python - Size: 18.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

ikegami-yukino/madoka-python
Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library)
Language: C++ - Size: 231 KB - Last synced at: 12 days ago - Pushed at: over 6 years ago - Stars: 26 - Forks: 2

Shozye/sketcher
Program to test the performance of data sketches such as FastExpSketch and QSketch
Language: C++ - Size: 45.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

dynatrace-oss/dynahist
DynaHist: A Dynamic Histogram Library for Java
Language: Java - Size: 1.84 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 45 - Forks: 9

andrewmcloud/consimilo
A Clojure library for querying large datasets by similarity
Language: Clojure - Size: 536 KB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 63 - Forks: 4

dynatrace-research/exaloglog-paper
ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale
Language: Java - Size: 2.27 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

dynatrace-research/ultraloglog-paper
UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting
Language: Python - Size: 4.23 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

turu/yalal
Yet Another Lame Algorithm Library
Language: Python - Size: 50.8 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

isarn/isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Language: Scala - Size: 1.33 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 29 - Forks: 12

erikerlandson/cdf-splining-prototype
A Prototype For Fitting Monotonic Cubic Splines to a Tdigest Sketch
Language: Jupyter Notebook - Size: 1.2 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

galprz/dns-random-subdomains-ddos-attack
Implementation for "Mitigating DNS random subdomain DDoS attacks by distinct heavy hitters sketches"
Language: Jupyter Notebook - Size: 1.11 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 8 - Forks: 3

justinfargnoli/simhash
A barebones implementation of the simhash data sketching algorithm.
Language: Go - Size: 7.81 KB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0
