An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-sketches

ekzhu/datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Language: Python - Size: 5.68 MB - Last synced at: 4 days ago - Pushed at: 12 months ago - Stars: 2,694 - Forks: 298

dynatrace-oss/hash4j

Dynatrace hash library for Java

Language: Java - Size: 37.1 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 105 - Forks: 11

oertl/hyperloglog-sketch-estimation-paper

Paper about the estimation of cardinalities from HyperLogLog sketches

Language: TeX - Size: 51.6 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 62 - Forks: 6

Btsan/ApproximateSketch

Approximate Sketches for Join Size Estimation (SIGMOD'24)

Language: Python - Size: 18.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

ikegami-yukino/madoka-python

Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library)

Language: C++ - Size: 231 KB - Last synced at: 12 days ago - Pushed at: over 6 years ago - Stars: 26 - Forks: 2

Shozye/sketcher

Program to test Performance of Data Sketches such as FastExpSketch, QSketch

Language: C++ - Size: 45.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

dynatrace-oss/dynahist

DynaHist: A Dynamic Histogram Library for Java

Language: Java - Size: 1.84 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 45 - Forks: 9

andrewmcloud/consimilo

A Clojure library for querying large data-sets on similarity

Language: Clojure - Size: 536 KB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 63 - Forks: 4

dynatrace-research/exaloglog-paper

ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale

Language: Java - Size: 2.27 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

dynatrace-research/ultraloglog-paper

UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting

Language: Python - Size: 4.23 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

turu/yalal

Yet Another Lame Algorithm Library

Language: Python - Size: 50.8 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

isarn/isarn-sketches-spark

Routines and data structures for using isarn-sketches idiomatically in Apache Spark

Language: Scala - Size: 1.33 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 29 - Forks: 12

erikerlandson/cdf-splining-prototype

A Prototype For Fitting Monotonic Cubic Splines to a Tdigest Sketch

Language: Jupyter Notebook - Size: 1.2 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

galprz/dns-random-subdomains-ddos-attack

Implementation of "Mitigating DNS random subdomain DDoS attacks by distinct heavy hitters sketches"

Language: Jupyter Notebook - Size: 1.11 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 8 - Forks: 3

justinfargnoli/simhash

A barebones implementation of the simhash data sketching algorithm.

Language: Go - Size: 7.81 KB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

Related Keywords
data-sketches (15), hyperloglog (6), data-sketching (5), cardinality-estimation (4), data-structures (3), minhash (3), probabilistic-data-structures (2), scala (2), simhash (2), t-digest (2), java (2), count-distinct (2), python (2), jaccard-similarity (2), lsh-forest (2), lsh (2), apache-spark (1), aggregator (1), stream-processing (1), sketch-data-structures (1), machine-learning-algorithms (1), cuckoo-filter (1), bloom-filter (1), document-similarity (1), algorithm-library (1), hamming-distance (1), ultraloglog (1), cardinality (1), sketch (1), hll-algorithm (1), similarity-search (1), similarity (1), minhash-lsh-algorithm (1), recommender-system (1), plagiarism-detection (1), golang (1), go (1), mirai-bot (1), mirai (1), heavy-hitters (1), dns (1), ddos-attacks (1), splines (1), spline-interpolation (1), monotonic-splines (1), density-functions (1), cumulative-distribution-function (1), variable-importance (1), udaf (1), spark-ml (1), spark (1), sketching-algorithm (1), pyspark (1), feature-importance (1), datasets (1), dataset (1), dataframes (1), dataframe (1), hyperloglog-sketches (1), xxh3 (1), wyhash (1), superminhash (1), streaming-algorithms (1), non-cryptographic-hash-functions (1), murmur3 (1), jumphash (1), imohash (1), hashing-algorithm (1), hash-functions (1), hash-algorithm (1), hash (1), farmhash (1), consistent-hashing (1), weighted-quantiles (1), top-k (1), search (1), lsh-ensemble (1), locality-sensitive-hashing (1), hnsw (1), data-summary (1), cosine-distance (1), collaborative-filtering (1), clojure (1), sketches (1), quantiles (1), quantile-estimation (1), quantile (1), order-statistics (1), memory-efficiency (1), histogram-library (1), histogram (1), hdrhistogram (1), dynamic-allocation (1), ddsketch (1), compression-algorithm (1), approximation-algorithms (1), min-hash (1), python-wrapper (1), memory-efficient (1), counter (1)