An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: minhash

sourmash-bio/sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.

Language: Python - Size: 46.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 502 - Forks: 83

dynatrace-oss/hash4j

Dynatrace hash library for Java

Language: Java - Size: 37 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 103 - Forks: 11

codelibs/minhash

This provides tools for the b-bit MinHash algorithm.

Language: Java - Size: 46.9 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 35 - Forks: 10

Callidon/bloom-filters

JS implementation of probabilistic data structures: Bloom Filter (and its derivatives), HyperLogLog, Count-Min Sketch, Top-K and MinHash

Language: TypeScript - Size: 9.16 MB - Last synced at: 6 days ago - Pushed at: 26 days ago - Stars: 398 - Forks: 47

ekzhu/datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Language: Python - Size: 5.68 MB - Last synced at: 14 days ago - Pushed at: 11 months ago - Stars: 2,667 - Forks: 296

shaltielshmid/MinHashSharp

A Robust Library in C# for Similarity Estimation

Language: C# - Size: 39.1 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

beowolx/rensa

High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets

Language: Python - Size: 112 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 80 - Forks: 9

RagnarGrootKoerkamp/simd-sketch

Compute bottom-s sketches and s-buckets sketches, using simd-minimizers crate.

Language: Rust - Size: 60.5 KB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 16 - Forks: 1

serega/gaoya

Locality Sensitive Hashing

Language: Rust - Size: 236 KB - Last synced at: 15 days ago - Pushed at: almost 2 years ago - Stars: 72 - Forks: 7

dnbaker/sketch

C++ Implementations of sketch data structures with SIMD Parallelism, including Python bindings

Language: C++ - Size: 4.43 MB - Last synced at: 16 days ago - Pushed at: 9 months ago - Stars: 152 - Forks: 13

h4sh5/bcddb

cross-architecture binary comparison database

Language: Python - Size: 255 KB - Last synced at: 11 days ago - Pushed at: 6 months ago - Stars: 8 - Forks: 2

justinbt1/Akin

Python library for detecting near duplicate texts in a corpus at scale.

Language: Python - Size: 2.77 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

src-d/minhashcuda

Weighted MinHash implementation on CUDA (multi-gpu).

Language: C++ - Size: 89.8 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 117 - Forks: 24

bigmlcom/sketchy

Sketching Algorithms for Clojure (bloom filter, min-hash, hyper-loglog, count-min sketch)

Language: Clojure - Size: 147 KB - Last synced at: 9 days ago - Pushed at: almost 2 years ago - Stars: 148 - Forks: 18

sourmash-bio/wort

A database for signatures of public genomic sources

Language: Python - Size: 526 KB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 18 - Forks: 2

dselivanov/LSHR

Locality Sensitive Hashing In R

Language: R - Size: 98.6 KB - Last synced at: 9 days ago - Pushed at: over 6 years ago - Stars: 40 - Forks: 13

LiveRamp/HyperMinHash-java

Union, intersection, and set cardinality in loglog space

Language: Java - Size: 572 KB - Last synced at: 27 days ago - Pushed at: almost 2 years ago - Stars: 56 - Forks: 10

guenthermi/fast_minh

Python package for fast MinHash calculation and operations

Language: C++ - Size: 19.5 KB - Last synced at: 27 days ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

markusorsi/mapchiral

Chiral version of the MinHashed Atom-Pair Fingerprint

Language: Python - Size: 323 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 19 - Forks: 5

lgautier/mashing-pumpkins

Minhash and maxhash library in Python, combining flexibility, expressivity, and performance.

Language: C - Size: 1.4 MB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 21 - Forks: 3

duhaime/minhash

Quickly estimate the similarity between many sets

Language: JavaScript - Size: 1010 KB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 51 - Forks: 11

edawson/rkmh

Classify sequencing reads using MinHash.

Language: C++ - Size: 33.3 MB - Last synced at: 20 days ago - Pushed at: about 5 years ago - Stars: 48 - Forks: 4

shreyansh26/MinHash-Implemenation

A simple MinHash implementation based on the explanation in the Mining of Massive Datasets course by Stanford

Language: Python - Size: 7.4 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

oertl/treeminhash

TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation

Language: C++ - Size: 2.62 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 3

oertl/probminhash

ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity

Language: C++ - Size: 6.26 MB - Last synced at: 10 days ago - Pushed at: over 4 years ago - Stars: 42 - Forks: 6

YaleDHLab/intertext 📦

Detect and visualize text reuse

Language: Python - Size: 3.11 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 115 - Forks: 10

davidsvy/Neural-Scam-Artist

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

Language: Python - Size: 188 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 24 - Forks: 1

imzoc/fast-similarity-methods-rust 📦

Rust implementation of alignment-free similarity estimation methods.

Language: Rust - Size: 3.57 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Forthoney/doc_sim

Approximate document similarity with Minhash + Locality Sensitive Hashing

Language: Ruby - Size: 48.8 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 2

esteinig/sketchy

Genomic neighbor typing of bacterial pathogens using MinHash 🐀

Language: Rust - Size: 20 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 44 - Forks: 3

BlaCkinkGJ/catch-me-if-you-can

plagiarism detector

Language: Python - Size: 62.5 KB - Last synced at: 5 months ago - Pushed at: about 4 years ago - Stars: 23 - Forks: 3

sskender/analysis-of-massive-datasets

Analysis of Massive Datasets FER labs

Language: Python - Size: 19 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

codelibs/elasticsearch-minhash

Elasticsearch plugin for the b-bit MinHash algorithm

Language: Java - Size: 250 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 60 - Forks: 14

andrewmcloud/consimilo

A Clojure library for querying large data-sets on similarity

Language: Clojure - Size: 536 KB - Last synced at: 9 days ago - Pushed at: about 6 years ago - Stars: 63 - Forks: 4

paul-sud/bigbed-jaccard

A tool to approximate the Jaccard similarity of bigBed files from functional genomic datasets

Language: Jupyter Notebook - Size: 777 KB - Last synced at: 11 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

ppw0/minhash

find similar text files quickly

Language: Python - Size: 53.7 KB - Last synced at: 5 months ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 1

edawson/mkmh

Generate kmers/minimizers/hashes/MinHash signatures, including with multiple kmer sizes.

Language: C++ - Size: 204 KB - Last synced at: 20 days ago - Pushed at: over 4 years ago - Stars: 24 - Forks: 2

vokter/vokter-scheduler

(WIP)

Size: 0 Bytes - Last synced at: 12 months ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

vokter/vokter-client-java

Sample Jetty/Jersey2 server that interoperates with a running Vokter server (https://github.com/vokter/vokter).

Language: Java - Size: 7.81 KB - Last synced at: 12 months ago - Pushed at: almost 9 years ago - Stars: 0 - Forks: 0

vokter/vokter-server

(WIP) HTTP server that deploys and distributes Vokter (https://github.com/vokter/vokter) through a REST API.

Size: 3.91 KB - Last synced at: 12 months ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

hugofpaiva/mpei-p1 📦

Practical assignment for the course Probabilistic Methods for Informatics Engineering, UA 2019/2020

Language: Java - Size: 39.9 MB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

sdevalapurkar/similar-questions

👯 Algorithms using Jaccard similarity to identify questions from a list that are similar to one another

Language: Python - Size: 13.6 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

dynatrace-research/set-sketch-paper

SetSketch: Filling the Gap between MinHash and HyperLogLog

Language: C++ - Size: 23.7 MB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 46 - Forks: 5

cristianovagos/bloomfilter

Bloom Filter and MinHash techniques built in MatLab

Language: Matlab - Size: 355 KB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 3

macmanes-lab/MCBS913_2019

This is the repo for the Spring 2019 version of MCBS913

Language: Python - Size: 1.89 MB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 1

ekzhu/minhash-lsh

Minhash LSH in Golang

Language: Go - Size: 22.5 KB - Last synced at: 10 days ago - Pushed at: over 5 years ago - Stars: 25 - Forks: 15

pNre/Sketching

Collection of sketching algorithms in Swift

Language: Swift - Size: 52.7 KB - Last synced at: 4 days ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 2

oertl/bagminhash

BagMinHash - Minwise Hashing Algorithm for Weighted Sets

Language: C++ - Size: 1.02 MB - Last synced at: 10 days ago - Pushed at: over 4 years ago - Stars: 26 - Forks: 6

XAH30/LSH-vs-Finesse

In this repository you can find implementations of the LSH (Locality-Sensitive Hashing) and Finesse algorithms, designed to find similar data based on their hashes

Language: C++ - Size: 5.72 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

mbrg/py-hyperminhash

HyperLogLog with intersection

Language: Python - Size: 70.3 KB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

fturati/floc-minhash-attacks

Implementation for the attacks of the paper "Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System"

Language: Python - Size: 17 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

PauloMaced0/restaurant-recommender 📦

Development of an interactive system for restaurant recommendation, utilizing filtering algorithms like MinHash and Bloom Filter for analysis and personalized suggestions based on user evaluations.

Language: MATLAB - Size: 2.87 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AmbarChatterjee/ADM_HW4_Group3

This repository contains code and analysis for a homework assignment on recommendation systems and clustering algorithms in Python. Implements techniques like minhash, LSH, feature engineering, dimensionality reduction, K-means and DBSCAN clustering.

Language: Jupyter Notebook - Size: 48.1 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

davidefiocco/dockerized-elasticsearch-duplicate-finder

Attempt to use MinHash to find duplicates in an Elasticsearch index

Language: Python - Size: 11.7 KB - Last synced at: 18 days ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

nekcht/minhash-lsh-evaluation

Assessing MinHash LSH for text similarity. Compares with kNN using BART embeddings as ground truth. Involves data preprocessing, shingle creation, LSH experiments. Findings inform LSH's efficiency in document similarity tasks, enhancing understanding of LSH techniques.

Language: Jupyter Notebook - Size: 367 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

EdDuarte/similarity-search-java

Easy-to-use Java similarity algorithms for text and numeric-series

Language: Java - Size: 149 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 18 - Forks: 10

will-rowe/groot

A resistome profiler for Graphing Resistance Out Of meTagenomes

Language: Go - Size: 12.5 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 61 - Forks: 6

mattilyra/LSH

Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents

Language: Python - Size: 513 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 259 - Forks: 72

nepiskopos/duplicate-questions-detection-lsh

Knowledge extraction through Data Analysis, including Locality Sensitive Hashing (LSH).

Language: Jupyter Notebook - Size: 423 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

gurushida/mnemophonix

A simple audio fingerprinting system

Language: C - Size: 316 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 25 - Forks: 4

AIn0n/FMHD

Fast MinHash Distances algorithms collection

Language: C++ - Size: 288 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

HuangQiang/k-FreqItems

Massive Sparse Data Clustering Based on Frequent Items (SIGMOD 2023)

Language: Cuda - Size: 25.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

micts/jss

Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality Sensitive Hashing

Language: Python - Size: 23.4 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

rigvedpatki/data-mining-assignment-1

Finding Similar Items: Textually Similar Documents

Language: TypeScript - Size: 267 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

92amartins/minhash-example

MinHash Example

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

CaroseKYS/minhash-test-java

Test program for MinHash

Language: Java - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

holopoj/FHCP

Implementation of the paper "Finding Highly Correlated Pairs with Powerful Pruning" in Java.

Language: Java - Size: 1.56 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 1

mariofv/DocSim

MinHash text analyzer developed for the Algorithmics course.

Language: C++ - Size: 43.1 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

lstasiak/Big-Data-Algorithms-exercises

Set of tasks solved in Big Data Algorithms course

Language: Scala - Size: 3.06 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Cheng-Lin-Li/Spark

Python 2.7 code and learning notes for Spark 2.1.1

Language: Python - Size: 2.62 MB - Last synced at: 14 days ago - Pushed at: over 6 years ago - Stars: 24 - Forks: 6

wherefortravel/minhash-node-rs

MinHash and LSH index written in Rust for Node.js

Language: Rust - Size: 207 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 1

privacy-lsh/floc-minhash

Implementation for the attacks of the paper "Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System".

Language: Python - Size: 16.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

kmuinfosec/MUSEUM

Scalable and Multifaceted Search and Its Application for Malware Binary Files

Language: Python - Size: 461 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Mamiglia/ADM-LT_HW1

Homework 1 for the course Advanced Data Mining and Language Technology

Language: Jupyter Notebook - Size: 2.8 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

luizirber/2017-recomb

Poster presented at RECOMB 2017

Size: 2.66 MB - Last synced at: 4 days ago - Pushed at: almost 7 years ago - Stars: 7 - Forks: 3

ryputtam/Locality-Sensitive-Hashing-Plagiarism-Detection

Implementation of Locality Sensitive Hashing to detect plagiarism

Language: Jupyter Notebook - Size: 170 KB - Last synced at: 10 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 1

steven-s/text-shingles

k-shingling for text to help compare similarity

Language: Python - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 15 - Forks: 11

luizirber/sourmash-rust 📦

Rust implementation of sourmash core functionality

Language: Rust - Size: 4.08 MB - Last synced at: 4 days ago - Pushed at: over 6 years ago - Stars: 9 - Forks: 0

vascoalramos/mpei 📦

Probability Methods for Informatics Engineering | UA 2018/2019

Language: Java - Size: 39.8 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

eduardosantoshf/search-repeated-news 📦

MPEI Project - Search repeated news (using Bloom filter) and similar news (using MinHash) from a news API.

Language: Java - Size: 32.3 MB - Last synced at: about 9 hours ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

zahraDehghanian97/MinHashing_Spark

In this repo, I implement cosine similarity and a MinHashing function (with AND/OR operators on bands) to find roads similar to a specific road in a real traffic dataset using PySpark.

Language: Jupyter Notebook - Size: 65.4 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

santurini/MinHash-LSH-From-Scratch

Implementing a simplified copy of the Shazam application from scratch using MinHashing and LSH.

Language: Python - Size: 210 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

KarimLulu/locality-sensitive-hashing-knn

Approximate k-Nearest Neighbours in high-dimensional space via Locality Sensitive Hashing

Language: Jupyter Notebook - Size: 265 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

gibranfp/Sampled-MinHashing

A method to mine beyond-pairwise relationships using Min-Hashing for large-scale pattern discovery

Language: C - Size: 522 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 25 - Forks: 8

oma219/pacsketch

Network Anomaly Detection Using Probabilistic Data Structures

Language: C++ - Size: 12.8 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

travisbrady/flajolet

Probabilistic data structures for OCaml

Language: OCaml - Size: 220 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 38 - Forks: 3

TSunny007/Document-Similarity

Using Jaccard-Similarity and Minhashing to determine similarity between two text documents

Language: Jupyter Notebook - Size: 26.4 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 6 - Forks: 3

W3ndige/karton-similarity

Aurora karton for similarity matching.

Language: Python - Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 0

haradama/pHash

Software to identify known plasmid sequence from metagenomic assembly using Minhash

Language: Go - Size: 15.9 MB - Last synced at: about 2 months ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0

steven-s/minhash-document-clusters

Minhash clustering of text documents

Language: Scala - Size: 33.2 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 1

vokter/vokter

Document store that periodically checks for changes in web documents

Language: Java - Size: 120 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 2

davidsbatista/MuSICo

A Minwise Hashing Method for Addressing Relationship Extraction from Text

Language: Java - Size: 37.4 MB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 5 - Forks: 2

fengxu1996/similarity_find

Compute the similarity among multiple texts

Language: C++ - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

W3ndige/karton-minhash

Aurora karton for calculating minhash from input dataset.

Language: Python - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

pedrovt/mpei

Probability Methods for Informatics Engineering (University of Aveiro)

Language: Matlab - Size: 4.96 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

FilipeLopesPires/SpellChecker

SpellChecker: an application to check for spelling errors.

Language: Java - Size: 3.54 MB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

ahmetcandiroglu/reddit-analyzed

Finding similar subreddits using MinHash and SimRank algorithms

Language: Python - Size: 1.02 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

haradama/PlasmidPicker

Software to identify plasmid sequence data from metagenome using logistic regression and Minhash

Language: Python - Size: 133 MB - Last synced at: 10 days ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 2

bufistov/plagiat-detector

Language: Python - Size: 10.7 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

hscspring/sto

MinHash and LSH Based Store and Query.

Language: Python - Size: 9.77 KB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Related Keywords
minhash (112), lsh (29), locality-sensitive-hashing (23), jaccard-similarity (19), bloom-filter (18), similarity (14), bioinformatics (13), minhash-lsh-algorithm (13), python (11), java (11), hyperloglog (11), simhash (8), document-similarity (6), jaccard-similarity-estimation (6), lsh-algorithm (6), sketch (5), data-mining (5), hashing (5), jaccard (5), text-diff (5), count-min-sketch (5), jaccard-distance (5), differences-detected (4), minhash-similarity (4), elasticsearch (4), deduplication (4), cosine-similarity (4), sketching (4), metagenomics (4), quartz (4), notifications (4), diffmatchpatch (4), similarity-search (4), minhash-sketches (4), rust (3), text-mining (3), cosine-distance (3), work-in-progress (3), machine-learning (3), hash (3), minwise-hashing (3), kmer (3), search (3), matlab (3), plagiarism-detection (3), cardinality-estimation (3), sourmash (3), spark (3), big-data (3), clustering (3), hash-functions (3), minwise-hashing-algorithm (3), shingling (3), data-sketches (3), fasta (3), genomics (2), hamming-distance (2), recommender-system (2), mapreduce (2), map-reduce (2), plagiarism (2), nlp (2), weighted-sets (2), plasmids (2), jaccard-index (2), logistic-regression (2), malware (2), cybersecurity (2), jupyter-notebook (2), random-variables (2), contigs (2), random-number-generators (2), probability-distribution (2), probabilistic-programming (2), tf-idf (2), algorithm (2), numpy (2), c (2), java-library (2), duplicates (2), privacy (2), golang (2), alignment (2), estimation (2), metagenome (2), tomcat (2), rest-api (2), jetty (2), jersey2 (2), dropwizard (2), distributed (2), lsh-ensemble (2), lsh-forest (2), hyperloglog-sketches (2), lsh-implementation (2), statistics (2), approximate-nearest-neighbor-search (2), sketch-data-structures (2), clojure (2), cuckoo-filter (2)