An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: minhash

hscspring/sto

MinHash and LSH Based Store and Query.

Language: Python - Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

zxmeng/SimilarityDetection

Similarity Detection on Wikipedia Articles using MinHash and Random Projection implemented in Hadoop/Spark

Language: Java - Size: 69.5 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

tkukurin/Lab.Bioinformatics

University work. Approximate aligner for long DNA sequences. Estimates Jaccard similarity from k-mers via minimizers and MinHash, then uses it as a sequence identity proxy.

Language: Java - Size: 90.3 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0
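
The approach described for tkukurin/Lab.Bioinformatics (estimating Jaccard similarity between k-mer sets with MinHash) can be illustrated with a minimal Python sketch. This is not the repository's code: it omits the minimizer step entirely, and the function names, the choice of salted BLAKE2b as the hash family, and the default of 128 hash functions are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(item, seed):
    # Hypothetical hash family: 64-bit BLAKE2b digest salted with the seed.
    digest = hashlib.blake2b(
        item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=128):
    """Keep the minimum hash value of the set under each seeded hash."""
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = kmers("ACGTACGTTGCAAGCTTACGGATCCA", 4)
    b = kmers("ACGTACGTTGCATGCTTACGGTTCCA", 4)
    est = estimate_jaccard(minhash_signature(a), minhash_signature(b))
    print(f"exact={exact_jaccard(a, b):.3f} estimate={est:.3f}")
```

For each hash function, the probability that two sets share the same minimum equals their Jaccard similarity, so the fraction of matching positions is an unbiased estimate. More hash functions reduce the variance at the cost of a larger signature.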

joaocps/mpei-bloomfilter

Probabilistic methods for computer engineering - Final Project

Language: Java - Size: 596 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

DearMadMan/minhash

An implementation of the MinHash algorithm in Go

Language: Go - Size: 2.93 KB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 0

user-cube/NewsAnalyzer

Tool to analyze news from a dataset.

Language: Java - Size: 10.2 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

LuoZijun/rust-jieba

Rust jieba

Language: Rust - Size: 1.97 MB - Last synced at: 5 days ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

npredey/GeneNetworks

Language: Python - Size: 60.5 KB - Last synced at: about 2 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

coderthetyler/mhash-c

An implementation of the MinHashing algorithm in C using POSIX threads.

Language: C - Size: 3.86 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

anastasia/minhash

Language: Python - Size: 16.6 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 1

CharuMehndiratta/CSE549

MinHash and Containment Hash implementation for long reads in C++

Language: C++ - Size: 1.3 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

worldofnick/Machine-Learning

Collection of code covering various topics in Machine Learning

Language: Jupyter Notebook - Size: 3.48 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

zeitunik/Big-Data

Big data homework solutions

Language: Python - Size: 146 KB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

Related Keywords
minhash (113), lsh (29), locality-sensitive-hashing (23), jaccard-similarity (19), bloom-filter (18), similarity (14), bioinformatics (13), minhash-lsh-algorithm (13), java (11), python (11), hyperloglog (11), simhash (9), lsh-algorithm (6), jaccard-similarity-estimation (6), document-similarity (6), text-diff (5), hashing (5), jaccard (5), data-mining (5), jaccard-distance (5), count-min-sketch (5), sketch (5), cosine-similarity (4), elasticsearch (4), sketching (4), metagenomics (4), minhash-similarity (4), deduplication (4), quartz (4), notifications (4), diffmatchpatch (4), differences-detected (4), similarity-search (4), minhash-sketches (4), fasta (3), machine-learning (3), cosine-distance (3), big-data (3), search (3), minwise-hashing-algorithm (3), minwise-hashing (3), work-in-progress (3), cardinality-estimation (3), plagiarism-detection (3), text-mining (3), shingling (3), clustering (3), spark (3), kmer (3), data-sketches (3), rust (3), sourmash (3), hash-functions (3), matlab (3), hash (3), nanopore (2), golang (2), recommender-system (2), mapreduce (2), map-reduce (2), algorithm (2), plagiarism (2), nlp (2), weighted-sets (2), tf-idf (2), metagenome (2), random-number-generators (2), plasmids (2), probability-distribution (2), jaccard-index (2), logistic-regression (2), probabilistic-programming (2), estimation (2), random-variables (2), alignment (2), jupyter-notebook (2), tomcat (2), rest-api (2), jetty (2), jersey2 (2), cybersecurity (2), dropwizard (2), malware (2), contigs (2), privacy (2), distributed (2), duplicates (2), java-library (2), genomics (2), c (2), hamming-distance (2), numpy (2), approximate-nearest-neighbor-search (2), hyperloglog-sketches (2), lsh-ensemble (2), lsh-forest (2), hash-algorithm (2), clojure (2), lsh-implementation (2), statistics (2)