GitHub topics: minhash-lsh-algorithm
andrewmcloud/consimilo
A Clojure library for querying large data-sets on similarity
Language: Clojure - Size: 536 KB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 63 - Forks: 4

adriacabeza/Document-similarity-detection-using-hashing
Document similarity detection using hashing
Language: TeX - Size: 16 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 1

dynatrace-research/set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
Language: C++ - Size: 23.7 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 46 - Forks: 5

amitkp57/dbms-correlated-columns-detection
Detecting correlated columns in DBMS systems using techniques like Pearson Correlation, LSH Minhashing and Random Sampling.
Language: Jupyter Notebook - Size: 594 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

stanford-futuredata/FAST
End-to-end earthquake detection pipeline via efficient time series similarity search
Language: Jupyter Notebook - Size: 158 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 142 - Forks: 56

emarkou/Text-Similarity
A text similarity computation using MinHashing and Jaccard distance on the Reuters dataset
Language: R - Size: 69.3 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 16 - Forks: 5
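For readers new to the topic, a minimal sketch of how MinHashing estimates Jaccard similarity (not taken from this or any listed repository). Assumptions chosen for illustration: character 3-shingles, a hash family built from 64-bit BLAKE2b varied by a per-seed salt, and 128 hash functions; all function names are hypothetical.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Character k-shingles of a string (an assumed tokenization, for illustration)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|; Jaccard distance is 1 minus this."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(item: str, seed: int) -> int:
    # One member of the assumed hash family: 64-bit BLAKE2b, varied by an 8-byte salt.
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items: set[str], num_perm: int = 128) -> list[int]:
    """Keep the minimum hash value of the set under each of num_perm hash functions."""
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where two signatures agree, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    a = shingles("the quick brown fox jumps over the lazy dog")
    b = shingles("the quick brown fox jumped over a lazy dog")
    print("exact:", jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The estimate works because, for a well-mixed hash, two sets share the same minimum with probability close to their Jaccard similarity; more hash functions reduce the variance of the estimate.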

MaviVestini/ADM_HW4
4th homework for ADM
Language: Jupyter Notebook - Size: 1.27 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

AdrianaMacc/Covid-19-BigData-Project
SARS-CoV-2 genome analysis using big-data algorithms to find clusters of similar mutations that belong to different clades, mutate together, and generate the corresponding clade.
Language: Jupyter Notebook - Size: 513 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

MaviVestini/ADM-LT_HW1
First homework for the Advanced Data Mining course
Language: HTML - Size: 5.91 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

christinebuckler/provider-prescriber
Language: Jupyter Notebook - Size: 30.5 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

shubhamwaghe/Scalable-Data-Mining
Scalable Data Mining - Assignment submissions
Language: Python - Size: 3.38 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0

xadityax/Locality-Sensitive-Hashing-DNA-Seqs
Implementing Locality Sensitive Hashing for DNA Sequences.
Language: Python - Size: 1.77 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

aloobun/minhash_exp
Deduplication: MinHash with LSH
Language: Python - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

gurushida/mnemophonix
A simple audio fingerprinting system
Language: C - Size: 316 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 25 - Forks: 4

rihenperry/csuci-mscs-thesis-dist-web-crawler
Documents my master's-level thesis work on building a continuous, topical web crawler based on Mercator (1999)
Language: TeX - Size: 27.4 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

micts/jss
Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality Sensitive Hashing
Language: Python - Size: 23.4 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0
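A minimal sketch of the MinHash-plus-LSH banding idea behind this kind of search (an illustration, not this repository's code). Assumptions: signatures come from seeded 64-bit BLAKE2b hashes, sets are non-empty, and `bands=32`, `rows=4` are example parameters; the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def minhash_signature(items: set[str], num_perm: int) -> list[int]:
    """Per seeded 64-bit BLAKE2b hash, keep the minimum over a non-empty set."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(x.encode("utf-8"), digest_size=8,
                                salt=seed.to_bytes(8, "little")).digest(),
                "little",
            )
            for x in items
        )
        for seed in range(num_perm)
    ]


def lsh_candidates(signatures: dict[str, list[int]], bands: int, rows: int) -> set[tuple[str, str]]:
    """Split each signature into `bands` chunks of `rows` values; two keys become a
    candidate pair if they have an identical chunk in the same band."""
    buckets: dict[tuple, list[str]] = defaultdict(list)
    for key, sig in signatures.items():
        if len(sig) != bands * rows:
            raise ValueError("signature length must equal bands * rows")
        for b in range(bands):
            # The band index is part of the bucket key, so chunks from different bands never collide.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(key)
    pairs: set[tuple[str, str]] = set()
    for keys in buckets.values():
        pairs.update(combinations(sorted(keys), 2))
    return pairs


if __name__ == "__main__":
    bands, rows = 32, 4
    docs = {
        "d1": set("minhash and lsh find similar documents quickly".split()),
        "d2": set("minhash and lsh find similar documents fast".split()),
        "d3": set("an unrelated sentence about earthquakes".split()),
    }
    sigs = {k: minhash_signature(v, bands * rows) for k, v in docs.items()}
    # Candidates are only hints; confirm them with an exact Jaccard check afterwards.
    print(lsh_candidates(sigs, bands, rows))
```

With `b` bands of `r` rows, a pair with Jaccard similarity `s` becomes a candidate with probability `1 - (1 - s**r)**b`; the steep part of this curve sits near `(1/b)**(1/r)`, about 0.42 for the example parameters. Raising `r` makes the filter stricter, raising `b` makes it more permissive.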

SpydazWebAI-NLP/SpydazWebAI_NLP_Models
Word/Image/Audio Embedding models, Tokenizer models, Ngram language models, MatrixModels, Corpus building, Vocabulary Building, Language modelling
Language: Visual Basic .NET - Size: 2.93 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

92amartins/minhash-example
MinHash Example
Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

mariofv/DocSim
MinHash text analyzer developed for the Algorithmics course.
Language: C++ - Size: 43.1 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

Cheng-Lin-Li/Spark
Python 2.7 code and learning notes for Spark 2.1.1
Language: Python - Size: 2.62 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 24 - Forks: 6

wherefortravel/minhash-node-rs
MinHash and LSH index written in Rust for Node.js
Language: Rust - Size: 207 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 1

LM1997610/Data-Mining
Homeworks for Advanced Data Mining and Language Technology (DMT) at La Sapienza University of Rome
Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

LM1997610/ADM_HW4
Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome
Language: Jupyter Notebook - Size: 3.65 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

vascoalramos/mpei 📦
Probability Methods for Informatics Engineering | UA 2018/2019
Language: Java - Size: 39.8 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

santurini/MinHash-LSH-From-Scratch
Implementing a simplified copy of the Shazam application from scratch using MinHashing and LSH.
Language: Python - Size: 210 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

vbarzokas/apache-spark-link-prediction
A set of methods and model evaluation metrics for predicting links in an academic citation network using Apache Spark and Scala
Language: Scala - Size: 11.4 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

soulintzis/Multidimensional-Data-Structures
Language: Python - Size: 34.6 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

rkapsalis/Range-and-similarity-queries
Implementation of a B+ Tree for range and exact match queries and of the LSH algorithm for finding similar documents as measured by Jaccard Similarity.
Language: Python - Size: 111 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

tmpsrcrepo/benchmark_minhash_lsh
Insight Data Engineering fellowship project
Language: Python - Size: 1.38 MB - Last synced at: about 2 years ago - Pushed at: over 8 years ago - Stars: 10 - Forks: 3

steven-s/minhash-document-clusters
MinHash clustering of text documents
Language: Scala - Size: 33.2 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 1

kazemnejad/text_similarity_search
An easy-to-use script for fast similarity search in the textual data (and embedding space) with GPU & Multi-core support.
Language: Python - Size: 69.3 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

FilipeLopesPires/SpellChecker
SpellChecker: an application to check for spelling errors.
Language: Java - Size: 3.54 MB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

coderthetyler/mhash-c
An implementation of the MinHashing algorithm in C using POSIX threads.
Language: C - Size: 3.86 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

cwuu/DataMining-LearningFromLargeDataSet-Task1
ETH Zurich Fall 2017
Language: Python - Size: 496 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

AiDinho/LocallySensitiveHashing
Language: Python - Size: 2.93 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0
