An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: minhash-lsh-algorithm

andrewmcloud/consimilo

A Clojure library for querying large data-sets on similarity

Language: Clojure - Size: 536 KB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 63 - Forks: 4

adriacabeza/Document-similarity-detection-using-hashing

Document similarity detection using hashing

Language: TeX - Size: 16 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 1

dynatrace-research/set-sketch-paper

SetSketch: Filling the Gap between MinHash and HyperLogLog

Language: C++ - Size: 23.7 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 46 - Forks: 5

amitkp57/dbms-correlated-columns-detection

Detecting correlated columns in DBMS systems using techniques like Pearson Correlation, LSH Minhashing and Random Sampling.

Language: Jupyter Notebook - Size: 594 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

stanford-futuredata/FAST

End-to-end earthquake detection pipeline via efficient time series similarity search

Language: Jupyter Notebook - Size: 158 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 142 - Forks: 56

emarkou/Text-Similarity

Text similarity computation using MinHashing and Jaccard distance on the Reuters dataset

Language: R - Size: 69.3 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 16 - Forks: 5

MaviVestini/ADM_HW4

4th homework for ADM

Language: Jupyter Notebook - Size: 1.27 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

AdrianaMacc/Covid-19-BigData-Project

SARS-CoV-2 genome analysis using Big Data algorithms to find clusters of similar mutations that belong to different clades, mutate together, and generate the corresponding clade.

Language: Jupyter Notebook - Size: 513 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

MaviVestini/ADM-LT_HW1

First homework for the Advanced Data Mining course

Language: HTML - Size: 5.91 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

christinebuckler/provider-prescriber

Language: Jupyter Notebook - Size: 30.5 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

shubhamwaghe/Scalable-Data-Mining

Scalable Data Mining - Assignment submissions

Language: Python - Size: 3.38 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0

xadityax/Locality-Sensitive-Hashing-DNA-Seqs

Implementing Locality Sensitive Hashing for DNA Sequences.

Language: Python - Size: 1.77 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

aloobun/minhash_exp

Deduplication: MinHash w/ LSH

Language: Python - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

gurushida/mnemophonix

A simple audio fingerprinting system

Language: C - Size: 316 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 25 - Forks: 4

rihenperry/csuci-mscs-thesis-dist-web-crawler

Documents my master's-level thesis work on building a continuous, topical web crawler based on Mercator (1999)

Language: TeX - Size: 27.4 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

micts/jss

Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality Sensitive Hashing

Language: Python - Size: 23.4 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 3 - Forks: 0

SpydazWebAI-NLP/SpydazWebAI_NLP_Models

Word/Image/Audio Embedding models, Tokenizer models, Ngram language models, MatrixModels, Corpus building, Vocabulary Building, Language modelling

Language: Visual Basic .NET - Size: 2.93 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

92amartins/minhash-example

MinHash Example

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

mariofv/DocSim

MinHash text analyzer developed during the Algorithmics course.

Language: C++ - Size: 43.1 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

Cheng-Lin-Li/Spark

Python 2.7 code and learning notes for Spark 2.1.1

Language: Python - Size: 2.62 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 24 - Forks: 6

wherefortravel/minhash-node-rs

MinHash and LSH index written in Rust for Node.js

Language: Rust - Size: 207 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 1

LM1997610/Data-Mining

Homeworks for Advanced Data Mining and Language Technology (DMT) at La Sapienza University of Rome

Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

LM1997610/ADM_HW4

Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome

Language: Jupyter Notebook - Size: 3.65 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

vascoalramos/mpei 📦

Probability Methods for Informatics Engineering | UA 2018/2019

Language: Java - Size: 39.8 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

santurini/MinHash-LSH-From-Scratch

Implementing a simplified copy of the Shazam application from scratch using MinHashing and LSH.

Language: Python - Size: 210 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

vbarzokas/apache-spark-link-prediction

A set of methods and model evaluation metrics for predicting links in an academic citation network using Apache Spark and Scala

Language: Scala - Size: 11.4 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

soulintzis/Multidimensional-Data-Structures

Language: Python - Size: 34.6 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

rkapsalis/Range-and-similarity-queries

Implementation of a B+ Tree for range and exact match queries and of the LSH algorithm for finding similar documents as measured by Jaccard Similarity.

Language: Python - Size: 111 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

tmpsrcrepo/benchmark_minhash_lsh

Insight Data Engineering fellow project

Language: Python - Size: 1.38 MB - Last synced at: about 2 years ago - Pushed at: over 8 years ago - Stars: 10 - Forks: 3

steven-s/minhash-document-clusters

Minhash clustering of text documents

Language: Scala - Size: 33.2 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 1

kazemnejad/text_similarity_search

An easy-to-use script for fast similarity search in textual data (and embedding space) with GPU and multi-core support.

Language: Python - Size: 69.3 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

FilipeLopesPires/SpellChecker

SpellChecker: an application to check for spelling errors.

Language: Java - Size: 3.54 MB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

coderthetyler/mhash-c

An implementation of the MinHashing algorithm in C using POSIX threads.

Language: C - Size: 3.86 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

cwuu/DataMining-LearningFromLargeDataSet-Task1

ETH Zurich Fall 2017

Language: Python - Size: 496 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

AiDinho/LocallySensitiveHashing

Language: Python - Size: 2.93 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Related Keywords
minhash-lsh-algorithm (35), minhash (13), jaccard-similarity (10), locality-sensitive-hashing (9), lsh (7), python (4), lsh-algorithm (4), algorithm (3), similarity-search (3), cosine-similarity (3), hashing (3), apriori-algorithm (2), jaccard-similarity-estimation (2), tf-idf (2), shingling (2), spark (2), indexing (2), similarity (2), bloom-filter (2), scala (2), apache-spark (2), savasere-omiecinski-and-navathe (1), python27 (1), map-reduce (1), kmeans-clustering (1), kmeans (1), uv-decomposition (1), neon-bindings (1), node (1), node-js (1), node-module (1), nodejs (1), rust (1), clojure (1), bpe (1), cooccurrence (1), embeddings (1), image2vec (1), latent-dirichlet-allocation (1), matrix (1), mutual-information (1), ngram-language-model (1), tokenization (1), tokenizer (1), vocabulary-builder (1), word2vec (1), word2word-matrix (1), wordgrams (1), wordpeice (1), wordpiece-tokenization (1), text-analysis (1), als (1), alternating-least-squares (1), apriori-son (1), data-mining (1), membership-queries (1), multidimensional (1), bplustree (1), batch (1), spark-streaming (1), text-processing (1), clustering (1), document-clustering (1), text-mining (1), faiss (1), text-simi (1), java (1), murmur (1), spell-checker (1), word-suggestion (1), c-library (1), posix-threads (1), datamining (1), large-dataset (1), mapreduce (1), data-science (1), locally-sensitive-hashing (1), machine-learning (1), svd-matrix-factorisation (1), dimensionality-reduction (1), k-means-clustering (1), pca-analysis (1), counting-bloom-filter (1), hash-functions (1), probabilistic-programming (1), probability-distribution (1), random-number-generators (1), random-variables (1), jupyter-notebook (1), librosa (1), citation-network (1), prediction-algorithm (1), bloom-filters (1), bplus-tree (1), data-structures (1), minwise-hashing (1), minwise-hashing-algorithm (1), sketch (1), sketch-algorithm (1), sketch-data-structures (1)