An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: lsh-algorithm

jianshu93/DartUniFrac

Approximate UniFrac via Weighted MinHash 🦀

Language: Rust - Size: 3.5 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 0

ritchie46/lsh-rs

Locality Sensitive Hashing in Rust with Python bindings

Language: Rust - Size: 511 KB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 119 - Forks: 23

jianshu93/dartminhash-rs

Fast Sketching for Weighted Sets

Language: Rust - Size: 70 MB - Last synced at: 21 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

alexrr04/document-similarity-detection-with-LSH

This project implements Locality-Sensitive Hashing (LSH) for efficient document similarity detection. Instead of performing exhaustive pairwise comparisons between documents, LSH uses probabilistic techniques to quickly identify similar document pairs, making it particularly effective for large document collections.

Language: C++ - Size: 7.99 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
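
The idea in the description above (skipping exhaustive pairwise comparison by hashing similar documents into the same bucket) can be illustrated with a minimal MinHash-plus-banding sketch. This is not the repository's code: the function names (`shingles`, `minhash_signature`, `candidate_pairs`) and the parameters (`k`, `num_hashes`, `bands`) are hypothetical, chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of word k-shingles of a text (hypothetical helper)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=64):
    """MinHash signature: the minimum hash value for each seeded hash function."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(docs, num_hashes=64, bands=16):
    """Return pairs of document ids whose signatures agree on at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            # Documents sharing an identical band land in the same bucket.
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "completely different text about hashing algorithms and sketches",
}
pairs = candidate_pairs(docs)
```

Only the pairs returned by `candidate_pairs` would then need an exact similarity check. More bands with fewer rows each make the filter more permissive, and fewer bands with more rows make it stricter.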

oertl/probminhash

ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity

Language: C++ - Size: 6.26 MB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 44 - Forks: 6

shaltielshmid/MinHashSharp

A Robust Library in C# for Similarity Estimation

Language: C# - Size: 39.1 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1

oertl/treeminhash

TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation

Language: C++ - Size: 2.62 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 14 - Forks: 3

aidaLabDEI/MOMENTI-motifs

Scalable mining of multidimensional time series motifs.

Language: Python - Size: 66 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

Infini-AI-Lab/MagicPIG

[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation

Language: Python - Size: 56.8 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 227 - Forks: 16

guofei9987/pyLSHash

Locality Sensitive Hashing, fuzzy-hash, min-hash, simhash, aHash, pHash, dHash. Image similarity and text similarity based on hash values.

Language: Python - Size: 257 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 60 - Forks: 6

FrancescoMonaco/span

Euclidean Minimum Spanning Tree approximation with a parameterless LSH index

Language: C++ - Size: 257 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

eduardosantoshf/most-frequent-itemsets 📦

MDLE first assignment: implement the A-Priori algorithm to find the most frequent itemsets in a list of conditions for a large set of patients, extract association rules between conditions, and implement and apply LSH to identify similar news articles in a dataset.

Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: 27 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ZiadSheriif/IntelliQuery

A semantic search indexing system designed to efficiently retrieve the top matching results from a database of 20 million documents. Given the embedding of a search query, it quickly identifies and returns the most relevant documents.

Language: Jupyter Notebook - Size: 5.84 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 6 - Forks: 4

akshatrajsaxena/Implementing-LSH

Implementation of LSH in order to find the similarity in a large dataset

Language: Jupyter Notebook - Size: 2.88 MB - Last synced at: 7 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Mrugank97/KNNavigate

Scaling Up Nearest Neighbor Search: How Dataset Size and Dimensionality Affect KNN Variants

Language: Jupyter Notebook - Size: 1.71 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

MajaJuri/Analiza-velikih-skupova-podataka

Implementation of algorithms presented in the course Analysis of Large Data Sets (Analiza velikih skupova podataka, AVSP)

Language: Java - Size: 1.03 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

hugofpaiva/mpei-p1 📦

Practical assignment for the course Probabilistic Methods for Computer Engineering (Métodos Probabilísticos para Engenharia Informática), UA 2019/2020

Language: Java - Size: 39.9 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

Lefteris-Souflas/Movie-Rating-User-Similarity

Explored Jaccard distance, Min-Hashing, and LSH for user similarity in a movie rating dataset. Tasks involve dataset preprocessing, exact Jaccard Similarity computation, Min-Hash signatures, and LSH implementation. Results and observations are documented in code, output files, and a report

Language: Jupyter Notebook - Size: 1.22 MB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

justinfargnoli/lshforest

An implementation of LSH Forest, based on the following paper (http://infolab.stanford.edu/~bawa/Pub/similarity.pdf).

Language: Go - Size: 29.3 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 1

DevPhamPham/NCKH_PySpark

Language: Python - Size: 354 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

leiyunin/Locality-Sensitive-Hashing-and-Collaborative-Filtering-on-Yelp-Data

The assignment comprises two main tasks: implementing LSH to identify similar businesses based on user ratings and developing various collaborative filtering recommendation systems to predict user ratings for businesses.

Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RishabhMaheshwary/query-attack

A Query Efficient Natural Language Attack in a Black Box Setting

Language: Python - Size: 1.67 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 16 - Forks: 4

Yasar2019/BigData-HW03

Finding similar documents using LSH with MapReduce on multi-node Spark Cluster

Language: Python - Size: 71 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Alexdruso/ID2222-Data-Mining-Sanvito-Stuart

Lab assignments for the course ID2222-Data Mining at KTH

Language: Roff - Size: 62.1 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 3

xadityax/Locality-Sensitive-Hashing-DNA-Seqs

Implementing Locality Sensitive Hashing for DNA Sequences.

Language: Python - Size: 1.77 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

imRP21/Summer-Research-Internship-2022

This repo presents the research paper I worked on during my summer research internship in 2022.

Size: 12.3 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Sitaras/Software-Development-for-Algorithmic-Problems_Project-2 Fork of giannhskp/Software-Development-for-Algorithmic-Problems_Project-2

📈|Time Series - Nearest neighbor search and clustering using the LSH and Hypercube algorithms (plus Lloyd's for clustering only), with metrics: L2, Discrete and Continuous Fréchet.

Language: C - Size: 33.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

lehuutrung1412/ImageRetrieval

A content-based image retrieval system built with deep learning, applying large-scale similarity search techniques such as k-d tree, LSH, and Faiss.

Language: Python - Size: 4.58 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 3

AndreasTraut/Deep_learning_explorations

Example of the Locality Sensitive Hashing (LSH) algorithm. Relevant for Big Data.

Language: Jupyter Notebook - Size: 118 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

NikolasGialitsis/LSH-and-Cube

LSH and Cube Implementation (Hashing and Querying Points on Higher Dimensions)

Language: C++ - Size: 8.26 MB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 0

LM1997610/ADM_HW4

Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome

Language: Jupyter Notebook - Size: 3.65 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Sitaras/Software-Development-for-Algorithmic-Problems_Project-1

Vectors - Nearest neighbor search and clustering using the LSH and Hypercube algorithms (plus Lloyd's for clustering only), with the L2 metric.

Language: C - Size: 15.8 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

ludwigfriborg/SwiftNilsimsa

Nilsimsa implementation as a Swift package

Language: Swift - Size: 18.6 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

AaronYang2333/DSCI_553

USC ✌ 2020 Spring DSCI 553 (Foundations and Applications of Data Mining). Score: 94

Language: ReScript - Size: 265 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 34 - Forks: 21

xiaogp/recsys_faiss

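A content-based related-product recommendation implementation built on fasttext + faiss, with an API query interface served by nginx+uwsgi+flask / gunicorn+uvicorn+fastapi; also adds a Spark implementation using Ansj+Word2vec+LSH+Phoenix.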

Language: Python - Size: 41.3 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 40 - Forks: 16

munnafaisal/Deep-Object-Search-With-Hash

Search your object with hash

Language: Python - Size: 10.2 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 12 - Forks: 5

kochlisGit/Big-Data-Algorithms

Implementation of algorithms for big data using python, numpy, pandas.

Language: Python - Size: 28.8 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

Vedant2311/Data-Mining-Algorithms

Repository for all assignments of the course COL761: Data Mining (Fall 2020), taught at IIT Delhi

Language: C++ - Size: 4.9 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

C-Ritam98/SimToReal

Unnatural Language Processing

Language: Jupyter Notebook - Size: 518 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

santurini/MinHash-LSH-From-Scratch

Implementing a simplified copy of Shazam application from scratch using MinHashing and LSH.

Language: Python - Size: 210 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

theatina/CryptoRecommendation

Recommendation system for cryptocurrencies, using data collected from users' tweets, with 10-fold cross-validation. Based on the cryptocoins in each user's tweets, the program runs algorithms on the data and recommends other cryptocoins to each user. (README in Greek, soon to be translated into English.)

Language: C - Size: 9.2 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 0

muyuuuu/high-performance-LSH

A high-concurrency LSH algorithm using a thread pool, implemented in C++

Language: C++ - Size: 47.9 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

SwamiKannan/Natural-Language-Processing-Specialization

Coursera's Natural Language Processing specialization

Language: HTML - Size: 3.68 MB - Last synced at: 8 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

mark-antal-csizmadia/finding-similar-items-textually-similar-documents

Finding Similar Items: Textually Similar Documents

Language: Jupyter Notebook - Size: 451 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

emad-deilam-salehi/Finding-Similar-Texts-using-LSH-Algorithm

Applied the LSH algorithm (developed from scratch) for finding similar texts.

Language: Jupyter Notebook - Size: 112 KB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

MenesesGHZ/locality-sensitive-hashing

LSH algorithm made with C++

Language: Makefile - Size: 5.39 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

pedroalbanese/lshsum

TTAK.KO-12.0276 LSH Recursive Hasher

Language: Go - Size: 23.4 KB - Last synced at: 9 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

julialwang/docuSearch

A Python program that uses LSH (locality-sensitive hashing) to search a CSV file and retrieve filenames containing words similar to the user's input.

Language: Python - Size: 91.8 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

JaiJaveria/Data_Mining

Projects involving Frequent Itemset Mining and analysis of hierarchical space partitioning techniques

Language: HTML - Size: 203 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

FilipeLopesPires/SpellChecker

SpellChecker: an application to check for spelling errors.

Language: Java - Size: 3.54 MB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

AlessandraMonaco/Data-Mining

This repository contains simple and funny Data Mining projects in Python.

Language: Jupyter Notebook - Size: 7.96 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

MohammadJavadArdestani/NLP-with-Classification-and-Vector-Spaces

Language: Jupyter Notebook - Size: 9.93 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

spyros-briakos/Autoencoder-Dimensionality-Reduction

Autoencoder dimensionality reduction, EMD-Manhattan metrics comparison and classifier based clustering on MNIST dataset.

Language: C++ - Size: 16.2 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Related Keywords
lsh-algorithm (53), lsh (10), minhash (9), lsh-implementation (8), locality-sensitive-hashing (8), clustering (6), data-mining (5), jaccard-similarity (5), hashing (4), minhash-lsh-algorithm (4), shingling (4), python (4), pyspark (3), hypercube (3), cosine-similarity (3), pcy (3), bloom-filter (3), similarity-search (3), apriori-algorithm (3), similarity (2), nearest-neighbors (2), jaccard-similarity-estimation (2), weighted-sets (2), min-hashing (2), recommendation-system (2), similar-items (2), spark (2), frequent-itemset-mining (2), cpp (2), data-science (2), weighted-jaccard (2), clustering-algorithm (2), r-tree (2), fp-tree (2), hash-algorithm (2), jaccard (2), jaccard-distance (2), jaccard-index (2), locality-sensitive (2), faiss (2), dimensionality-reduction (2), minwise-hashing (2), similarity-metric (2), sketching (2), big-data-processing (1), stochastic-gradient-descent (1), frequent-itemsets (1), min-hasing (1), gaston (1), fsg (1), multihash-pcy (1), multistage-pcy (1), fp-tree-c-implementation (1), stream-mining (1), streams (1), pca-analysis (1), cluster (1), clusters (1), kmeansplusplus (1), lloyds (1), range-search (1), vectors (1), nilsimsa (1), packages (1), dsci553 (1), inf553 (1), streaming (1), usc (1), fasttext (1), recommender-system (1), word2vec (1), deeplearning (1), object-detection (1), object-search (1), search-engine (1), yolov3 (1), a-priori (1), viterbi-algorithm (1), word2vec-algorithm (1), textual-similarity (1), hash-functions (1), text-similarity (1), java (1), murmur (1), spell-checker (1), word-suggestion (1), bert-model (1), feature-engineering (1), inverted-index (1), lstm-neural-networks (1), logistic-regression (1), naive-bayes-classifier (1), text-process (1), transformation-matrix (1), tweet-analysis (1), approximate-nearest-neighbor-search (1), autoencoder (1), bottleneck (1), earth-movers-distance (1), manhattan-distance (1)