GitHub topics: lsh-algorithm
jianshu93/DartUniFrac
Approximate UniFrac via Weighted MinHash 🦀
Language: Rust - Size: 3.5 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 0
ritchie46/lsh-rs
Locality Sensitive Hashing in Rust with Python bindings
Language: Rust - Size: 511 KB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 119 - Forks: 23
jianshu93/dartminhash-rs
Fast Sketching for Weighted Sets
Language: Rust - Size: 70 MB - Last synced at: 21 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
alexrr04/document-similarity-detection-with-LSH
This project implements Locality-Sensitive Hashing (LSH) for efficient document similarity detection. Instead of performing exhaustive pairwise comparisons between documents, LSH uses probabilistic techniques to quickly identify similar document pairs, making it particularly effective for large document collections.
Language: C++ - Size: 7.99 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
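The idea in the description above (replace exhaustive pairwise comparison with a probabilistic filter) can be sketched with MinHash signatures and banding. This is a minimal illustration in Python, not the repository's C++ code; the function names, the 5-character shingle size, and the 64-hash / 16-band split are all assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Set of character k-grams of a lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, item); stands in for a random permutation."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def candidate_pairs(signatures, bands=16):
    """Split each signature into bands; documents that agree on any band are candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy corpus: "a" and "b" are identical, "c" shares no shingles with them.
docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "completely unrelated text about hashing weighted sets",
}
signatures = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}
pairs = candidate_pairs(signatures)
```

Only the pairs returned by `candidate_pairs` would then need an exact similarity check. More bands with fewer rows per band makes the filter more permissive; fewer bands with more rows makes it stricter.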
oertl/probminhash
ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity
Language: C++ - Size: 6.26 MB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 44 - Forks: 6
shaltielshmid/MinHashSharp
A Robust Library in C# for Similarity Estimation
Language: C# - Size: 39.1 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1
oertl/treeminhash
TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation
Language: C++ - Size: 2.62 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 14 - Forks: 3
aidaLabDEI/MOMENTI-motifs
Scalable mining of multidimensional time series motifs.
Language: Python - Size: 66 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0
Infini-AI-Lab/MagicPIG
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
Language: Python - Size: 56.8 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 227 - Forks: 16
guofei9987/pyLSHash
Locality Sensitive Hashing, fuzzy-hash, min-hash, simhash, aHash, pHash, dHash. Hash-based image similarity and text similarity.
Language: Python - Size: 257 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 60 - Forks: 6
FrancescoMonaco/span
Euclidean Minimum Spanning Tree approximation with a parameterless LSH index
Language: C++ - Size: 257 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
eduardosantoshf/most-frequent-itemsets 📦
MDLE First Assignment - Implements the A-Priori algorithm to find the most frequent itemsets among the conditions of a large set of patients and to extract association rules between conditions, and applies LSH to identify similar news articles in a dataset.
Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: 27 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0
ZiadSheriif/IntelliQuery
A semantic search indexing system designed to efficiently retrieve top matching results from a database of 20 million documents. Given the embedding of a search query, it quickly identifies and returns the most relevant documents
Language: Jupyter Notebook - Size: 5.84 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 6 - Forks: 4
akshatrajsaxena/Implementing-LSH
Implementation of LSH in order to find the similarity in a large dataset
Language: Jupyter Notebook - Size: 2.88 MB - Last synced at: 7 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0
Mrugank97/KNNavigate
Scaling Up Nearest Neighbor Search: How Dataset Size and Dimensionality Affect KNN Variants
Language: Jupyter Notebook - Size: 1.71 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0
MajaJuri/Analiza-velikih-skupova-podataka
Implementation of the algorithms presented in the course Analysis of Large Data Sets (Analiza velikih skupova podataka, AVSP)
Language: Java - Size: 1.03 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
hugofpaiva/mpei-p1 📦
Practical assignment for the course Probabilistic Methods for Computer Engineering (Métodos Probabilísticos para Engenharia Informática), UA 2019/2020
Language: Java - Size: 39.9 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1
Lefteris-Souflas/Movie-Rating-User-Similarity
Explored Jaccard distance, Min-Hashing, and LSH for user similarity in a movie rating dataset. Tasks involve dataset preprocessing, exact Jaccard Similarity computation, Min-Hash signatures, and LSH implementation. Results and observations are documented in code, output files, and a report
Language: Jupyter Notebook - Size: 1.22 MB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
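The relationship between the exact Jaccard similarity and its Min-Hash estimate mentioned above can be shown in a few lines. This is a hedged sketch, not the repository's notebook code: the user names, movie IDs, hash construction, and the choice of 128 hash functions are hypothetical.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| (defined as 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minhash_signature(items, num_hashes=128):
    """One minimum per seeded hash function; each seed approximates a random permutation."""
    def h(seed, item):
        digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return [min(h(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; an estimator of the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical users, each represented by the set of movie IDs they rated.
alice = {1, 2, 3, 4, 5, 6}
bob = {4, 5, 6, 7, 8, 9}

exact = jaccard(alice, bob)      # 3 shared movies out of 9 distinct ones
distance = 1.0 - exact           # Jaccard distance
approx = estimate_jaccard(minhash_signature(alice), minhash_signature(bob))
```

The estimate converges on the exact value as the number of hash functions grows, at the cost of longer signatures.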
justinfargnoli/lshforest
An implementation of LSH Forest based on the following paper (http://infolab.stanford.edu/~bawa/Pub/similarity.pdf).
Language: Go - Size: 29.3 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 1
DevPhamPham/NCKH_PySpark
Language: Python - Size: 354 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
leiyunin/Locality-Sensitive-Hashing-and-Collaborative-Filtering-on-Yelp-Data
The assignment comprises two main tasks: implementing LSH to identify similar businesses based on user ratings and developing various collaborative filtering recommendation systems to predict user ratings for businesses.
Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
RishabhMaheshwary/query-attack
A Query Efficient Natural Language Attack in a Black Box Setting
Language: Python - Size: 1.67 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 16 - Forks: 4
Yasar2019/BigData-HW03
Finding similar documents using LSH with MapReduce on multi-node Spark Cluster
Language: Python - Size: 71 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0
Alexdruso/ID2222-Data-Mining-Sanvito-Stuart
Lab assignments for the course ID2222-Data Mining at KTH
Language: Roff - Size: 62.1 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 3
xadityax/Locality-Sensitive-Hashing-DNA-Seqs
Implementing Locality Sensitive Hashing for DNA Sequences.
Language: Python - Size: 1.77 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0
imRP21/Summer-Research-Internship-2022
This repo contains the research paper I worked on during my summer research internship in 2022.
Size: 12.3 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0
Sitaras/Software-Development-for-Algorithmic-Problems_Project-2 Fork of giannhskp/Software-Development-for-Algorithmic-Problems_Project-2
📈 Time Series - Nearest neighbor search and clustering using LSH and Hypercube algorithms (plus Lloyd's, for clustering only) with the metrics L2, discrete Fréchet, and continuous Fréchet.
Language: C - Size: 33.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
lehuutrung1412/ImageRetrieval
Builds a content-based image retrieval system using deep learning, applying large-scale similarity search techniques such as k-d tree, LSH, and Faiss.
Language: Python - Size: 4.58 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 3
AndreasTraut/Deep_learning_explorations
Example of the Locality-Sensitive Hashing (LSH) algorithm, relevant for big data.
Language: Jupyter Notebook - Size: 118 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0
NikolasGialitsis/LSH-and-Cube
LSH and Cube Implementation (Hashing and Querying Points on Higher Dimensions)
Language: C++ - Size: 8.26 MB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 0
LM1997610/ADM_HW4
Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome
Language: Jupyter Notebook - Size: 3.65 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0
Sitaras/Software-Development-for-Algorithmic-Problems_Project-1
Vectors - Nearest neighbor search and clustering using LSH and Hypercube algorithms (plus Lloyd's, for clustering only) with the L2 metric.
Language: C - Size: 15.8 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1
ludwigfriborg/SwiftNilsimsa
Nilsimsa implementation as a swift package
Language: Swift - Size: 18.6 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0
AaronYang2333/DSCI_553
USC Spring 2020 DSCI 553 (Foundations and Applications of Data Mining). Score: 94
Language: ReScript - Size: 265 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 34 - Forks: 21
xiaogp/recsys_faiss
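A content-based related-product recommendation implementation built on fasttext + faiss, with an API query interface served by nginx+uwsgi+flask / gunicorn+uvicorn+fastapi, plus a Spark implementation using Ansj+Word2vec+LSH+Phoenix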
Language: Python - Size: 41.3 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 40 - Forks: 16
munnafaisal/Deep-Object-Search-With-Hash
Search your object with hash
Language: Python - Size: 10.2 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 12 - Forks: 5
kochlisGit/Big-Data-Algorithms
Implementation of algorithms for big data using python, numpy, pandas.
Language: Python - Size: 28.8 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0
Vedant2311/Data-Mining-Algorithms
Repository for all assignments of the course COL761: Data Mining (Fall 2020), taught at IIT Delhi
Language: C++ - Size: 4.9 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0
C-Ritam98/SimToReal
Unnatural Language Processing
Language: Jupyter Notebook - Size: 518 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0
santurini/MinHash-LSH-From-Scratch
Implementing a simplified clone of the Shazam application from scratch using MinHashing and LSH.
Language: Python - Size: 210 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1
theatina/CryptoRecommendation
Recommendation system for cryptocurrencies using data collected from users' tweets, with 10-fold cross-validation. Based on the cryptocoins mentioned in each user's tweets, the program runs algorithms on the data and recommends other cryptocoins for each user. (README in Greek, but soon to be translated into English.)
Language: C - Size: 9.2 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 0
muyuuuu/high-performance-LSH
A high-concurrency LSH algorithm using a thread pool, implemented in C++
Language: C++ - Size: 47.9 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0
SwamiKannan/Natural-Language-Processing-Specialization
Coursera's Natural Language Processing specialization
Language: HTML - Size: 3.68 MB - Last synced at: 8 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0
mark-antal-csizmadia/finding-similar-items-textually-similar-documents
Finding Similar Items: Textually Similar Documents
Language: Jupyter Notebook - Size: 451 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0
emad-deilam-salehi/Finding-Similar-Texts-using-LSH-Algorithm
Applied the LSH algorithm (developed from scratch) for finding similar texts.
Language: Jupyter Notebook - Size: 112 KB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0
MenesesGHZ/locality-sensitive-hashing
LSH algorithm made with C++
Language: Makefile - Size: 5.39 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
pedroalbanese/lshsum
TTAK.KO-12.0276 LSH Recursive Hasher
Language: Go - Size: 23.4 KB - Last synced at: 9 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0
julialwang/docuSearch
A Python program that uses LSH (locality-sensitive hashing) to search a CSV file and retrieve filenames that contain words similar to the user's input.
Language: Python - Size: 91.8 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0
JaiJaveria/Data_Mining
Projects involving Frequent Itemset Mining and analysis of hierarchical space partitioning techniques
Language: HTML - Size: 203 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0
FilipeLopesPires/SpellChecker
SpellChecker: an application to check for spelling errors.
Language: Java - Size: 3.54 MB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1
AlessandraMonaco/Data-Mining
This repository contains simple and funny Data Mining projects in Python.
Language: Jupyter Notebook - Size: 7.96 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0
MohammadJavadArdestani/NLP-with-Classification-and-Vector-Spaces
Language: Jupyter Notebook - Size: 9.93 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1
spyros-briakos/Autoencoder-Dimensionality-Reduction
Autoencoder dimensionality reduction, EMD-Manhattan metrics comparison and classifier based clustering on MNIST dataset.
Language: C++ - Size: 16.2 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0