An open API service providing repository metadata for many open source software ecosystems.

Topic: "minhash-lsh-algorithm"

stanford-futuredata/FAST

End-to-end earthquake detection pipeline via efficient time series similarity search

Language: Jupyter Notebook - Size: 158 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 142 - Forks: 56

andrewmcloud/consimilo

A Clojure library for querying large data-sets on similarity

Language: Clojure - Size: 536 KB - Last synced at: 22 days ago - Pushed at: over 6 years ago - Stars: 65 - Forks: 4

dynatrace-research/set-sketch-paper

SetSketch: Filling the Gap between MinHash and HyperLogLog

Language: C++ - Size: 23.7 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 46 - Forks: 5

gurushida/mnemophonix

A simple audio fingerprinting system

Language: C - Size: 316 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 25 - Forks: 4

Cheng-Lin-Li/Spark

Python 2.7 code and learning notes for Spark 2.1.1

Language: Python - Size: 2.62 MB - Last synced at: 5 months ago - Pushed at: about 7 years ago - Stars: 24 - Forks: 6

emarkou/Text-Similarity

Text similarity computation using MinHashing and Jaccard distance on the Reuters dataset

Language: R - Size: 69.3 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 16 - Forks: 5

wherefortravel/minhash-node-rs

MinHash and LSH index written in Rust for Node.js

Language: Rust - Size: 207 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 1

tmpsrcrepo/benchmark_minhash_lsh

Insight Data Engineering fellow project

Language: Python - Size: 1.38 MB - Last synced at: over 2 years ago - Pushed at: almost 9 years ago - Stars: 10 - Forks: 3

steven-s/minhash-document-clusters

Minhash clustering of text documents

Language: Scala - Size: 33.2 KB - Last synced at: over 2 years ago - Pushed at: almost 8 years ago - Stars: 4 - Forks: 1

micts/jss

Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality Sensitive Hashing

Language: Python - Size: 23.4 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 0

kazemnejad/text_similarity_search

An easy-to-use script for fast similarity search in textual data (and embedding space) with GPU and multi-core support.

Language: Python - Size: 69.3 KB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 1

adriacabeza/Document-similarity-detection-using-hashing

Document similarity detection using hashing

Language: TeX - Size: 16 MB - Last synced at: 7 months ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 1

rkapsalis/Range-and-similarity-queries

Implementation of a B+ Tree for range and exact match queries and of the LSH algorithm for finding similar documents as measured by Jaccard Similarity.

Language: Python - Size: 111 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

vbarzokas/apache-spark-link-prediction

A set of methods and model evaluation metrics for predicting links in an academic citation network using Apache Spark and Scala

Language: Scala - Size: 11.4 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

soulintzis/Multidimensional-Data-Structures

Language: Python - Size: 34.6 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 1

shubhamwaghe/Scalable-Data-Mining

Scalable Data Mining - Assignment submissions

Language: Python - Size: 3.38 MB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 2 - Forks: 0

amulya-jayanti/Amazon-Reviews-Analysis

Amazon Reviews Analysis using Big Data Techniques

Language: Jupyter Notebook - Size: 838 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

Sid-CodeX/VeriWrite

VeriWrite — A web-based intelligent plagiarism detection system for handwritten academic submissions. Uses OCR and Jaccard similarity to analyze and compare student documents efficiently.

Language: TypeScript - Size: 33.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

SpydazWebAI-NLP/SpydazWebAI_NLP_Models

Word/Image/Audio Embedding models, Tokenizer models, Ngram language models, MatrixModels, Corpus building, Vocabulary Building, Language modelling

Language: Visual Basic .NET - Size: 2.93 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

amitkp57/dbms-correlated-columns-detection

Detecting correlated columns in a DBMS using techniques such as Pearson correlation, LSH MinHashing, and random sampling.

Language: Jupyter Notebook - Size: 594 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

rihenperry/csuci-mscs-thesis-dist-web-crawler

Documents my master's thesis work on building a continuous, topical web crawler based on Mercator (1999)

Language: TeX - Size: 27.4 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

christinebuckler/provider-prescriber

Language: Jupyter Notebook - Size: 30.5 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

92amartins/minhash-example

MinHash Example

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

AdrianaMacc/Covid-19-BigData-Project

SARS-CoV-2 genome analysis using Big Data algorithms to find clusters of similar mutations, belonging to different clades, that mutate together and generate the corresponding clade.

Language: Jupyter Notebook - Size: 513 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

MaviVestini/ADM-LT_HW1

First homework for the Advanced Data Mining course

Language: HTML - Size: 5.91 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

aloobun/minhash_exp

Deduplication: MinHash with LSH

Language: Python - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

LM1997610/Data-Mining

Homework for Advanced Data Mining and Language Technology (DMT) at La Sapienza University of Rome

Language: Jupyter Notebook - Size: 24.7 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

LM1997610/ADM_HW4

Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome

Language: Jupyter Notebook - Size: 3.65 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

MaviVestini/ADM_HW4

4th homework for ADM

Language: Jupyter Notebook - Size: 1.27 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

santurini/MinHash-LSH-From-Scratch

Implementing a simplified copy of the Shazam application from scratch using MinHashing and LSH.

Language: Python - Size: 210 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

FilipeLopesPires/SpellChecker

SpellChecker: an application to check for spelling errors.

Language: Java - Size: 3.54 MB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

xadityax/Locality-Sensitive-Hashing-DNA-Seqs

Implementing Locality Sensitive Hashing for DNA Sequences.

Language: Python - Size: 1.77 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

vascoalramos/mpei 📦

Probability Methods for Informatics Engineering | UA 2018/2019

Language: Java - Size: 39.8 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

coderthetyler/mhash-c

An implementation of the MinHashing algorithm in C using POSIX threads.

Language: C - Size: 3.86 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

cwuu/DataMining-LearningFromLargeDataSet-Task1

ETH Zurich Fall 2017

Language: Python - Size: 496 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

AiDinho/LocallySensitiveHashing

Language: Python - Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

mariofv/DocSim

MinHash text analyzer developed for an Algorithmics course.

Language: C++ - Size: 43.1 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 1

Related Topics
minhash (13), jaccard-similarity (11), locality-sensitive-hashing (9), lsh (7), lsh-algorithm (4), python (4), algorithm (3), similarity-search (3), cosine-similarity (3), hashing (3)
apriori-algorithm (2), similarity (2), indexing (2), jaccard-similarity-estimation (2), spark (2), shingling (2), apache-spark (2), pyspark (2), bloom-filter (2), tf-idf (2), scala (2)
sketch-algorithm (1), sketch-data-structures (1), text-analysis (1), dna-sequences (1), search-engine (1), bloom-filters (1), serpapi (1), bplus-tree (1), data-structures (1), near-duplicate-detection (1), membership-queries (1), multidimensional (1), random-variables (1), amazon-web-services (1), big-data (1), feature-engineering (1), parser (1), python3 (1), regex (1), unsupervised-learning (1), cardinality-estimation (1), estimation (1), hyperloglog (1), hyperloglog-sketches (1), inclusion-exclusion (1), intersection (1), jaccard (1), minhash-similarity (1), minhash-sketches (1), minwise-hashing (1), minwise-hashing-algorithm (1), sketch (1), hadoop-mapreduce (1), als (1), alternating-least-squares (1), exploratory-data-analysis (1), apriori-son (1), kmeans (1), kmeans-clustering (1), map-reduce (1), data-visualization (1), python27 (1), savasere-omiecinski-and-navathe (1), uv-decomposition (1), clojure (1), collaborative-filtering (1), cosine-distance (1), data-sketches (1), data-sketching (1), document-similarity (1), hamming-distance (1), lsh-forest (1), plagiarism-detection (1), recommender-system (1), pearson-correlation (1), random-sampling (1), microsoft-ocr (1), c-library (1), posix-threads (1), batch (1), spark-streaming (1), text-processing (1), earthquakes (1), time-series (1), mern-stack (1), apriori-algorithm-python (1), big-data-analytics (1), bigdata (1), covid-19 (1), genome-analysis (1), copies (1), java (1), murmur (1), spell-checker (1), word-suggestion (1), duplicate-detection (1), event-driven (1), mercator (1), queueing (1)