GitHub topics: shingling
Marcnuth/deduplication
Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
Language: Python - Size: 22.5 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 6

VidyasagarMSC/shingling
Code for Shingling
Language: Python - Size: 11.7 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

xadityax/Locality-Sensitive-Hashing-DNA-Seqs
Implementing Locality Sensitive Hashing for DNA Sequences.
Language: Python - Size: 1.77 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

rigvedpatki/data-mining-assignment-1
Finding Similar Items: Textually Similar Documents
Language: TypeScript - Size: 267 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

andreicap/data-mining
Data Mining Algorithms
Language: Roff - Size: 24.7 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

kochlisGit/Big-Data-Algorithms
Implementation of algorithms for big data using Python, NumPy, and pandas.
Language: Python - Size: 28.8 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

santurini/MinHash-LSH-From-Scratch
Implementing a simplified clone of the Shazam application from scratch using MinHashing and LSH.
Language: Python - Size: 210 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

santurini/Search-Engine-Evaluation-and-Near-Duplicate-Detection
Exploiting the PyTerrier library to perform Search Engine Evaluation and Near Duplicate Detection on different datasets.
Language: Jupyter Notebook - Size: 267 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

mark-antal-csizmadia/finding-similar-items-textually-similar-documents
Finding Similar Items: Textually Similar Documents
Language: Jupyter Notebook - Size: 451 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

shr1611/Data-mining-Plagiarism-Check
A Java program that checks for plagiarism across multiple documents using Shingling, MinHashing, and Locality Sensitive Hashing.
Language: Java - Size: 38.1 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

fengxu1996/similarity_find
Compute the similarity between multiple texts
Language: C++ - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

sameeravithana/D-Hoaxy
Duplicate Detection on Hoaxy Dataset
Language: Jupyter Notebook - Size: 3.41 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1
