Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: simhash
87owo/PYAS
Python Antivirus Software
Language: Python - Size: 933 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 116 - Forks: 16
MajaJuri/Analiza-velikih-skupova-podataka
Implementation of the algorithms presented in the course Analysis of Massive Datasets (Analiza velikih skupova podataka, AVSP)
Language: Java - Size: 1.03 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0
serega/gaoya
Locality Sensitive Hashing
Language: Rust - Size: 236 KB - Last synced: 8 days ago - Pushed: 11 months ago - Stars: 49 - Forks: 4
zyocum/dedup
Find duplicate text files.
Language: Python - Size: 19.5 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 11 - Forks: 2
dynatrace-oss/hash4j
Dynatrace hash library for Java
Language: Java - Size: 40.4 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 73 - Forks: 9
Marcnuth/deduplication
Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
Language: Python - Size: 22.5 KB - Last synced: 13 days ago - Pushed: 9 months ago - Stars: 16 - Forks: 6
hybridtheory/floc-simhash
A fast python implementation of the SimHash algorithm.
Language: Python - Size: 27.3 KB - Last synced: 26 days ago - Pushed: over 2 years ago - Stars: 27 - Forks: 7
mgunn001/tmvis Fork of oduwsdl/tmvis
A Research Project Thumbnail Visualization to summarize the webpage changes over time
Language: JavaScript - Size: 5.57 MB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 1 - Forks: 2
sean-public/python-hashes
Interesting (non-cryptographic) hashes implemented in pure Python.
Language: Python - Size: 29.3 KB - Last synced: about 1 month ago - Pushed: almost 3 years ago - Stars: 238 - Forks: 43
james-bowman/nlp
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang
Language: Go - Size: 396 KB - Last synced: about 1 month ago - Pushed: about 3 years ago - Stars: 431 - Forks: 45
tomfran/crawler
A web crawler written in Rust
Language: Rust - Size: 3.64 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
bbalet/stopwords
Removes the most frequent words (stop words) from text content. Based on a curated list of language statistics.
Language: Go - Size: 89.8 KB - Last synced: about 2 months ago - Pushed: 11 months ago - Stars: 134 - Forks: 25
php-lsys/simhash
simhash extension for PHP: measures text similarity
Language: C - Size: 19.5 KB - Last synced: about 2 months ago - Pushed: about 2 years ago - Stars: 2 - Forks: 2
NETkiddy/simhash_similarity
Text similarity using simhash
Language: Go - Size: 6.84 KB - Last synced: 3 months ago - Pushed: over 5 years ago - Stars: 21 - Forks: 9
preciz/similarity
A library for cosine similarity & simhash calculation
Language: Elixir - Size: 53.7 KB - Last synced: 18 days ago - Pushed: over 1 year ago - Stars: 15 - Forks: 2
XAH30/LSH-vs-Finesse
In this repository you can find an implementation of the LSH (Locality-Sensitive Hashing) and Finesse algorithms, designed to find similar data based on their hashes
Language: C++ - Size: 5.72 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0
Xenia101/KeyStroke-Dynamics
⌨️ User Verification based on Keystroke Dynamics / Two-factor Authentication technology based on Key-Stroke
Language: Python - Size: 548 KB - Last synced: about 1 month ago - Pushed: about 2 years ago - Stars: 3 - Forks: 2
oduwsdl/off-topic-memento-toolkit
This system evaluates a collection of mementos (archived web pages) to determine which are off topic. The collection can be part of an Archive-It collection, a single TimeMap, or stored in a WARC file.
Language: Python - Size: 93.7 MB - Last synced: 5 days ago - Pushed: over 2 years ago - Stars: 8 - Forks: 4
fturati/floc-minhash-attacks
Implementation for the attacks of the paper "Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System"
Language: Python - Size: 17 MB - Last synced: 5 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
dbrcina/AVSP-FER-2020-21
Lab solutions for Analysis of Massive Datasets ("Analiza velikih skupova podataka") course at FER 2020/21
Language: Java - Size: 1.32 MB - Last synced: 5 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
sing1ee/simhash-java
A simple implementation of the simhash algorithm in Java.
Language: Java - Size: 1.52 MB - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 153 - Forks: 84
nepiskopos/duplicate-questions-detection-lsh
Knowledge extraction through Data Analysis, including Locality Sensitive Hashing (LSH).
Language: Jupyter Notebook - Size: 423 KB - Last synced: 8 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
nnnet/superminhash
SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex
Language: Python - Size: 19.5 KB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 19 - Forks: 7
rihenperry/csuci-mscs-thesis-dist-web-crawler
Documents my master's-level thesis work on building a continuous, topical web crawler based on Mercator (1999)
Language: TeX - Size: 27.4 MB - Last synced: about 2 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0
holsee/spirit_fingers
Elixir SimHash NIFs written in Rust
Language: Elixir - Size: 3.36 MB - Last synced: 17 days ago - Pushed: over 1 year ago - Stars: 18 - Forks: 1
sskender/analysis-of-massive-datasets
Analysis of Massive Datasets FER labs
Language: Python - Size: 19 MB - Last synced: 10 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
manmolecular/history-fp
🐾 Create a behavioral fingerprint based on your zsh command line history
Language: Python - Size: 6.84 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 1 - Forks: 0
fpopic/avsp
(Class) Big Data Analysis Course Assignments
Language: Java - Size: 28.2 MB - Last synced: 10 months ago - Pushed: about 7 years ago - Stars: 0 - Forks: 0
justinfargnoli/simhash
A barebones implementation of the simhash data sketching algorithm.
Language: Go - Size: 7.81 KB - Last synced: 11 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0
privacy-lsh/floc-minhash
Implementation for the attacks of the paper "Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System".
Language: Python - Size: 16.9 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
shenwei356/simhash-eval
Language: Go - Size: 2.91 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 1
ALuShu/checksystem
A web-based homework duplicate-checking system built on simHash
Language: JavaScript - Size: 4.42 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 8 - Forks: 0
mokeeqian/copydetector
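A visualization system for homework duplicate checking, plagiarism detection, and text similarity analysis, built on Spring Boot and Google's open-source simhash algorithm; integrates the jplag, MOSS, and singleCloud tool suites for multi-angle duplicate checking. Ref: https://github.com/ALuShu/checksystem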
Language: JavaScript - Size: 71.6 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 11 - Forks: 2
Qyokizzzz/simhash
The extended version of simhash supports fingerprint extraction of documents and images.
Language: Python - Size: 551 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
armchairtheorist/simhash2
A rewrite of Bookmate's simhash gem, which is an implementation of Moses Charikar's simhashes in Ruby.
Language: Ruby - Size: 27.3 KB - Last synced: 3 days ago - Pushed: over 5 years ago - Stars: 12 - Forks: 3
vkandy/simhash-js
Simhash implementation in Javascript
Language: JavaScript - Size: 49.8 KB - Last synced: about 1 year ago - Pushed: almost 7 years ago - Stars: 37 - Forks: 15
liuaiting/Financial-News-Analysis
China Merchants Bank FinTech competition, second round: financial news analysis
Language: Python - Size: 86.9 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 17 - Forks: 6
zyocum/simphon
Proof-of-concept for measuring similarity of phoneme sequences using locality sensitive hashing (LSH).
Language: Jupyter Notebook - Size: 1.23 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0
KeremZaman/semantic-sh
semantic-sh is a SimHash implementation that detects and groups similar texts by drawing on word vectors and transformer-based language models (BERT).
Language: Python - Size: 40 KB - Last synced: over 1 year ago - Pushed: over 3 years ago - Stars: 23 - Forks: 3
jinshuai86/Spider
A multithreaded web crawler framework based on Java
Language: Java - Size: 335 KB - Last synced: over 1 year ago - Pushed: almost 2 years ago - Stars: 8 - Forks: 4
Derek-Wds/Code_Plagiarism_Detection
Code plagiarism system based on Simhash and Nicad.
Language: Python - Size: 40 MB - Last synced: over 1 year ago - Pushed: over 5 years ago - Stars: 6 - Forks: 4
haoyuhu/gosimhash
A simhasher for Chinese documents implemented in Golang, simply translated from yanyiwu/gosimhash
Language: Go - Size: 3.97 MB - Last synced: over 1 year ago - Pushed: over 6 years ago - Stars: 19 - Forks: 6
innerNULL/osimhash
A deduplication lib built Over [SIMHASH](https://github.com/yanyiwu/simhash).
Language: C++ - Size: 33.2 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
lifefloating/contentcore
Crawler content processing service (for personal use)
Language: Python - Size: 87.9 KB - Last synced: over 1 year ago - Pushed: almost 4 years ago - Stars: 2 - Forks: 0
Xenia101/Illegal-Copyright-Detection-System-WEB-
Illegal Copyright Detection System WEB
Language: Python - Size: 2.74 MB - Last synced: about 1 month ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0
xblanc33/simhash-js Fork of vkandy/simhash-js
Simhash implementation in Javascript
Language: JavaScript - Size: 51.8 KB - Last synced: 13 days ago - Pushed: almost 7 years ago - Stars: 4 - Forks: 3
jiangnanboy/text-de-duplication
Text de-duplication
Size: 12.7 KB - Last synced: over 1 year ago - Pushed: almost 4 years ago - Stars: 4 - Forks: 2
long-gong/datasets-E2H
Datasets Euclidean to Hamming Conversion
Language: C++ - Size: 186 MB - Last synced: 11 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 1
LuoZijun/rust-jieba
Rust jieba
Language: Rust - Size: 1.97 MB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0
nemosharma6/event-coding
event coding using spark and stanford-core-nlp
Language: Scala - Size: 3.85 MB - Last synced: over 1 year ago - Pushed: about 5 years ago - Stars: 1 - Forks: 0
hengfeiyang/simhash
a Golang implementation of Simhash Algorithm
Language: Go - Size: 1.95 KB - Last synced: 11 months ago - Pushed: almost 6 years ago - Stars: 2 - Forks: 1
igrgurina/SimHash
College project (Analysis of massive data sets) - C# implementation of big data algorithms (2017/2018)
Language: C# - Size: 10.7 KB - Last synced: about 1 year ago - Pushed: almost 6 years ago - Stars: 1 - Forks: 0
qingniufly/scala-simhash
Simhash algorithm using Jcseg for word segmentation and jenkins-hash for hashing. Written in Scala
Language: Scala - Size: 2.01 MB - Last synced: about 1 year ago - Pushed: over 7 years ago - Stars: 1 - Forks: 1