Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: simhash

87owo/PYAS

Python Antivirus Software

Language: Python - Size: 933 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 116 - Forks: 16

MajaJuri/Analiza-velikih-skupova-podataka

Implementation of algorithms presented in the course Analysis of Massive Datasets (Analiza velikih skupova podataka, AVSP)

Language: Java - Size: 1.03 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0

serega/gaoya

Locality Sensitive Hashing

Language: Rust - Size: 236 KB - Last synced: 8 days ago - Pushed: 11 months ago - Stars: 49 - Forks: 4

zyocum/dedup

Find duplicate text files.

Language: Python - Size: 19.5 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 11 - Forks: 2

dynatrace-oss/hash4j

Dynatrace hash library for Java

Language: Java - Size: 40.4 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 73 - Forks: 9

Marcnuth/deduplication

Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.

Language: Python - Size: 22.5 KB - Last synced: 13 days ago - Pushed: 9 months ago - Stars: 16 - Forks: 6

hybridtheory/floc-simhash

A fast Python implementation of the SimHash algorithm.

Language: Python - Size: 27.3 KB - Last synced: 26 days ago - Pushed: over 2 years ago - Stars: 27 - Forks: 7

mgunn001/tmvis Fork of oduwsdl/tmvis

A research project on thumbnail visualization that summarizes webpage changes over time

Language: JavaScript - Size: 5.57 MB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 1 - Forks: 2

sean-public/python-hashes

Interesting (non-cryptographic) hashes implemented in pure Python.

Language: Python - Size: 29.3 KB - Last synced: about 1 month ago - Pushed: almost 3 years ago - Stars: 238 - Forks: 43

james-bowman/nlp

Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang

Language: Go - Size: 396 KB - Last synced: about 1 month ago - Pushed: about 3 years ago - Stars: 431 - Forks: 45

tomfran/crawler

A web crawler written in Rust

Language: Rust - Size: 3.64 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

bbalet/stopwords

Removes the most frequent words (stop words) from text content. Based on a curated list of language statistics.

Language: Go - Size: 89.8 KB - Last synced: about 2 months ago - Pushed: 11 months ago - Stars: 134 - Forks: 25

php-lsys/simhash

simhash extension for PHP: determines text similarity

Language: C - Size: 19.5 KB - Last synced: about 2 months ago - Pushed: about 2 years ago - Stars: 2 - Forks: 2

NETkiddy/simhash_similarity

Text similarity using simhash

Language: Go - Size: 6.84 KB - Last synced: 3 months ago - Pushed: over 5 years ago - Stars: 21 - Forks: 9

preciz/similarity

A library for cosine similarity & simhash calculation

Language: Elixir - Size: 53.7 KB - Last synced: 18 days ago - Pushed: over 1 year ago - Stars: 15 - Forks: 2

XAH30/LSH-vs-Finesse

This repository contains implementations of the LSH (Locality-Sensitive Hashing) and Finesse algorithms, designed to find similar data based on their hashes

Language: C++ - Size: 5.72 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

Xenia101/KeyStroke-Dynamics

⌨️ User Verification based on Keystroke Dynamics / Two-factor Authentication technology based on Key-Stroke

Language: Python - Size: 548 KB - Last synced: about 1 month ago - Pushed: about 2 years ago - Stars: 3 - Forks: 2

oduwsdl/off-topic-memento-toolkit

This system evaluates a collection of mementos (archived web pages) to determine which are off topic. The collection can be part of an Archive-It collection, a single TimeMap, or stored in a WARC file.

Language: Python - Size: 93.7 MB - Last synced: 5 days ago - Pushed: over 2 years ago - Stars: 8 - Forks: 4

fturati/floc-minhash-attacks

Implementation for the attacks of the paper "Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System"

Language: Python - Size: 17 MB - Last synced: 5 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

dbrcina/AVSP-FER-2020-21

Lab solutions for Analysis of Massive Datasets ("Analiza velikih skupova podataka") course at FER 2020/21

Language: Java - Size: 1.32 MB - Last synced: 5 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

sing1ee/simhash-java

A simple implementation of the simhash algorithm in Java.

Language: Java - Size: 1.52 MB - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 153 - Forks: 84

nepiskopos/duplicate-questions-detection-lsh

Knowledge extraction through Data Analysis, including Locality Sensitive Hashing (LSH).

Language: Jupyter Notebook - Size: 423 KB - Last synced: 8 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

nnnet/superminhash

SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex

Language: Python - Size: 19.5 KB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 19 - Forks: 7

rihenperry/csuci-mscs-thesis-dist-web-crawler

Documents my master's-level thesis work on building a continuous, topical web crawler based on Mercator (1999)

Language: TeX - Size: 27.4 MB - Last synced: about 2 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

holsee/spirit_fingers

Elixir SimHash NIFs written in Rust

Language: Elixir - Size: 3.36 MB - Last synced: 17 days ago - Pushed: over 1 year ago - Stars: 18 - Forks: 1

sskender/analysis-of-massive-datasets

Analysis of Massive Datasets FER labs

Language: Python - Size: 19 MB - Last synced: 10 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

manmolecular/history-fp

🐾 Create a behavioral fingerprint based on your zsh command line history

Language: Python - Size: 6.84 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 1 - Forks: 0

fpopic/avsp

(Class) Big Data Analysis Course Assignments

Language: Java - Size: 28.2 MB - Last synced: 10 months ago - Pushed: about 7 years ago - Stars: 0 - Forks: 0

justinfargnoli/simhash

A barebones implementation of the simhash data sketching algorithm.

Language: Go - Size: 7.81 KB - Last synced: 11 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0

privacy-lsh/floc-minhash

Implementation for the attacks of the paper "Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System".

Language: Python - Size: 16.9 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

shenwei356/simhash-eval

Language: Go - Size: 2.91 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 1

ALuShu/checksystem

A web-based homework duplicate-checking system built on simHash

Language: JavaScript - Size: 4.42 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 8 - Forks: 0

mokeeqian/copydetector

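A visual system for homework duplicate checking, plagiarism detection, and text similarity analysis, built on Spring Boot and Google's open-source simhash algorithm; integrates the jplag, MOSS, and singleCloud tool suites for multi-angle duplicate checking. Ref: https://github.com/ALuShu/checksystem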

Language: JavaScript - Size: 71.6 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 11 - Forks: 2

Qyokizzzz/simhash

The extended version of simhash supports fingerprint extraction of documents and images.

Language: Python - Size: 551 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

armchairtheorist/simhash2

A rewrite of Bookmate's simhash gem, which is an implementation of Moses Charikar's simhashes in Ruby.

Language: Ruby - Size: 27.3 KB - Last synced: 3 days ago - Pushed: over 5 years ago - Stars: 12 - Forks: 3

vkandy/simhash-js

Simhash implementation in JavaScript

Language: JavaScript - Size: 49.8 KB - Last synced: about 1 year ago - Pushed: almost 7 years ago - Stars: 37 - Forks: 15

liuaiting/Financial-News-Analysis

China Merchants Bank FinTech - Second Round - Financial News Analysis

Language: Python - Size: 86.9 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 17 - Forks: 6

zyocum/simphon

Proof-of-concept for measuring similarity of phoneme sequences using locality sensitive hashing (LSH).

Language: Jupyter Notebook - Size: 1.23 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

KeremZaman/semantic-sh

semantic-sh is a SimHash implementation that detects and groups similar texts by leveraging word vectors and transformer-based language models (BERT).

Language: Python - Size: 40 KB - Last synced: over 1 year ago - Pushed: over 3 years ago - Stars: 23 - Forks: 3

jinshuai86/Spider

A multithreaded web crawler framework based on Java

Language: Java - Size: 335 KB - Last synced: over 1 year ago - Pushed: almost 2 years ago - Stars: 8 - Forks: 4

Derek-Wds/Code_Plagiarism_Detection

Code plagiarism detection system based on Simhash and Nicad.

Language: Python - Size: 40 MB - Last synced: over 1 year ago - Pushed: over 5 years ago - Stars: 6 - Forks: 4

haoyuhu/gosimhash

A simhasher for Chinese documents implemented in Golang, translated directly from yanyiwu/gosimhash

Language: Go - Size: 3.97 MB - Last synced: over 1 year ago - Pushed: over 6 years ago - Stars: 19 - Forks: 6

innerNULL/osimhash

A deduplication library built over SIMHASH (https://github.com/yanyiwu/simhash).

Language: C++ - Size: 33.2 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

lifefloating/contentcore

Crawler content processing service (for personal use)

Language: Python - Size: 87.9 KB - Last synced: over 1 year ago - Pushed: almost 4 years ago - Stars: 2 - Forks: 0

Xenia101/Illegal-Copyright-Detection-System-WEB-

Illegal Copyright Detection System WEB

Language: Python - Size: 2.74 MB - Last synced: about 1 month ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0

xblanc33/simhash-js Fork of vkandy/simhash-js

Simhash implementation in JavaScript

Language: JavaScript - Size: 51.8 KB - Last synced: 13 days ago - Pushed: almost 7 years ago - Stars: 4 - Forks: 3

jiangnanboy/text-de-duplication

Text de-duplication

Size: 12.7 KB - Last synced: over 1 year ago - Pushed: almost 4 years ago - Stars: 4 - Forks: 2

long-gong/datasets-E2H

Datasets Euclidean to Hamming Conversion

Language: C++ - Size: 186 MB - Last synced: 11 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 1

LuoZijun/rust-jieba

Rust jieba

Language: Rust - Size: 1.97 MB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0

nemosharma6/event-coding

event coding using spark and stanford-core-nlp

Language: Scala - Size: 3.85 MB - Last synced: over 1 year ago - Pushed: about 5 years ago - Stars: 1 - Forks: 0

hengfeiyang/simhash

A Golang implementation of the Simhash algorithm

Language: Go - Size: 1.95 KB - Last synced: 11 months ago - Pushed: almost 6 years ago - Stars: 2 - Forks: 1

igrgurina/SimHash

College project (Analysis of massive data sets) - C# implementation of big data algorithms (2017/2018)

Language: C# - Size: 10.7 KB - Last synced: about 1 year ago - Pushed: almost 6 years ago - Stars: 1 - Forks: 0

qingniufly/scala-simhash

Simhash algorithm using Jcseg for word segmentation and jenkins-hash for hashing. Written in Scala.

Language: Scala - Size: 2.01 MB - Last synced: about 1 year ago - Pushed: over 7 years ago - Stars: 1 - Forks: 1