An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: jaccard-similarity

ekzhu/datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Language: Python - Size: 5.68 MB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 2,699 - Forks: 299

ashvardanian/jaccard-index

Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables

Language: Jupyter Notebook - Size: 76.2 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 18 - Forks: 1
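On binary vectors the Jaccard index reduces to popcount(a AND b) / popcount(a OR b), which is why population counts dominate the cost. The sketch below is an illustrative assumption, not code from this repository: it packs bits into Python ints and counts them with bin(), without the Harley-Seal CSA or lookup-table optimizations the project studies. The helper names are hypothetical.

```python
def popcount(x: int) -> int:
    """Count set bits; bin() works on every Python 3 version."""
    return bin(x).count("1")


def jaccard_bits(a: int, b: int) -> float:
    """Jaccard index of two bit sets packed into non-negative ints."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty sets count as identical
    return popcount(a & b) / union


print(jaccard_bits(0b10110010, 0b10011010))  # 3 shared bits / 5 set bits = 0.6
```

A production kernel would process machine words or SIMD registers instead of arbitrary-precision ints, but the arithmetic is the same.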

PiotrTymoszuk/FGFR-BLCA

Genetic alterations and expression of genes coding for FGF ligands and FGF receptors in urothelial cancer

Language: R - Size: 163 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

adrg/strutil

Go metrics for calculating string similarity and other string utility functions

Language: Go - Size: 111 KB - Last synced at: 3 days ago - Pushed at: 19 days ago - Stars: 382 - Forks: 25

izikeros/sentence-plagiarism

Compare sentences from input document with all sentences from reference documents - find very similar ones.

Language: Python - Size: 244 KB - Last synced at: 5 days ago - Pushed at: 15 days ago - Stars: 3 - Forks: 0

matiskay/html-similarity

Compare html similarity using structural and style metrics

Language: Python - Size: 64.5 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 211 - Forks: 23

Jingnan-Jia/segmentation_metrics

A package to compute medical segmentation metrics.

Language: Python - Size: 171 KB - Last synced at: 8 days ago - Pushed at: 10 months ago - Stars: 159 - Forks: 12

dennismgoetz/DataMining

"Data Mining" course at the University of Trento

Language: Jupyter Notebook - Size: 68.6 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

chrismattmann/tika-similarity

Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.

Language: Python - Size: 3.22 MB - Last synced at: 21 days ago - Pushed at: about 1 month ago - Stars: 107 - Forks: 60

RobCyberLab/Ngram-Similarity-Engine

🤖Ngram Similarity Engine📚

Language: Python - Size: 3.62 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

Dakshmulundkar/SocialVoyage

SocialVoyage is a travel matchmaking app that connects users based on shared destinations, group size, gender, and age. It features secure authentication, profile management, friend requests, and real-time matchmaking using Jaccard similarity. Built with Flask, MongoDB, and a modern UI, it makes travel social and fun! 🚀

Language: HTML - Size: 1.91 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

miltiadiss/CEID_NE4338-Multidimensional-Data-Structures

This project implements multi-dimensional indices (k-d trees, quad trees, range trees, R-trees) for querying computer scientists' data by surname, awards, and publications, with education similarity measured using LSH, comparing the methods experimentally.

Language: Python - Size: 3.29 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

adityapathakk/match-resume-with-jobDescription Fork of adityapathak-cubastion/match-resume-with-jobDescription

This project aims to make the process of matching resumes with a particular job description much faster. Simply enter the required job description and all the resumes that need to be filtered, then run the script to find the top scorer as well as the 'n' best-matching resumes! Built using Python, Hugging Face, and scikit-learn.

Language: Python - Size: 219 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

adityapathak-cubastion/match-resume-with-jobDescription

This project aims to make the process of matching resumes with a particular job description much faster. Simply enter the required job description and all the resumes that need to be filtered, then run the script to find the top scorer as well as the 'n' best-matching resumes! Built using Python, Hugging Face, and scikit-learn.

Language: Python - Size: 265 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

xSenzaki/Automated-Essay-Checker

A project requirement for the subject 'CS303 - Automata Theory'

Language: Python - Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ashithapallath/KNN-Distance-Measures

This project compares k-NN performance using different distance metrics. Euclidean, Manhattan, and Minkowski achieved 100% accuracy, making them ideal for numerical data. Cosine Similarity performed well (93.33%), while Hamming and Jaccard were ineffective (33.33%).

Language: Jupyter Notebook - Size: 89.8 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

berksudan/Where-is-the-Answer

A Turkish NLP tool built as a computer project. Used: Python 3, Word2Vec, Natural Language Processing Techniques, Linux Bash Script.

Language: Python - Size: 183 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

mrkkrp/text-metrics

Calculate various string metrics efficiently in Haskell

Language: Haskell - Size: 122 KB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 44 - Forks: 4

iMD10/CS315-Texts-Similarity

This repository showcases a project developed for the CS315 Algorithms Design and Analysis course, focusing on finding the similarity between two texts using Jaccard Similarity.

Language: Python - Size: 3.7 MB - Last synced at: 30 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

italo-batista/lsh-semantic-similarity

Locality Sensitive Hashing for semantic similarity (Python 3.x)

Language: Python - Size: 9.77 KB - Last synced at: 17 days ago - Pushed at: almost 7 years ago - Stars: 15 - Forks: 2

Abdelrahman-Amen/Word-Embedding

This code showcases text preprocessing (tokenization, stopword removal, and standardization), training a Word2Vec model to generate word embeddings, and analyzing word relationships using metrics like cosine similarity and the Jaccard index. It also visualizes high-dimensional embeddings in 2D using MDS, illustrating how similar words cluster together.

Language: Jupyter Notebook - Size: 793 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

MrPowers/spark-stringmetric

Spark functions to run popular phonetic and string matching algorithms

Language: Scala - Size: 457 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 60 - Forks: 6

lgautier/mashing-pumpkins

Minhash and maxhash library in Python, combining flexibility, expressivity, and performance.

Language: C - Size: 1.4 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 21 - Forks: 3

sumn2u/string-comparisons

A collection of string comparisons algorithms

Language: JavaScript - Size: 700 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 5

atkamara/Taxability

Descriptive, predictive analysis of taxability

Language: Jupyter Notebook - Size: 48.2 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

SonakshiA/Similarity-Score-Techniques

The repository shows 6 techniques to measure similarity to determine how similar two pieces of text are. Similarity Measure plays an important role in document/information retrieval, machine translation, question-answering, and document matching.

Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ikajdan/article_similarity_analysis

Analysis of the similarity between articles based on their content using TF-IDF and LDA

Language: Python - Size: 8.11 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 0 - Forks: 1

wajahati/ZAROORAT-ReactNative-Firebase-App

ZAROORAT is a React Native app where users can buy and sell stuff.

Language: JavaScript - Size: 1.8 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 2

IgorSAlencar/SimilaridadeJaccardCosseno

Code developed for the undergraduate thesis (Trabalho de Conclusão de Curso, TCC) of the Mathematics teaching degree (Licenciatura em Matemática) at IFSP - Campus Itaquaquecetuba, as part of the requirements for obtaining the degree. The project applies cosine and Jaccard similarity techniques to analyze customer feedback.

Language: Jupyter Notebook - Size: 213 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

vickumar1981/stringdistance

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.

Language: Scala - Size: 1.27 MB - Last synced at: 10 days ago - Pushed at: about 3 years ago - Stars: 78 - Forks: 14

Animesh-Chourey/Loan-Classifier

Trained machine learning algorithms (Logistic Regression, KNN, SVM, Decision Tree) on a loan dataset after visualization and preprocessing, then assessed their performance with evaluation metrics such as F1-score, log loss, and Jaccard similarity score.

Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

SasheVuchkov/near-duplicate-docs

Simple library for finding duplicate and near-duplicate text documents in massive sets/libraries/databases

Language: TypeScript - Size: 2 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 0

oertl/treeminhash

TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation

Language: C++ - Size: 2.62 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 3
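TreeMinHash sketches are designed to estimate weighted Jaccard similarity. A common definition, assumed here, for non-negative weight vectors x and y is J_w = Σ min(x_i, y_i) / Σ max(x_i, y_i), which reduces to the ordinary Jaccard index when all weights are 0 or 1. The minimal sketch below computes that exact value directly rather than estimating it with TreeMinHash; weighted_jaccard is a hypothetical helper name.

```python
from typing import Mapping


def weighted_jaccard(x: Mapping[str, float], y: Mapping[str, float]) -> float:
    """Exact weighted Jaccard: sum of minima over sum of maxima (weights >= 0)."""
    keys = set(x) | set(y)  # a key missing from one side is treated as weight 0
    num = sum(min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    den = sum(max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 1.0  # assumed convention for two all-zero vectors


x = {"a": 2.0, "b": 1.0}
y = {"a": 1.0, "b": 1.0, "c": 2.0}
print(weighted_jaccard(x, y))  # (1 + 1 + 0) / (2 + 1 + 2) = 0.4
```

Computing this exactly costs time proportional to the number of distinct keys per comparison; sketching methods exist to avoid that cost across many pairwise comparisons.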

oertl/probminhash

ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity

Language: C++ - Size: 6.26 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 42 - Forks: 6

FaridYusifli/AMDM_hw4

Homework 4 for the Algorithmic Methods for Data Mining course, dealing with networks and graphs of about 1,000,000 nodes.

Language: Jupyter Notebook - Size: 1.54 MB - Last synced at: 10 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

dartseoengineer/keyword-clustering

This repository provides a Python script to cluster keywords based on the similarity of their associated URLs, calculated using the Jaccard similarity coefficient.

Language: Python - Size: 14.6 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

kumaranjalij/Flora-Genie

Flora Genie is a personalized plant recommendation system designed to help amateur gardeners select the most suitable plants for their homes or gardens.

Language: Jupyter Notebook - Size: 227 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

andrewmcloud/consimilo

A Clojure library for querying large data-sets on similarity

Language: Clojure - Size: 536 KB - Last synced at: 9 days ago - Pushed at: over 6 years ago - Stars: 63 - Forks: 4

ppw0/minhash

find similar text files quickly

Language: Python - Size: 53.7 KB - Last synced at: 6 months ago - Pushed at: about 4 years ago - Stars: 6 - Forks: 1

adriacabeza/Document-similarity-detection-using-hashing

Document similarity detection using hashing

Language: TeX - Size: 16 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 1

vokter/vokter-scheduler

(WIP)

Size: 0 Bytes - Last synced at: about 1 year ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

vokter/vokter-client-java

Sample Jetty/Jersey2 server that interoperates with a running Vokter server (https://github.com/vokter/vokter).

Language: Java - Size: 7.81 KB - Last synced at: about 1 year ago - Pushed at: almost 9 years ago - Stars: 0 - Forks: 0

vokter/vokter-server

(WIP) HTTP server that distributes Vokter (https://github.com/vokter/vokter) through a REST API.

Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

kavya76/Search-Engine

A simple search engine for Environmental News NLP archive

Language: Jupyter Notebook - Size: 1.43 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 1

john-fotis/Movie-Recommender

A movie recommender written in Go that suggests movies considering various factors within a particular dataset, encompassing users, movies, and movie ratings.

Language: Go - Size: 1.45 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Lefteris-Souflas/Movie-Rating-User-Similarity

Explored Jaccard distance, Min-Hashing, and LSH for user similarity in a movie rating dataset. Tasks involve dataset preprocessing, exact Jaccard similarity computation, Min-Hash signatures, and LSH implementation. Results and observations are documented in code, output files, and a report.

Language: Jupyter Notebook - Size: 1.22 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Lefteris-Souflas/Entity-Resolution

Addressed Entity Resolution challenges. Tasks include schema-agnostic blocking, pairwise comparisons, Meta-Blocking graph construction, and Jaccard similarity computation. Deliverables include source code, reports, and reproducibility guidelines in Python.

Language: Jupyter Notebook - Size: 4.54 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

sdevalapurkar/similar-questions

👯 Algorithms using Jaccard similarity to identify questions from a list that are similar to one another

Language: Python - Size: 13.6 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

dynatrace-research/set-sketch-paper

SetSketch: Filling the Gap between MinHash and HyperLogLog

Language: C++ - Size: 23.7 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 46 - Forks: 5

vkbandari/job_recommendation_engine

recommendation of jobs by various machine learning models

Language: Jupyter Notebook - Size: 8.42 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

emarkou/Text-Similarity

A text similarity computation using minhashing and Jaccard distance on the Reuters dataset

Language: R - Size: 69.3 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 16 - Forks: 5

mtshikomba/jaccard_text_summarizer

Using the Jaccard ranking algorithm to summarize a document

Language: Jupyter Notebook - Size: 25.4 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

AdrianaMacc/Covid-19-BigData-Project

SARS-CoV-2 genome analysis using Big Data algorithms to find clusters of similar mutations that belong to different clades, mutate together, and generate the corresponding clade.

Language: Jupyter Notebook - Size: 513 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ngiambla/syn_sugar

Extracting topics using rules.

Language: PureBasic - Size: 888 MB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

imenbkr/Fraud-Detection-Project

An application for fraud detection in medicine packages and tablets.

Language: Python - Size: 24 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

youssefelmougy/jaccard-selector

Asynchronous Distributed Actor-based Approach to Jaccard Similarity for Genome Comparisons

Language: Fortran - Size: 112 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

christinebuckler/provider-prescriber

Language: Jupyter Notebook - Size: 30.5 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

oertl/bagminhash

BagMinHash - Minwise Hashing Algorithm for Weighted Sets

Language: C++ - Size: 1.02 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 26 - Forks: 6

MovieTone/JaccardDocumentComparison

Document comparison web application based on the Jaccard similarity index. The uploaded file is compared to all previously uploaded ones. Built with Java/JSP.

Language: CSS - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

leocvml/DeepTool

Language: Python - Size: 156 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

cankobanz/multithreaded-scientific-search-engine

This is a school project from Operating Systems course where threads, mutexes, semaphores, task pools and critical sections are used effectively to ensure synchronization among threads.

Language: C++ - Size: 42 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

harryyizihan/predict_champions

League of Legends Champion Recommender System

Language: Jupyter Notebook - Size: 27.1 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

ManishaLagisetty/Travel-Recommendation-System

Machine Learning, Python

Language: Jupyter Notebook - Size: 10.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

khaosdoctor/sound-recommender

Simple API to recommend songs

Language: TypeScript - Size: 59.6 KB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

EslamElbassel/Indexing-and-Documents-Similarity

Measures the similarity between documents by calculating the Jaccard similarity between them and provides a similarity score based on how similar their sentences are to each other.

Language: Java - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ratthapon/simple-shape-classification

A simple shape recognition using Jaccard similarity, implemented on MATLAB.

Language: Matlab - Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: about 9 years ago - Stars: 1 - Forks: 0

BeardedMorganKeller/MovieRecommendationEngine

IMDB Movie Recommendation Engine. Uses Jaccard similarity of genres, and title similarity.

Language: Python - Size: 687 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

iamtusharbhatia/Machine-Learning

This repository contains various assignments that I have done as a part of the Machine Learning course.

Language: R - Size: 3.62 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

dominic-sagers/MovieLens-20M-Recommender-System

Using the MovieLens 20 Million review dataset, this project aims to explore different ways to design, evaluate, and explain recommender systems algorithms. Different item-based and user-based recommender systems are showcased as well as a hybrid algorithm using a modified page-rank algorithm.

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jayvatti/spellChecker

Spell Checker using a Hash Table

Language: C++ - Size: 109 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

EdDuarte/similarity-search-java

Easy-to-use Java similarity algorithms for text and numeric-series

Language: Java - Size: 149 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 10

usc-isi-i2/ppjoin

PPJoin and P4Join Python 3 implementation

Language: Python - Size: 172 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 0

iamr2k/JaccardSimilarity

Flask app to find similar movies using Jaccard similarity

Language: CSS - Size: 13.2 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ada-k/TweetsClassification

Exploring the performance of Jaccard and cosine similarities, then visualizing their output using k-means and k-means with PCA. Additional work on time series analysis, web scraping, and Twitter scraping.

Language: Jupyter Notebook - Size: 525 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 14 - Forks: 9

ddellagiacoma/datamining-2016-project

Four different ways to predict reviews' rating through text analysis

Language: Java - Size: 5.65 MB - Last synced at: over 1 year ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

nepiskopos/duplicate-questions-detection-lsh

Knowledge extraction through Data Analysis, including Locality Sensitive Hashing (LSH).

Language: Jupyter Notebook - Size: 423 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

samuel-bohman/jaccard-index

Function for calculating the Jaccard index and Jaccard distance for binary attributes

Language: R - Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0
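For binary attributes, the Jaccard index ignores positions where both vectors are 0: with M11 counting attributes equal to 1 in both vectors, and M10 and M01 counting attributes equal to 1 in only one of them, J = M11 / (M11 + M10 + M01), and the Jaccard distance is 1 - J. The repository is written in R; the Python sketch below is an illustrative assumption rather than its implementation, and jaccard_binary is a hypothetical name.

```python
from typing import Sequence, Tuple


def jaccard_binary(u: Sequence[int], v: Sequence[int]) -> Tuple[float, float]:
    """Return (Jaccard index, Jaccard distance) for two 0/1 attribute vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    m11 = sum(1 for a, b in zip(u, v) if a and b)      # 1 in both
    m10 = sum(1 for a, b in zip(u, v) if a and not b)  # 1 only in u
    m01 = sum(1 for a, b in zip(u, v) if not a and b)  # 1 only in v
    denom = m11 + m10 + m01  # 0-0 matches (M00) are deliberately ignored
    index = m11 / denom if denom else 1.0  # assumed convention for all-zero vectors
    return index, 1.0 - index


print(jaccard_binary([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))  # (0.5, 0.5)
```

Ignoring M00 is what distinguishes the Jaccard index from the simple matching coefficient, which would count the shared 0 at position 2 as agreement.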

Raghuls-github/Best-Classifier

Set of code and algorithms to fit various regression models and then compute the Jaccard score, F1 score, and log loss.

Language: Jupyter Notebook - Size: 120 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

hellojudger/AntirattanLite

A simple program that measures the similarity between a solution and the code, function by function.

Language: Python - Size: 31.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

NikosMav/DataAnalysis-Netflix

A notebook for movie and TV show recommendations using Boolean and TF-IDF methods. Get personalized suggestions based on text descriptions and choose the method that suits your preferences.

Language: Jupyter Notebook - Size: 582 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

Sitaras/Data-Mining

Project 1: 🎬🍿 Movie-Recommendation-System, Project 2: 📰🔍Fake News Detection System

Language: Jupyter Notebook - Size: 9.3 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 0

MagallanesFito/weheart

Meet people just like you

Language: Python - Size: 34.1 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

micts/jss

Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality Sensitive Hashing

Language: Python - Size: 23.4 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 0
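MinHashing with LSH banding can be sketched in a few lines. This is not the repository's code: it assumes hash functions of the form (a·x + b) mod p over CRC32 token hashes to approximate random permutations (Python's built-in hash() is salted per process for strings, so it would not give reproducible signatures), then splits each signature into bands whose rows must all match for two sets to become candidates. All function names are hypothetical.

```python
import random
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit CRC value


def make_hashes(num_perm: int, seed: int = 1) -> List[Tuple[int, int]]:
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]


def minhash(tokens: Set[str], hashes: List[Tuple[int, int]]) -> List[int]:
    """Signature: for each hash function, the minimum hash value over the set."""
    if not tokens:
        raise ValueError("MinHash is undefined for an empty set")
    base = [zlib.crc32(t.encode("utf-8")) for t in tokens]  # stable 32-bit base hash
    return [min((a * x + b) % PRIME for x in base) for a, b in hashes]


def estimate_jaccard(sig1: List[int], sig2: List[int]) -> float:
    """Fraction of signature positions where the two signatures agree."""
    return sum(s == t for s, t in zip(sig1, sig2)) / len(sig1)


def lsh_candidates(sigs: Dict[str, List[int]], bands: int) -> Set[Tuple[str, str]]:
    """Pairs of keys whose signatures agree on every row of at least one band."""
    rows = len(next(iter(sigs.values()))) // bands
    pairs: Set[Tuple[str, str]] = set()
    for band in range(bands):
        buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
        for key, sig in sigs.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(key)
        for keys in buckets.values():
            for i in range(len(keys)):
                for j in range(i + 1, len(keys)):
                    first, second = sorted((keys[i], keys[j]))
                    pairs.add((first, second))
    return pairs


docs = {
    "d1": set("the quick brown fox jumps".split()),
    "d2": set("the quick brown fox sleeps".split()),
    "d3": set("lorem ipsum dolor sit amet".split()),
}
hashes = make_hashes(128)
sigs = {name: minhash(tokens, hashes) for name, tokens in docs.items()}
print(estimate_jaccard(sigs["d1"], sigs["d2"]))  # should be near the exact 4/6
print(lsh_candidates(sigs, bands=32))  # 32 bands of 4 rows each
```

The fraction of agreeing positions is an unbiased estimate of the Jaccard similarity, and its error shrinks as the number of hash functions grows. Using more bands with fewer rows each lowers the similarity at which pairs tend to become candidates, at the cost of more false positives to verify.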

thejchap/catch

Matches gym partners based on schedule, location, and interests using augmented interval trees and Jaccard indices

Language: Ruby - Size: 483 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

92amartins/simple-recommender

A simple content-based recommender system

Language: R - Size: 1000 Bytes - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

fagnercarvalho/QuestionSimilarityTest

Testing Jaccard similarity and Cosine similarity techniques to calculate the similarity between two questions.

Language: C# - Size: 6.84 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

DorinK/AI-Recommendation-Systems

Third Assignment in 'Artificial Intelligence' course by Dr. Ram Meshulam at Bar-Ilan University

Language: Python - Size: 2.91 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

holopoj/FHCP

Implementation of the paper "Finding Highly Correlated Pairs with Powerful Pruning" in Java.

Language: Java - Size: 1.56 MB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 1

mariofv/DocSim

MinHash text analyzer developed for the Algorithmics course.

Language: C++ - Size: 43.1 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

Abdelrahman-Hussain/Jaccard_similarity

This is a simple application to calculate the Jaccard similarity between an input query and stored documents.

Language: Java - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

srsviegas/ufrgs-ed-jaccard

Program that computes the Jaccard coefficient between two text files | Data Structures course (Estrutura de Dados) at UFRGS

Language: C - Size: 536 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

ulf1/simiscore-syntax

An ML API to compute the Jaccard similarity based on shingled subtrees of the dependency grammar.

Language: Python - Size: 64.5 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

akelsch/spotify-recommender

Recommender system based on the Spotify Million Playlist Dataset

Language: Java - Size: 1.31 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

Pooja-Bhojwani/linked-eed

The aim is to build a job recommender system that takes skills from LinkedIn and jobs from Indeed and returns the best jobs available for you according to your skills.

Language: Python - Size: 443 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 29 - Forks: 17

chanddu/Sentence-similarity-based-on-Semantic-nets-and-Corpus-Statistics-

This is an implementation of the paper written by Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett

Language: Python - Size: 2.93 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 21 - Forks: 9

abdo-essam/Inverted-Index

Implements an inverted index to support text search. The inverted index is built from a set of documents, where each document is represented by a unique integer ID.

Language: Java - Size: 872 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

salimtirit/multithreaded-search-engine

Multithreaded scientific search engine in C++ that uses Jaccard Similarity to summarize relevant paper abstracts.

Language: C++ - Size: 61.5 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

anshul1004/TweetsClustering

Clustering similar tweets using K-means clustering algorithm and Jaccard distance metric

Language: Python - Size: 3.32 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 7 - Forks: 4

am-tropin/restaurant-europe

🇪🇺🍽 The project classifies restaurants by various features using XGBoost and scikit-learn models and gives content-based recommendations of European restaurants using Jaccard metric from SciPy.

Language: Jupyter Notebook - Size: 36.6 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

elifmeseci/link-prediction-on-complex-networks

Using neighborhood-based link prediction methods to predict new links that will occur in networks created from darts championship competitions

Language: Jupyter Notebook - Size: 5.08 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

Related Keywords
jaccard-similarity (149), cosine-similarity (34), python (28), minhash (19), lsh (14), jaccard-distance (13), recommender-system (12), machine-learning (11), jaccard (10), locality-sensitive-hashing (10), similarity (10), jaccard-index (10), minhash-lsh-algorithm (10), data-mining (9), hamming-distance (9), nlp (9), java (8), clustering (7), python3 (7), spark (6), jaccard-similarity-estimation (6), information-retrieval (6), text-diff (5), cpp (5), big-data (5), plagiarism-detection (5), tf-idf (5), content-based-recommendation (5), pearson-correlation (5), logistic-regression (5), natural-language-processing (5), dice-coefficient (5), similarity-score (4), differences-detected (4), text-similarity (4), bloom-filter (4), scikit-learn (4), document-similarity (4), similarity-search (4), notifications (4), diffmatchpatch (4), cosine-distance (4), levenshtein-distance (4), collaborative-filtering (4), jaro-winkler-distance (4), quartz (4), work-in-progress (3), numpy (3), lsh-algorithm (3), minwise-hashing (3), minwise-hashing-algorithm (3), hyperloglog (3), minhash-sketches (3), nltk (3), javascript (3), similarity-measures (3), networkx (3), jaro-distance (3), classification (3), word2vec (3), kmeans (3), jupyter-notebook (3), sentiment-analysis (3), kmeans-clustering (3), knn (3), golang (3), euclidean-distances (3), jaro-winkler (3), random-forest (3), tfidf (3), algorithm (3), data-science (3), f1-score (3), matplotlib (3), string-metrics (3), string-similarity (3), weighted-sets (2), gcov (2), jaccard-coefficient (2), pandas (2), logloss (2), longest-common-subsequence (2), fuzzy-matching (2), xgboost (2), ml (2), docker (2), decision-tree (2), trigrams (2), sorensen-dice-distance (2), knn-classification (2), svm-classifier (2), mutex (2), thread (2), threading (2), alternating-least-squares (2), tweepy (2), pca (2), beautifulsoup (2), recommender (2), deduplication (2)