GitHub topics: document-similarity
piskvorky/gensim
Topic Modelling for Humans
Language: Python - Size: 101 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 16,054 - Forks: 4,393

oborchers/Fast_Sentence_Embeddings
Compute Sentence Embeddings Fast!
Language: Jupyter Notebook - Size: 2.86 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 623 - Forks: 84

izikeros/sentence-plagiarism
Compare sentences from input document with all sentences from reference documents - find very similar ones.
Language: Python - Size: 263 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

axyc777/Plagiarism-Checker
Plagiarism Checker is a Flask-based web application that allows users to upload .txt or .docx files and checks for plagiarism using advanced text comparison and optional Google Search (via SERP API). It uses Natural Language Processing (NLP), cosine similarity, and keyword extraction to intelligently compare documents or check them online.
Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

allenai/aspire
Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.
Language: Python - Size: 268 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 52 - Forks: 5

hiropppe/text-models
Topic Modeling in Cython
Language: Jupyter Notebook - Size: 48.9 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

IlyaGusev/tgcontest
Telegram Data Clustering contest solution by Mindful Squirrel
Language: HTML - Size: 14.1 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 96 - Forks: 25

eggdropsoap/tilsh
tilsh implements the TLSH locality-sensitive hash algorithm suite
Language: JavaScript - Size: 25.4 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

zbmed-semtec/doc2vec-doc-relevance
An approach exploring and assessing literature-based doc-2-doc recommendations using doc2vec and applying it to the RELISH dataset.
Language: Python - Size: 9.55 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

zbmed-semtec/wmd-word2vec
An approach exploring and assessing literature-based doc-2-doc recommendations using word2vec and word mover's distance and applying it to the RELISH dataset.
Language: Python - Size: 1.1 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Mohammed-3tef/Document_Similarity
A C++ program to measure the similarity between two text documents using efficient algorithms like cosine similarity, with support for preprocessing and customization.
Language: C++ - Size: 149 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

zayedrais/DocumentSearchEngine
Document Search Engine project with TF-IDF and the Google Universal Sentence Encoder model
Language: Jupyter Notebook - Size: 28.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 24

zbmed-semtec/word2doc2vec-doc-relevance
An approach exploring and assessing literature-based doc-2-doc recommendations using word2vec combined with doc2vec, and applying it to TREC and RELISH datasets
Language: Python - Size: 13.2 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

shreyansh26/MinHash-Implemenation
A simple MinHash implementation based on the explanation in the Mining of Massive Datasets course by Stanford
Language: Python - Size: 7.4 MB - Last synced at: 25 days ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

zbmed-semtec/hybrid-pre-doc2vec-doc-relevance
Hybrid approach combining dictionary-based NER and doc2vec
Language: Jupyter Notebook - Size: 23.9 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Forthoney/doc_sim
Approximate document similarity with Minhash + Locality Sensitive Hashing
Language: Ruby - Size: 48.8 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 2

DrKenReid/Generalized-Analysis-of-Text-Data
A comprehensive toolkit for analyzing text data using various AI and NLP techniques, including topic modeling, sentiment analysis, and text classification, demonstrated on the 20 Newsgroups dataset.
Language: Jupyter Notebook - Size: 1.45 MB - Last synced at: 28 days ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

meenavyas/Misc
Contains interesting projects like cat face detection, cat face recognition, code generation, building a chatbot, finding similar documents, image segmentation, UCI credit card, anomaly detection, MNIST, etc.
Language: Jupyter Notebook - Size: 47.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 18 - Forks: 36

andrewmcloud/consimilo
A Clojure library for querying large data-sets on similarity
Language: Clojure - Size: 536 KB - Last synced at: 9 days ago - Pushed at: over 6 years ago - Stars: 63 - Forks: 4

developersaintt/Document-Similarity
Was curious about how a plagiarism checker works, ended up learning about something completely different 😂
Language: Python - Size: 7.81 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

NourKamaly/TheArtInOurWorlds-NASA-Space-Apps
NASA space apps 2022 local winner (Cairo). This project is the solution designed for the NASA space apps challenge hackathon 2022 by team NASART solving challenge: The Art in Our Worlds.
Language: Jupyter Notebook - Size: 96.5 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 4

Vincent96034/DocumentSimilarity
Q3 of Final Project Assignment of the course 'Foundations of Data Science' @ CBS
Language: Python - Size: 4.14 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Dipesh-Pokhrel/doc_similarity
Similarity between two documents.
Language: Python - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

parvez86/Smart-Recruitment-System
A simple Django-based resume ranker website where recruiters post their jobs and candidates apply for their desired vacancies. The system computes the document similarity between the job description and the candidate resumes, generates similarity scores using the KNN model, and ranks or shortlists the candidate resumes.
Language: HTML - Size: 175 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

captv89/findSimilarDocs
A PoC on document comparison using various methods in NLP
Language: Jupyter Notebook - Size: 382 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

EslamElbassel/Indexing-and-Documents-Similarity
Measures the similarity between documents by calculating their Jaccard similarity and provides a similarity score based on how similar the sentences are to each other
Language: Java - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

nekcht/minhash-lsh-evaluation
Assessing MinHash LSH for text similarity. Compares with kNN using BART embeddings as ground truth. Involves data preprocessing, shingle creation, LSH experiments. Findings inform LSH's efficiency in document similarity tasks, enhancing understanding of LSH techniques.
Language: Jupyter Notebook - Size: 367 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

AustinZuniga/Auto-tagging-of-Theses-and-Dissertations-of-Bicol-University-Searching-and-Matching-
A system for automatic tagging of metadata of theses and dissertations from Bicol University
Language: Python - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

priyanka-ddit/NLP
This repository demonstrates how to explore the spiritual world using NLP techniques like sentiment analysis, topic modeling, information retrieval, and text summarization.
Language: Jupyter Notebook - Size: 3.05 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

abhilampard/Simple-Plagiarism-Checker
Web Application for checking the similarity between query and document using the concept of Cosine Similarity.
Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 79 - Forks: 59

now-youre-gittin-it/nlp-workplace-comedies
NLP on American workplace comedy TV pilot transcripts using multiple NLP libraries in Python.
Language: Jupyter Notebook - Size: 17.2 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

malteos/wikipedia-article-recommendations
Survey data and Python code for the ICADL 2021 paper "A Qualitative Evaluation of User Preference for Link-based vs. Text-based Recommendations of Wikipedia Articles"
Language: Jupyter Notebook - Size: 746 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 0

Siddhantmest/Categorizing-amazon-products
Classifying products into categories using NLP techniques
Language: Jupyter Notebook - Size: 556 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

shmsi/document-ranking
Document ranking word embeddings
Language: Python - Size: 43 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

MSVCode/doc-similarity
Simple document similarity module implemented in NodeJS
Language: JavaScript - Size: 3.91 KB - Last synced at: 2 days ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 2

ribbas/Highlite
Document comparison tool
Language: Python - Size: 8.66 MB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 1

shrebox/Natural-Language-Processing
Compilation of Natural Language Processing (NLP) codes. BONUS: Link to Information Retrieval (IR) codes compilation. (Check out the README.)
Language: Python - Size: 1.85 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 12 - Forks: 0

mdietrichstein/ir-search-engine-rust
Rust-based text search engine from scratch supporting multiple document similarity metrics (TF-IDF, BM25, BM25VA)
Language: Rust - Size: 132 KB - Last synced at: 2 days ago - Pushed at: about 4 years ago - Stars: 5 - Forks: 0

zuliani99/All-Pairs-Docs-Similarity
Given a set of documents and a minimum required similarity threshold, find the number of document pairs that exceed the threshold
Language: Jupyter Notebook - Size: 14.1 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

nunososorio/docxmatch
DocxMatch is a Streamlit app that analyzes the similarity between Word files.
Language: Python - Size: 43.9 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

nicoDs96/Document-Similarity-using-Python-and-PySpark
Document Similarity with Apache Spark using Locality-Sensitive Hashing and Python
Language: Jupyter Notebook - Size: 444 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 1

massanishi/document_similarity_algorithms_experiments
Document similarity algorithms experiment - Jaccard, TF-IDF, Doc2vec, USE, and BERT.
Language: Python - Size: 28.3 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 67 - Forks: 29

Sarthakjain1206/Intelligent_Document_Finder
Document Search Engine Tool
Language: Python - Size: 56.3 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 57 - Forks: 13

biovino1/BuffettLetters
NLP of Warren Buffett's annual letter to shareholders
Language: Jupyter Notebook - Size: 10.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

maxoodf/tgnews
Telegram Data Clustering Contest (Bossy Gnu's submission)
Language: C++ - Size: 41 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 2

TSunny007/StackOverflowAnalytics
Data mining on stack overflow Q/A data to understand the landscape of languages and developers in computer science
Language: Jupyter Notebook - Size: 21.9 MB - Last synced at: over 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

Sarthakjain1206/Intelligent-Document-Finder
A tool that can find any of your documents using semantic search
Language: Python - Size: 43.5 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

PolunLin/doc_similiarty
Language: HTML - Size: 229 KB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

tejaspradhan/AI-based-Hiring-Platform
A two-ended hiring web application built using Flask. The application uses document similarity techniques for recommendation.
Language: HTML - Size: 4.51 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 3

sabareeswarans11/SearchEngine_InvertedIndex
Information Retrieval: Document Similarity Measure Pre-processing to Build Document Vectors for Web Page Content Analysis.
Language: Jupyter Notebook - Size: 2.22 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

TSunny007/Document-Similarity
Using Jaccard-Similarity and Minhashing to determine similarity between two text documents
Language: Jupyter Notebook - Size: 26.4 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 3

omarabdelaz1z/Inverted-Index-and-Document-Similarity
Language: Python - Size: 113 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

jungsoh/wordvecs-word-analogy-by-document-similarity
Use of word embeddings and document similarity to solve word analogy problems
Language: Python - Size: 65.1 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

lukacupic/PDF-Document-Management-and-Search-System
Bachelor's Thesis at FER, University of Zagreb, 2018.
Language: Java - Size: 56 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

mohammaduzair9/Document-Searching
Document searching from queries using Inverted index
Language: Python - Size: 717 KB - Last synced at: over 2 years ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

iboraham/job-finder
The framework that finds a perfect job match for you provided through scraped data from indeed.co.uk.
Language: Python - Size: 19.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

adriamoya/bcpnews
Classifying news articles with deep learning to build an automatic newsletter
Language: Jupyter Notebook - Size: 70 MB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

1tangerine1day/search_engine
A search engine for PubMed articles
Language: JavaScript - Size: 5.91 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

topcat/pubmed-docsim
Code to train an LSI model using PubMed OA medical documents and to use pre-trained PubMed models on your own corpus for document similarity.
Language: Python - Size: 4.88 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 1

tifaniwarnita/Document-Similarity
Document similarity using cosine distance, tf-idf, and latent semantic analysis.
Language: R - Size: 51.4 MB - Last synced at: over 2 years ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 5

yadhu98/Document-Similarity-using-Python
A program that checks document similarity using the Natural Language Toolkit (NLTK) and cosine similarity.
Language: Python - Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

Bit-Nation/notary
The Bitnation Jurisdiction Public Notary DApp
Language: JavaScript - Size: 139 KB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 1

jgiere/DocGraph
Index documents in Apache Solr and see similarities in the document's contents.
Language: Java - Size: 243 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

JeetThakare/NaturalLangProcessing
NLP Projects
Language: Python - Size: 514 KB - Last synced at: over 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

zeitunik/Big-Data
Big data homework solutions
Language: Python - Size: 146 KB - Last synced at: over 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0
