GitHub topics: sentence-embeddings
neuml/txtai
💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
Language: Python - Size: 53 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 11,121 - Forks: 702

rutujakokate430/Multi-AI-Agent-team-of-Researchers-Software-developers-and-QA
Crew of AI Agents working towards developing the reference software solution end-to-end autonomously
Size: 0 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

david-xander/visual-analytics-tool-sentence-embeddings
A visual analytics tool and framework for exploring compositionality in sentence embeddings. Gain interactive insights into how embedding models, composition functions, and similarity metrics influence textual representations, focusing on error gap analysis for enhanced model interpretability.
Language: Python - Size: 13.7 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

rafay123321/embedding-hallucinations
This repo shows how foundation models hallucinate and how such hallucinations can be fixed by fine-tuning them
Language: Python - Size: 476 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

SeanLee97/AnglE
Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
Language: Python - Size: 889 KB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 547 - Forks: 38

jina-ai/vectordb
A Python vector database you just need - no more, no less.
Language: Python - Size: 1.22 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 619 - Forks: 47

MaartenGr/BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Language: Python - Size: 23.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6,834 - Forks: 829

gyunggyung/AGI-Papers
Papers and books to look at when starting out with AGI 📚
Language: Python - Size: 35.7 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 281 - Forks: 45

BBC-Esq/KeyBERT_GUI
GUI for the great keybert repository.
Language: Python - Size: 73.2 KB - Last synced at: 1 day ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

shubham0204/Sentence-Embeddings-Android
Embeddings from sentence-transformers in Android! Supports all-MiniLM-L6-V2, bge-small-en, snowflake-arctic, model2vec models and more
Language: Kotlin - Size: 42 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 47 - Forks: 6

Separius/awesome-sentence-embedding 📦
A curated list of pretrained sentence and word embedding models
Language: Python - Size: 282 KB - Last synced at: 11 days ago - Pushed at: about 4 years ago - Stars: 2,258 - Forks: 262

robrua/easy-bert
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Language: Java - Size: 44.9 KB - Last synced at: about 22 hours ago - Pushed at: over 2 years ago - Stars: 173 - Forks: 44

shibing624/text2vec
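text2vec, text to vector. A text vector representation tool that converts text into vector matrices; it implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.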
Language: Python - Size: 15.4 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 4,754 - Forks: 413

cui-shaobo/conditional-dichotomy-quantification
A lightweight toolkit for measuring how “opposite” two texts are when they share the same context.
Language: Python - Size: 6.73 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

Agrover112/awesome-semantic-search
A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.
Size: 371 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 354 - Forks: 29

fuzzy-memory/caffeine-print
A current affairs and politics news mailer
Language: Python - Size: 1.46 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

nikolamilosevic86/local-genAI-search
Local-GenAI-Search is a generative search engine based on Llama 3, langchain and qdrant that answers questions based on your local files
Language: Python - Size: 2.27 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 94 - Forks: 36

oborchers/Fast_Sentence_Embeddings
Compute Sentence Embeddings Fast!
Language: Jupyter Notebook - Size: 2.86 MB - Last synced at: 15 days ago - Pushed at: over 2 years ago - Stars: 623 - Forks: 84

shangan23/similar-sentences
Similar-sentence prediction with more accurate results on your dataset, built on top of a pretrained model. #BERT
Language: Python - Size: 86.9 KB - Last synced at: 20 days ago - Pushed at: about 5 years ago - Stars: 8 - Forks: 2

wangyuxinwhy/uniem
unified embedding model
Language: Python - Size: 12.7 MB - Last synced at: 14 days ago - Pushed at: almost 2 years ago - Stars: 863 - Forks: 70

FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Language: Python - Size: 49.5 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 9,763 - Forks: 715

Sakibalam03/resume-scanner
🔍 AI-powered resume scanner that ranks candidates by semantic similarity to job descriptions. Supports PDF/DOCX/images with OCR fallback and sentence transformer embeddings for intelligent matching beyond keywords.
Language: Python - Size: 388 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

SeanLee97/xmnlp
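xmnlp: provides Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radical extraction, sentence representation, and text similarity computation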
Language: Python - Size: 114 MB - Last synced at: 26 days ago - Pushed at: over 2 years ago - Stars: 1,285 - Forks: 188

goru001/inltk
Natural Language Toolkit for Indic Languages aims to provide out of the box support for various NLP tasks that an application developer might need
Language: Python - Size: 812 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 830 - Forks: 160

geeks-of-data/knowledge-gpt
Extract knowledge from all information sources using GPT and other language models. Index information sources and run Q&A sessions over them.
Language: Python - Size: 3.36 MB - Last synced at: 1 day ago - Pushed at: about 2 years ago - Stars: 281 - Forks: 54

princeton-nlp/SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
Language: Python - Size: 40.4 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 3,558 - Forks: 526

Doragd/Awesome-Sentence-Embedding
A curated list of research papers in Sentence Representation Learning and an STS leaderboard of sentence embeddings.
Size: 174 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 316 - Forks: 20

Muennighoff/sgpt
SGPT: GPT Sentence Embeddings for Semantic Search
Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 867 - Forks: 54

JohnSnowLabs/nlu
1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Language: Python - Size: 474 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 915 - Forks: 138

kamalkraj/e5-mistral-7b-instruct
Finetune mistral-7b-instruct for sentence embeddings
Language: Python - Size: 34.2 KB - Last synced at: 26 days ago - Pushed at: about 1 year ago - Stars: 80 - Forks: 18

dborrelli/chat-intents
Clustering sentence embeddings to extract message intent
Language: Jupyter Notebook - Size: 6.38 MB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 174 - Forks: 24

MoleculeTransformers/smiles-featurizers
Extract molecular SMILES embeddings from language models pre-trained with various objectives and architectures.
Language: Python - Size: 39.1 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 1

DanRo3/tesis-multiagente
AI-based multi-agent system for extracting and visualizing information from vector databases using natural language.
Language: Python - Size: 2.72 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

toninf/dense_retrieval
Word2vec, sentenceBert, BM25 and IVFFlat Index quality and speed comparison
Language: Jupyter Notebook - Size: 129 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

TharinduDR/Simple-Sentence-Similarity
Exploring the simple sentence similarity measurements using word embeddings
Language: Python - Size: 60.4 MB - Last synced at: 12 days ago - Pushed at: 10 months ago - Stars: 100 - Forks: 37

cpcdoy/rust-sbert
Rust port of sentence-transformers (https://github.com/UKPLab/sentence-transformers)
Language: Rust - Size: 165 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 114 - Forks: 12

MatthewCYM/GenSE
Official implementation of EMNLP 2022 paper "Generate, Discriminate, and Contrast: A Semi-Supervised Sentence Representation Learning Framework"
Language: Python - Size: 975 KB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 1

hppRC/simple-simcse-ja
Exploring Japanese SimCSE
Language: Python - Size: 1.29 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 68 - Forks: 4

worldbank/GISTEmbed
GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings
Language: Python - Size: 1.3 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 3

ritesh-modi/embedding-hallucinations
This repo shows how foundation models hallucinate and how such hallucinations can be fixed by fine-tuning them
Language: Python - Size: 474 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

LazarusNLP/indonesian-sentence-embeddings
Embedding Representation for Indonesian Sentences!
Language: Jupyter Notebook - Size: 1.56 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 17 - Forks: 2

amazon-science/text_generation_diffusion_llm_topic
Topic Embedding, Text Generation and Modeling using diffusion
Language: Python - Size: 154 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 12 - Forks: 3

SAP-samples/acl2022-self-contrastive-decorrelation
Source code for ACL 2022 paper "Self-contrastive Decorrelation for Sentence Embeddings".
Language: Python - Size: 278 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 25 - Forks: 7

thiswillbeyourgithub/AnnA_Anki_neuronal_Appendix
Using machine learning on your anki collection to enhance the scheduling via semantic clustering and semantic similarity
Language: Python - Size: 3.89 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 64 - Forks: 1

JohnGiorgi/DeCLUTR
The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!
Language: Python - Size: 702 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 379 - Forks: 33

dayyass/muse-as-service
REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.
Language: Python - Size: 339 KB - Last synced at: 3 days ago - Pushed at: almost 4 years ago - Stars: 51 - Forks: 5

HITsz-TMG/KaLM-Embedding
Code for KaLM-Embedding models
Language: Python - Size: 319 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 74 - Forks: 6

YomnaWaleed/job-recommendation-system-ai
AI-Powered Job Recommendation System: an intelligent job recommendation system that analyzes PDF resumes and suggests the best job opportunities using NLP, FAISS, and Sentence Transformers.
Language: Jupyter Notebook - Size: 88.7 MB - Last synced at: 24 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

YomnaWaleed/medical-chatbot-using-Llama2
A medical chatbot built with Meta's Llama2, LangChain, and FAISS to provide accurate, context-aware responses to medical queries. The system uses a Flask-based web interface for user interaction and leverages Hugging Face embeddings for efficient document retrieval. Ideal for exploring domain-specific AI applications in healthcare.
Language: Jupyter Notebook - Size: 19.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tomlin7/AI-research-assistant
Semantic document search system with pgvector and PGAI
Language: Python - Size: 50.8 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 2

TianduoWang/DiffAug
[EMNLP 2022] Differentiable Data Augmentation for Contrastive Sentence Representation Learning. https://arxiv.org/abs/2210.16536
Language: Python - Size: 551 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 39 - Forks: 2

4AI/BeLLM
Code for BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings (NAACL2024)
Language: Python - Size: 247 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

ahr9n/quranic-search-v2
Quranic Lexical/Semantic Search
Language: Jupyter Notebook - Size: 5.91 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 35 - Forks: 7

SkywardAI/kirin
Self-hosted and local-first application for inference and RAG on consumer grade hardware.
Language: Python - Size: 918 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 7 - Forks: 8

YJiangcm/PromCSE
[EMNLP 2022] Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning
Language: Python - Size: 737 KB - Last synced at: 22 days ago - Pushed at: over 1 year ago - Stars: 134 - Forks: 16

DeepK/hoDMD-experiments
EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition
Language: Python - Size: 47.8 MB - Last synced at: 4 months ago - Pushed at: almost 6 years ago - Stars: 13 - Forks: 4

BounharAbdelaziz/MorDern-Bert
Sentence Transformer model finetuned from ModernBERT-base for Moroccan Darija.
Language: Jupyter Notebook - Size: 46.9 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

jeongukjae/question-similarity
Find similar questions via contrastive learning
Language: Python - Size: 93.8 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

yuanzhoulvpi2017/Rust4SenVec
Convert sentences to vectors with NLP transformer models in Rust
Language: Jupyter Notebook - Size: 21.5 KB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 1

pranshurastogi29/Amazon_ml_challenge-solution
26th-place solution out of 3,290 teams in a challenge held on HackerEarth
Language: Jupyter Notebook - Size: 238 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 0

goldpulpy/pysentence-similarity
PySentence-Similarity is a tool designed to identify and find similarities between sentences and a base sentence, expressed as a percentage 📊.
Language: Python - Size: 60.5 KB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

Abhigyan126/FEEDBACK
A Flask-based web application that analyzes user comments using sentiment analysis, similarity detection, and AI-powered insights.
Language: Python - Size: 9.77 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

kampersanda/sif-embedding
Rust implementation of SIF and uSIF: Simple and fast sentence embedding
Language: Rust - Size: 1.22 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 19 - Forks: 0

hellojwilde/energetic-ai
EnergeticAI is TensorFlow.js, optimized for serverless environments, with fast cold-start, small module size, and pre-trained models.
Language: TypeScript - Size: 35.8 MB - Last synced at: about 5 hours ago - Pushed at: over 1 year ago - Stars: 36 - Forks: 0

goamegah/pytorch-stc
PyTorch implementation of a self-training approach for short text clustering
Language: Python - Size: 16.9 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

chaosgen/awesome-sentence-embedding
A curated list of pretrained sentence and word embedding models
Language: Python - Size: 213 KB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

paraglondhe098/movie-recommendation-llm-embeddings
Movie recommender system using LLM and Vector database
Language: Jupyter Notebook - Size: 189 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

ash-sha/Semantic-Textual-Similarity-NLP
Measuring similarity of a sentence
Language: Jupyter Notebook - Size: 4.14 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

DolbyUUU/Reinforcement-Calibration-SimCSE
Reinforcement Calibration SimCSE, combining contrastive learning, artificial potential fields, perceptual loss, and RLHF to achieve improved Semantic Textual Similarity (STS) embeddings. PyTorch-based implementations of PerceptualBERT and ForceBasedInfoNCE, along with fine-tuning capabilities via RLHF and evaluation using SentEval.
Language: Python - Size: 371 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

voidism/DiffCSE
Code for the NAACL 2022 long paper "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings"
Language: Python - Size: 6.3 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 293 - Forks: 26

ojrlopez27/nl-service-composition
NLSC: Unrestricted Natural Language-based Service Composition middleware that uses sentence embeddings, named-entity recognition, and other NLP models.
Language: Java - Size: 450 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 8 - Forks: 1

hppRC/simple-simcse
A simple implementation of SimCSE
Language: Python - Size: 157 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 76 - Forks: 10

Nikoletos-K/QA-with-SBERT-for-CORD19
⚕️🦠 Developed a document retrieval system to return titles of scientific papers containing the answer to a given user question based on the first version of the COVID-19 Open Research Dataset (CORD-19) ☣️🧬
Language: Jupyter Notebook - Size: 1.53 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 4 - Forks: 1

Lizhecheng02/UCSD-CSE256-PA3
CSE 256 LIGN 256 - Statistical Natural Lang Proc - Nakashole [FA24] PA3
Language: Jupyter Notebook - Size: 5.07 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

shasss447/QuestionAnwering-with-RAG
This project implements a Retrieval-Augmented Generation (RAG) pipeline for answering user queries by combining information retrieval with text generation.
Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

kaushalshetty/Structured-Self-Attention
A Structured Self-attentive Sentence Embedding
Language: Python - Size: 492 KB - Last synced at: 7 months ago - Pushed at: almost 6 years ago - Stars: 495 - Forks: 106

atinyshrimp/TripAdvisor-Recommendation-ML-NLP
Machine Learning and NLP models for improving text-based recommendations on TripAdvisor, using BM25, TF-IDF, embeddings, and a Hybrid approach.
Language: Jupyter Notebook - Size: 489 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ksm26/Embedding-Models-From-Architecture-to-Implementation
Understand and build embedding models, focusing on word and sentence embeddings and dual encoder architectures. Learn to train embedding models using contrastive loss and implement them in semantic search and RAG systems.
Language: Jupyter Notebook - Size: 2 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 0

spiritokko/sentence_similarity
Language: Python - Size: 646 KB - Last synced at: 7 days ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

cui-shaobo/causal-strength
evaluating the causal strength between cause and effect
Language: Python - Size: 107 KB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

ai-lluminator/backend
The backend for the Ailluminator project, which sends updates when relevant papers are published, based on a prompt from the user.
Language: Python - Size: 9.21 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

francescobaio/Sentence_Reordering
This project was undertaken as part of the Deep Learning course final exam. Its primary objective is to develop and implement a deep learning model for sentence reordering, a challenging Natural Language Processing (NLP) task that involves rearranging the words of a sentence into their correct order.
Language: Jupyter Notebook - Size: 71.3 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

sdadas/polish-sentence-evaluation
Evaluation of Sentence Representations in Polish
Language: Python - Size: 4.96 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 22 - Forks: 3

flipz357/S3BERT
Semantically Structured Sentence Embeddings
Language: Python - Size: 72.3 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 65 - Forks: 5

Synapxe-DNA/healthhub-content-optimization
Content Optimization code for Health Hub Articles
Language: Jupyter Notebook - Size: 114 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

louisbrulenaudet/tax-retrieval-benchmark
An implementation of the TaxRetrievalBenchmark task for the 🤗 Massive Text Embedding Benchmark (MTEB) framework.
Language: Jupyter Notebook - Size: 85 KB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 1

jeongukjae/smaller-labse
Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE
Language: Python - Size: 9.47 MB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 18 - Forks: 0

rbitr/ferrite
Simple, lightweight transformers in Fortran
Language: Fortran - Size: 28.3 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 1

Galal-pic/Talented-recruitment-and-skills-analysis-system
The project's goal is to help job seekers understand the basic qualifications for specific jobs and evaluate the suitability of their skills for those positions. Additionally, the program aims to assist recruiters in enhancing their resume selection processes by analyzing and understanding job advertisements ....
Language: HTML - Size: 12.3 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 1

Babelscape/CroCoAlign
A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System for Long Texts.
Language: Python - Size: 90.4 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

wuji3/nlpdk
Natural Language Processing(NLP) Toolbox
Language: Python - Size: 324 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 1

ksm26/Understanding-and-Applying-Text-Embeddings
Dive into the world of text embeddings. This course will guide you through leveraging text embeddings to enhance various natural language processing (NLP) tasks.
Language: Jupyter Notebook - Size: 4.58 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 6

izhx/uni-rep
Code for embedding and retrieval research.
Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 0

Salman-Khan-Mohammed/Q-A-System
The "Codebasics Q&A" project is an end-to-end Question and Answer (Q&A) system developed for Codebasics, an e-learning company specializing in data-related courses and bootcamps. The system is designed to assist students who typically ask questions via Discord or email by providing instant, automated responses.
Language: Jupyter Notebook - Size: 270 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

eren23/semantic-code-searcher
Basic Python example for semantically searching code in GitHub profiles.
Language: Python - Size: 44 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

retkowsky/azure_visual_search_toolkit
Azure AI Visual Search toolkit
Language: Jupyter Notebook - Size: 169 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 3

arasgungore/job-posting-duplicate-detection
A project aiming to leverage text embeddings and Milvus, a high-performance vector search engine, to detect duplicate job postings.
Language: Python - Size: 289 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

KwokHing/AI-Planet-LLM-Bootcamp-Challenge
An LLM challenge to (i) fine-tune a pre-trained Hugging Face transformer model to build a code generation language model, and (ii) build a retrieval-augmented generation (RAG) application using LangChain
Language: Jupyter Notebook - Size: 874 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

dongkyunk/Semantic-Sentence-Similarity
Semantic sentence similarity using Word2Vec and fastText embeddings with cosine similarity and Word Mover's Distance
Language: Python - Size: 10.7 KB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

dayyass/muse_tf2pt
Convert MUSE from TensorFlow to PyTorch and ONNX
Language: Jupyter Notebook - Size: 1.74 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 0
