An open API service providing repository metadata for many open source software ecosystems.

Topic: "embedding-models"

Separius/awesome-sentence-embedding 📦

A curated list of pretrained sentence and word embedding models

Language: Python - Size: 282 KB - Last synced at: 10 days ago - Pushed at: almost 4 years ago - Stars: 2,258 - Forks: 262

Hironsan/awesome-embedding-models

A curated list of awesome embedding models tutorials, projects and communities.

Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: 9 days ago - Pushed at: about 6 years ago - Stars: 1,783 - Forks: 250

Sujit-O/pykg2vec

Python library for knowledge graph embedding and representation learning.

Language: Python - Size: 9.29 MB - Last synced at: 12 days ago - Pushed at: about 4 years ago - Stars: 611 - Forks: 112

ContextualAI/gritlm

Generative Representational Instruction Tuning

Language: Jupyter Notebook - Size: 11.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 608 - Forks: 42

StarlightSearch/EmbedAnything

Production-ready Inference, Ingestion and Indexing built in Rust 🦀

Language: Rust - Size: 36.7 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 506 - Forks: 45

marl/openl3

OpenL3: Open-source deep audio and image embeddings

Language: Jupyter Notebook - Size: 687 MB - Last synced at: 13 days ago - Pushed at: almost 2 years ago - Stars: 503 - Forks: 57

BBC-Esq/VectorDB-Plugin

Plugin that lets you ask questions about your documents including audio and video files.

Language: Python - Size: 32.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 332 - Forks: 40

mana-ysh/knowledge-graph-embeddings 📦

Implementations of Embedding-based methods for Knowledge Base Completion tasks

Language: Python - Size: 10.2 MB - Last synced at: 4 days ago - Pushed at: over 4 years ago - Stars: 257 - Forks: 63

CVxTz/image_search_engine

Image search engine

Language: Python - Size: 1.75 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 226 - Forks: 41

lgalke/vec4ir

Word Embeddings for Information Retrieval

Language: Python - Size: 965 KB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 225 - Forks: 42

spcl/ncc

Neural Code Comprehension: A Learnable Representation of Code Semantics

Language: Python - Size: 9.16 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 211 - Forks: 51

akutuzov/webvectors

Web-ify your word2vec: framework to serve distributional semantic models online

Language: Python - Size: 4.85 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 200 - Forks: 47

yusufhilmi/client-vector-search

A client side vector search library that can embed, store, search, and cache vectors. Works on the browser and node. It outperforms OpenAI's text-embedding-ada-002 and is way faster than Pinecone and other VectorDBs.

Language: TypeScript - Size: 314 KB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 197 - Forks: 14

jgraving/selfsne

Self-Supervised Noise Embeddings (Self-SNE)

Language: Jupyter Notebook - Size: 2.79 MB - Last synced at: 4 days ago - Pushed at: 18 days ago - Stars: 158 - Forks: 13

formath/tensorflow-predictor-cpp

tensorflow prediction using c++ api

Language: Python - Size: 94.7 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 146 - Forks: 60

mangopy/benchmarking-tool-retrieval

Official code for "🔍 Retrieval Models Aren’t Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models"

Language: JavaScript - Size: 3.29 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 140 - Forks: 2

ALucek/QuicKB

Optimize Document Retrieval with Fine-Tuned KnowledgeBases

Language: Python - Size: 1.63 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 107 - Forks: 21

p768lwy3/torecsys

ToR[e]cSys is a PyTorch framework for implementing recommendation system algorithms, including but not limited to click-through-rate (CTR) prediction, learning-to-rank (LTR), and matrix/tensor embedding. The project's objective is to develop an ecosystem to experiment, share, reproduce, and deploy in the real world in a smooth and easy way.

Language: Python - Size: 6.42 MB - Last synced at: about 13 hours ago - Pushed at: about 3 years ago - Stars: 103 - Forks: 17

shobrook/weightgain

Train an adapter for any embedding model in under a minute

Language: Python - Size: 544 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 98 - Forks: 2

shamspias/langchain-chat

langchain-chat is an AI-driven Q&A system that leverages OpenAI's GPT-4 model and FAISS for efficient document indexing. It loads and splits documents from websites or PDFs, remembers conversations, and provides accurate, context-aware answers based on the indexed data. Easy to set up and extend.

Language: Python - Size: 1.34 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 86 - Forks: 17

ikergarcia1996/MetaVec

A monolingual and cross-lingual meta-embedding generation and evaluation framework

Language: Python - Size: 69.3 KB - Last synced at: 20 days ago - Pushed at: almost 3 years ago - Stars: 80 - Forks: 5

kaushalshetty/Positional-Encoding

Encoding position with the word embeddings.

Language: Jupyter Notebook - Size: 154 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 79 - Forks: 13

D2KLab/entity2vec

Generates a set of property-specific entity embeddings from knowledge graphs using node2vec

Language: Python - Size: 23.7 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 77 - Forks: 24

HITsz-TMG/KaLM-Embedding

Code for KaLM-Embedding models

Language: Python - Size: 319 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 74 - Forks: 6

ART-Group-it/KERMIT

🐸 KERMIT - A lightweight library to encode and interpret Universal Syntactic Embeddings

Language: JavaScript - Size: 14.6 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 56 - Forks: 8

datquocnguyen/STransE

STransE: a novel embedding model of entities and relationships in knowledge bases (NAACL 2016)

Language: C++ - Size: 36.4 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 51 - Forks: 16

RoyZhengGao/edge2vec

Learning node representation using edge semantics

Language: Python - Size: 9.65 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 48 - Forks: 21

user1342/Tweezer

A binary analysis tool for identifying unknown function names, using a word-2-vec model

Language: Python - Size: 11.5 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 47 - Forks: 5

Glaciohound/VCML

PyTorch implementation of the paper "Visual Concept-Metaconcept Learner", NeurIPS 2019

Language: Python - Size: 2.43 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 47 - Forks: 7

alisonbma/aiSFX

Representation Learning for the Automatic Indexing of Sound Effects Libraries (ISMIR 2022): Deep audio embeddings pre-trained on UCS & Non-UCS-compliant datasets.

Language: Python - Size: 59.6 KB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 43 - Forks: 4

worldbank/GISTEmbed

GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings

Language: Python - Size: 1.3 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 39 - Forks: 2

nsrinidhibhat/gradio_RAG

Code and resources showcasing the Retrieval-Augmented Generation (RAG) technique, a solution for enhancing data freshness in Large Language Models (LLMs). Incorporate up-to-date external knowledge into LLM-generated responses. Additionally, this repository includes a Gradio-based user interface for seamless model deployment.

Language: Python - Size: 779 KB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 35 - Forks: 13

thustorage/PetPS

PetPS: Supporting Huge Embedding Models with Tiered Memory

Language: C++ - Size: 32.2 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 30 - Forks: 2

BoYanSTKO/place2vec

Place2Vec ground truth dataset

Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 25 - Forks: 7

maxscheurer/cppe

C++ and Python library for Polarizable Embedding

Language: C++ - Size: 4.56 MB - Last synced at: 25 days ago - Pushed at: 8 months ago - Stars: 22 - Forks: 5

databricks-industry-solutions/product-search

Semantic product search on Databricks

Language: Python - Size: 445 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 10

UWNETLAB/dcss_supplementary

Supplementary materials for McLevey 2021 Doing Computational Social Science (Sage, UK).

Language: HTML - Size: 436 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 9

FabianGroeger96/deep-embedded-music

Creation of an embedding space using unsupervised triplet loss and Tile2Vec that can be used for a variety of downstream tasks

Language: Jupyter Notebook - Size: 26.4 MB - Last synced at: 21 days ago - Pushed at: over 3 years ago - Stars: 18 - Forks: 2

su-park/mteb_ko_leaderboard

Korean text embedding model leaderboard

Size: 2.51 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 17 - Forks: 1

Wang-Yu-Qing/UTPM

Code for paper: Learning to Build User-tag Profile in Recommendation System

Language: Python - Size: 2.79 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 4

Zipstack/unstract-adapters

Unstract's interface to LLMs, Embeddings and VectorDBs.

Language: Python - Size: 632 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 16 - Forks: 3

rbitr/ferrite

Simple, lightweight transformers in Fortran

Language: Fortran - Size: 28.3 KB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 1

easonlai/chatbot_with_pdf_streamlit

This code example shows how to make a chatbot for semantic search over documents using Streamlit, LangChain, and various vector databases. The chatbot lets users ask questions and get answers from a document collection. The code is in Python and can be customized for different scenarios and data.

Language: Jupyter Notebook - Size: 6.57 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 5

ashutosh1919/data2vec-pytorch

Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.

Language: Python - Size: 116 KB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 2

chaoweifang/PFE

Piecewise Flat Embedding for Image Segmentation

Language: C++ - Size: 35 MB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 14 - Forks: 3

KevKibe/docindex 📦

⚡️Framework for fast persistent storage of multiple document embeddings and metadata into Pinecone for source-traceable, production-level RAG.

Language: Python - Size: 816 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 13 - Forks: 3

ritaranx/BMRetriever

This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers".

Language: Python - Size: 728 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 12 - Forks: 2

mana-ysh/symmetry-learning-kgc

Python implementation of "Data-dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion [Manabe+. 2018]"

Language: Python - Size: 11.7 MB - Last synced at: 4 days ago - Pushed at: about 7 years ago - Stars: 11 - Forks: 1

itmo-mbss-lab/sr_labs_book

The project is related to the development of labs for the ITMO Speaker Recognition Course.

Language: Jupyter Notebook - Size: 3.25 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 8

aws-samples/fine-tune-embedding-models-on-sagemaker

This repository contains samples for fine-tuning embedding models using Amazon SageMaker. Embedding models are useful for tasks such as semantic similarity, text clustering, and information retrieval. Fine-tuning these models on your specific domain data can greatly improve their performance.

Language: Jupyter Notebook - Size: 41 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 9 - Forks: 0

easonlai/chat_with_pdf_table

The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.

Language: Jupyter Notebook - Size: 85.9 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 4

natelalor/AI_report_generator

A tool that converts long audio files into a thorough, summarized report. Leverages OpenAI and its API (ChatGPT backend), Langchain for text processing, and Pinecone for vector database facilitation.

Language: Python - Size: 15.3 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 2

karolzak/images-vector-search

Simple implementation of search for visually similar images using deep learning and vector search. It's based on pretrained ImageNet weights, so it doesn't require any additional training.

Language: Python - Size: 9.23 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 2

bnabis93/vision-language-examples

Vision-language model example code.

Language: Python - Size: 2.99 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

agankur21/entity_disambiguation

Neural Network models to map mention of a text to corresponding entity in the Knowledge Base

Language: Python - Size: 1.99 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 3

sovit-123/local_file_search

Local file search using embedding techniques

Language: Python - Size: 110 KB - Last synced at: 10 days ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 1

touhi99/DL_Dialogue_act_classification

DL Lab Project - Given a subset of the Switchboard corpus, the goal is to classify dialogue acts from speech and text data. We define an RNN-LSTM model for text classification and a CNN model for speech classification, then ensemble both models to produce a more stable, higher-performing model.

Language: Python - Size: 1.33 MB - Last synced at: 23 days ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 0

Seven-33/langchain-chat

langchain-chat is an AI-driven Q&A system that leverages OpenAI's GPT-4 model and FAISS for efficient document indexing. It loads and splits documents from websites or PDFs, remembers conversations, and provides accurate, context-aware answers based on the indexed data. Easy to set up and extend.

Language: Python - Size: 433 KB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

nur-ag/IGEL

IGEL: Inductive Graph Embeddings through Locality Encodings

Language: Jupyter Notebook - Size: 13.2 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

p16i/siamese-net-and-friends

Language: Python - Size: 5.41 MB - Last synced at: 23 days ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 3

jargonsdev/ai

The AI-Powered assistant for jargons.dev ecosystem

Language: TypeScript - Size: 135 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 1

louisbrulenaudet/lemone-embed

All-in-one repo for the Lemone-embed project, a series of fine-tuned embedding models for Tax retrieval augmented generation (RAG).

Language: Python - Size: 3.6 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

chaosgen/awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models

Language: Python - Size: 213 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

whw199833/gbiz_torch

A comprehensive toolkit package designed to help you accurately predict key metrics in commercial area

Language: Python - Size: 242 KB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

huyhoang17/Visual_Embedding_Tutorial

MNIST Embedding Visualisation using Tensorflow Projector, link blog:

Language: HTML - Size: 15.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

xinyu-intel/ncf_mxnet

Neural Collaborative Filtering with MXNet

Language: Python - Size: 27.4 MB - Last synced at: 23 days ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 2

andikarachman/News-Title-Classification

Machine learning model to classify news into categories based on their headline

Language: Jupyter Notebook - Size: 6.26 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 4 - Forks: 2

trustlelab/siteware-backend-v2

Siteware Backend - German Voice AI Agent provider - Deepgram + Twilio + Elevenlabs + OpenAI + Pinecone

Language: TypeScript - Size: 110 KB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

wuji3/nlpdk

Natural Language Processing(NLP) Toolbox

Language: Python - Size: 324 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 3 - Forks: 1

ksm26/Embedding-Models-From-Architecture-to-Implementation

Understand and build embedding models, focusing on word and sentence embeddings, dual encoder architectures. Learn to train embedding models using contrastive loss, implement them in semantic search and RAG systems.

Language: Jupyter Notebook - Size: 2 MB - Last synced at: 24 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

olasunkanmi-SE/IntelliSearch

IntelliSearch is an advanced retrieval-based question-answering and recommendation system that leverages embeddings and a large language model (LLM) to provide accurate and relevant information to users.

Language: TypeScript - Size: 1010 KB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 0

BogdanFloris/detecting-and-addressing-change

Code for my Master Thesis: How to detect and address changes in machine learning based data pipelines

Language: Python - Size: 151 KB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

GiorgiaAuroraAdorni/bachelor-thesis

Detailed analysis of the research project, carried out during my bachelor internship, entitled "Neural Networks for the Learning of personality traits from Natural Language". @ Unimib 17/18.

Language: TeX - Size: 12.6 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

MottoX/EBR-papers

Helpful papers on embedding-based retrieval (EBR)

Size: 1.95 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

GiorgiaAuroraAdorni/learning-personality

Learning Personality is a bachelor internship project that uses neural networks to extract personality traits from natural language (in particular, reviews) through automatic approaches. @ Unimib 17/18.

Language: Python - Size: 1010 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

firojalam/crisis-embedding-models

Embedding models designed using crisis related tweets collected by AIDR (http://aidr.qcri.org/)

Language: Python - Size: 950 KB - Last synced at: 2 months ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 0

SINGHxTUSHAR/IMDB-Analysis

IMDB-Analysis is a sentiment analysis project that classifies movie reviews as positive or negative. The model is designed with a simple RNN architecture and word2vec embeddings. Deployed as a Streamlit web app on an open cloud service.

Language: Jupyter Notebook - Size: 16 MB - Last synced at: 20 days ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

SubhangiSati/RAG-using-DeepSeek-R1

This repository highlights my learning journey in building Retrieval-Augmented Generation (RAG) pipelines using DeepSeek on Lightning AI, covering document ingestion, retrieval, and integration with generative AI. It showcases fine-tuning, evaluation, and optimization for accurate open-domain QA and knowledge management.

Language: Jupyter Notebook - Size: 1.01 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

tobiasodion/RAGBOT

A CLI chatbot that uses a RAG architecture to improve and adapt LLMs to a specific context. It allows users to ask questions and get responses directly from open-source LLMs (OpenAI, MistralAI, etc.) or from the information on a website, which is provided as context using the RAG architecture.

Language: JavaScript - Size: 788 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

madeyexz/social_vegan

A dating (match-making) app for serious daters with embeddings and vector database.

Language: Python - Size: 2.98 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

VectifyAI/FAE

A method to fine-tune the black box OpenAI’s embedding model.

Language: Jupyter Notebook - Size: 16 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

karhunenloeve/NTOPL 📦

Estimation of Neural Network Dimension using Algebraic Topology and Lie Theory.

Language: Python - Size: 713 KB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

nanxstats/exp2vec

🧬 Tissue-specific gene embeddings trained on GTEx data

Language: R - Size: 99.6 MB - Last synced at: 3 days ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 1

BatoolHamawi/COVID-19WordEmbeddings

COVID-19 Arabic Word embeddings is a domain-specific pre-trained distributed word representation of COVID-19 Tweets which aims to provide the Arabic NLP research community with free-to-use and powerful word embedding models.

Language: Jupyter Notebook - Size: 54.7 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

pHequals7/NLP_Notebooks

NLP related concepts, challenges and datasets

Language: Jupyter Notebook - Size: 172 KB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 0

imansaleh16/Stack-Overflow-Tags-Communities

Dataset used to produce communities of related tags in Stack Overflow

Size: 7.99 MB - Last synced at: 7 months ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

sthanhng/Fashion-MNIST-Embedding-Visualization

Fashion-MNIST Embedding Visualization using TensorFlow Projector

Language: Python - Size: 14.1 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

mana-ysh/poincare-embeddings

Implementation of poincare embeddings

Language: Scala - Size: 12.7 KB - Last synced at: 4 days ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 1

ritesh-modi/embedding-hallucinations

This repo shows how foundational models hallucinate and how such hallucinations can be fixed by fine-tuning them.

Language: Python - Size: 474 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

YeonwooSung/nano-embeddings

The simplest, fastest repository for training/finetuning mini size embedding models like BGE and ModernBERT

Language: Python - Size: 12.7 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

Johnymonteiiro/ai_school_assistent

This assistant is designed to function as an educational support tool, specifically to assist in analyzing student data and identifying patterns of dropout risk based on information provided by the institution's database.

Language: Python - Size: 39.1 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

sacredvoid/ai_clinical_trial

Developing a system to match eligible patients to ongoing clinical trials using Vector Embeddings and LLMs!

Language: Python - Size: 70.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

Md-Emon-Hasan/Retrieval-Augmented-Generation-RAG

RAG enhances LLMs by retrieving relevant external knowledge before generating responses, improving accuracy and reducing hallucinations.

Language: Jupyter Notebook - Size: 569 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

SINGHxTUSHAR/NextWordAI

NextWordAI : predict the next word using the LSTM, GRU. This project aims to develop a deep learning model for predicting the next word in a given sequence of words. The model is built using Long Short-Term Memory (LSTM) networks, which are well-suited for sequence prediction tasks.

Language: Jupyter Notebook - Size: 25.4 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

AstraBert/SenTrEv-demo

Demo for SenTrEv python package

Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: about 13 hours ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

GatlenCulp/embedding_translation

Alignment across Deep Neural Network Language Models’ Representations

Language: HTML - Size: 278 MB - Last synced at: 27 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

JihoonChung/ML_Sensor_Characterization

This project is an effort to characterize sensors, especially ultrasonic sensors, using machine learning methods. The results could later be used in various applications such as defective sensor detection.

Language: Jupyter Notebook - Size: 88.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Varunv003/langchain-palm2-rag_application

DocChat: Langchain Retrieval System, seamlessly navigate and converse with your documents using Langchain-powered AI, transforming PDF content into actionable insights through natural language interactions.

Language: Jupyter Notebook - Size: 319 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

ziozzang/embedding-server

Test embedding server (compatible with the OpenAI API). Models from LLaMa/Mistral.

Language: Python - Size: 6.84 KB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

mNemlaghi/cloud-embeddings

A repository for tackling cloud text pre-trained embeddings, from evaluation to deployment, including fine-tuning and vector stores.

Language: Python - Size: 103 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Related Topics
machine-learning 31 embeddings 29 nlp 24 python 23 deep-learning 21 rag 20 llm 17 openai 16 langchain 16 retrieval-augmented-generation 14 vector-database 14 vector-search 13 natural-language-processing 12 word2vec 12 tensorflow 10 faiss 9 neural-networks 9 fine-tuning 9 sentence-embeddings 9 embedding-vectors 9 pytorch 9 pinecone 9 huggingface 8 embedding 8 semantic-search 8 keras 8 sentence-transformers 7 ai 7 word-embeddings 6 recommender-system 6 retrieval 6 chatbot 6 generative-ai 6 deep-neural-networks 6 neural-network 6 nlp-machine-learning 6 recommendation-system 5 information-retrieval 5 bert 5 large-language-models 5 knowledge-graph 5 openai-api 5 llms 5 rnn 5 lstm 5 text-classification 5 flask 4 gpt-4 4 embedding-python 4 wordembedding 4 glove-embeddings 4 prompt-engineering 4 natural-language 4 unsupervised-learning 4 knowledge-graph-embeddings 4 artificial-intelligence 4 semantic-similarity 3 text-analysis 3 embeddings-word2vec 3 bert-model 3 awesome 3 preprocessing 3 topic-modeling 3 clustering 3 knowledge-graph-completion 3 lstm-neural-networks 3 onnx 3 chroma 3 chromadb 3 mteb 3 langchain-python 3 data-science 3 evaluation 3 gpt-3 3 text-mining 3 classification 3 gru 3 azure-openai 3 vector 3 streamlit-webapp 3 network-analysis 3 transformer 3 function-calling 3 computer-vision 3 bert-embeddings 2 awesome-list 2 word2vec-embeddinngs 2 bert-fine-tuning 2 music-information-retrieval 2 roberta-model 2 gensim 2 sklearn 2 cross-lingual 2 logistic-regression 2 text-generation 2 personality 2 mikolov 2 bag-of-words 2 pretrained-models 2 embedded 2