An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: embedding-models

rafay123321/embedding-hallucinations

This repo shows how foundation models hallucinate and how such hallucinations can be fixed by fine-tuning them.

Language: Python - Size: 476 KB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 0 - Forks: 0

jonathanfavorite/RAGamuffin

A lightweight, cross-platform .NET library for building RAG (Retrieval-Augmented Generation) pipelines with local embedding models and SQLite vector storage. Perfect for developers who need privacy-focused, offline-capable document search and AI-powered question answering without external API dependencies.

Language: C# - Size: 6.71 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 0 - Forks: 0

yuniko-software/tokenizer-to-onnx-model

Convert Hugging Face tokenizers to ONNX models for cross-language compatibility (.NET, Java, Python) with embedding models

Language: Jupyter Notebook - Size: 43 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 24 - Forks: 2

mangopy/tool-retrieval-benchmark

Official code for ACL2025 "🔍 Retrieval Models Aren’t Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models"

Language: JavaScript - Size: 3.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 176 - Forks: 2

ContextualAI/gritlm

Generative Representational Instruction Tuning

Language: Jupyter Notebook - Size: 11.3 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 654 - Forks: 47

BBC-Esq/VectorDB-Plugin

Plugin that lets you ask questions about your documents including audio and video files.

Language: Python - Size: 34.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 340 - Forks: 44

SnowNation101/NYX

Unified Multimodal Retriever for RAG

Language: Python - Size: 1.88 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

Hironsan/awesome-embedding-models

A curated list of awesome embedding model tutorials, projects, and communities.

Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: 1 day ago - Pushed at: about 6 years ago - Stars: 1,796 - Forks: 251

Separius/awesome-sentence-embedding 📦

A curated list of pretrained sentence and word embedding models

Language: Python - Size: 282 KB - Last synced at: 2 days ago - Pushed at: about 4 years ago - Stars: 2,260 - Forks: 263

yusufhilmi/client-vector-search

A client-side vector search library that can embed, store, search, and cache vectors. Works in the browser and in Node.js. The authors claim it outperforms OpenAI's text-embedding-ada-002 and is much faster than Pinecone and other vector databases.

Language: TypeScript - Size: 314 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 210 - Forks: 14

StarlightSearch/EmbedAnything

Production-ready Inference, Ingestion and Indexing built in Rust 🦀

Language: Rust - Size: 30.9 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 626 - Forks: 56

databricks-industry-solutions/product-search

Semantic product search on Databricks

Language: Python - Size: 513 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 32 - Forks: 14

oracle-samples/ai-optimizer

GenAI/RAG Optimizer and Toolkit for experimentation using Oracle Database AI Vector Search

Language: Python - Size: 25.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 43 - Forks: 23

harehimself/pinecone-lab

Experimenting with Pinecone as vector data continues to take center stage in AI-native systems. The purpose of this project is to explore the core capabilities, benchmark performance across different embedding models, and better understand what is possible with vector search in production environments.

Language: Python - Size: 104 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

yuniko-software/power-embeddings

PowerEmbeddings is a C# library that makes embedding generation easier in .NET applications. It aims to simplify the implementation of semantic search, full-text search, RAG, and hybrid search solutions within the .NET ecosystem.

Language: Jupyter Notebook - Size: 54.7 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

emapco/chem-mrl

Chem-MRL: SMILES Matryoshka Representation Learning Embedding Model

Language: Python - Size: 31.5 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

Sujit-O/pykg2vec

Python library for knowledge graph embedding and representation learning.

Language: Python - Size: 9.29 MB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 614 - Forks: 113

alisonbma/aiSFX

Representation Learning for the Automatic Indexing of Sound Effects Libraries (ISMIR 2022): Deep audio embeddings pre-trained on UCS & Non-UCS-compliant datasets.

Language: Python - Size: 59.6 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 45 - Forks: 4

lisekarimi/lexo

🗯️ LLM toolkit for RAG, tuning, agents, and more

Language: Jupyter Notebook - Size: 3.16 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

ashutosh1919/data2vec-pytorch

Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.

Language: Python - Size: 116 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 3

mana-ysh/knowledge-graph-embeddings 📦

Implementations of embedding-based methods for knowledge base completion tasks

Language: Python - Size: 10.2 MB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 259 - Forks: 63

with-caer/curtana

Simplified zero-cost wrapper over llama.cpp, powered by the llama-cpp-2 crate.

Language: Rust - Size: 15.6 KB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

marl/openl3

OpenL3: Open-source deep audio and image embeddings

Language: Jupyter Notebook - Size: 687 MB - Last synced at: 30 days ago - Pushed at: about 2 years ago - Stars: 517 - Forks: 60

sharukat/emergency-yt-insights

AI-powered platform that analyzes YouTube transcripts of emergency events to deliver real-time insights using NLP, vector search, and a conversational assistant.

Language: TypeScript - Size: 4.74 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

itmo-mbss-lab/sr_labs_book

The project is related to the development of labs for the ITMO Speaker Recognition Course.

Language: Jupyter Notebook - Size: 3.25 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 8

VenkatRamaraju/polydb

A vector database and embedding model written from scratch in Go

Language: Go - Size: 20.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

newking9088/product_recommendation_nlp_roberta_vader

Sentiment-Enhanced Product Recommendation System for E-Commerce: A Comparative Analysis of RoBERTa and VADER

Language: Jupyter Notebook - Size: 13.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

leokwsw/local-rag

A local RAG demo

Language: Python - Size: 19.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

sovit-123/local_file_search

Local file search using embedding techniques

Language: Python - Size: 113 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 1

DeepLearn1998/My_RAG

My first RAG

Language: Python - Size: 5.86 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Aampjunn/Vector-Store

A minimal project to understand how cosine similarity works in a vector database 🧠📊. It demonstrates semantic search by converting text into embeddings and comparing them using vector math.

Language: TypeScript - Size: 187 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

04bhavyaa/langchain-models

This project explores various LLMs and embedding models using LangChain, integrating OpenAI, Hugging Face, Google Gemini, and Anthropic. It includes chat models, document similarity search, and embeddings with cosine similarity for retrieval. The setup is simple, making it easy to experiment with LLMs and vector search. 🚀 (Big thank-you to CampusX)

Language: Python - Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

newking9088/gpt_llama_rag_fine_tuning_classification

A repository for implementing and evaluating state-of-the-art LLM techniques including fine-tuning, Retrieval-Augmented Generation (RAG), and model evaluation.

Language: Jupyter Notebook - Size: 22.5 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

SayamAlt/Langchain-with-Python-Bootcamp

This repository contains all the code materials from Jose Portilla's Langchain with Python Bootcamp on Udemy.

Language: Jupyter Notebook - Size: 15.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

spcl/ncc

Neural Code Comprehension: A Learnable Representation of Code Semantics

Language: Python - Size: 9.16 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 213 - Forks: 51

p768lwy3/torecsys

ToR[e]cSys is a PyTorch framework for implementing recommendation system algorithms, including but not limited to click-through-rate (CTR) prediction, learning-to-rank (LTR), and matrix/tensor embedding. The project's objective is to develop an ecosystem for experimenting with, sharing, reproducing, and deploying these algorithms in real-world settings smoothly and easily.

Language: Python - Size: 6.42 MB - Last synced at: 28 days ago - Pushed at: about 3 years ago - Stars: 104 - Forks: 18

doobidoo/AgentNexus

A TypeScript-based autonomous agent framework with modular systems for memory, planning, and tool integration. Features vector-based recall, multi-strategy planning, and extensible tools for AI agent development.

Language: TypeScript - Size: 2.38 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

GatlenCulp/embedding_translation

Alignment across Deep Neural Network Language Models’ Representations

Language: HTML - Size: 328 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

RideneFiras/KagglexGoogle

Language: Jupyter Notebook - Size: 176 KB - Last synced at: 18 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

worldbank/GISTEmbed

GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings

Language: Python - Size: 1.3 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 3

aws-samples/fine-tune-embedding-models-on-sagemaker

This repository contains samples for fine-tuning embedding models using Amazon SageMaker. Embedding models are useful for tasks such as semantic similarity, text clustering, and information retrieval. Fine-tuning these models on your specific domain data can greatly improve their performance.

Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: 24 days ago - Pushed at: 4 months ago - Stars: 12 - Forks: 0

ritesh-modi/embedding-hallucinations

This repo shows how foundation models hallucinate and how such hallucinations can be fixed by fine-tuning them.

Language: Python - Size: 474 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

jgraving/selfsne

Self-Supervised Noise Embeddings (Self-SNE)

Language: Jupyter Notebook - Size: 2.79 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 158 - Forks: 13

akutuzov/webvectors

Web-ify your word2vec: framework to serve distributional semantic models online

Language: Python - Size: 4.85 MB - Last synced at: about 7 hours ago - Pushed at: 4 months ago - Stars: 200 - Forks: 47

YeonwooSung/nano-embeddings

The simplest, fastest repository for training/fine-tuning small embedding models like BGE and ModernBERT

Language: Python - Size: 12.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

leducanh95/topic-modeling

Topic modeling and document clustering

Language: Python - Size: 19.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

GALA-MDS/Gala-External-Resources

This repository compiles data sources created for the CHIST-ERA 2025 proposal GALA.

Language: Jupyter Notebook - Size: 70.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

shobrook/weightgain

Train an adapter for any embedding model in under a minute

Language: Python - Size: 544 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 98 - Forks: 2

louisbrulenaudet/lemone-embed

All-in-one repo for the Lemone-embed project, a series of fine-tuned embedding models for tax retrieval-augmented generation (RAG).

Language: Python - Size: 3.6 MB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

whw199833/gbiz_torch

A comprehensive toolkit designed to help you accurately predict key metrics in commercial settings.

Language: Python - Size: 242 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

7446Nguyen/COFFEE_RAG

Get personalized coffee recommendations using Retrieval-Augmented Generation (RAG) to match your preferences with expert insights.

Language: Python - Size: 16.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

jargonsdev/ai

The AI-powered assistant for the jargons.dev ecosystem

Language: TypeScript - Size: 135 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 4 - Forks: 1

Johnymonteiiro/ai_school_assistent

This assistant is designed to function as an educational support tool, specifically to assist in analyzing student data and identifying patterns of dropout risk based on information provided by the institution's database.

Language: Python - Size: 39.1 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

HITsz-TMG/KaLM-Embedding

Code for KaLM-Embedding models

Language: Python - Size: 319 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 74 - Forks: 6

KevKibe/docindex 📦

⚡️Framework for fast persistent storage of multiple document embeddings and metadata into Pinecone for source-traceable, production-level RAG.

Language: Python - Size: 816 KB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 13 - Forks: 3

ALucek/QuicKB

Optimize Document Retrieval with Fine-Tuned KnowledgeBases

Language: Python - Size: 1.63 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 107 - Forks: 21

Jiaxi-Huang/HackerLLM

Simple Work

Language: Vue - Size: 75.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sacredvoid/ai_clinical_trial

Developing a system to match eligible patients to ongoing clinical trials using Vector Embeddings and LLMs!

Language: Python - Size: 70.3 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

sathyaseelancr/RAGImplementation

Retrieval Augmented Generation - Buying a car

Language: Python - Size: 18 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

mana-ysh/symmetry-learning-kgc

Python implementation of "Data-dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion [Manabe+. 2018]"

Language: Python - Size: 11.7 MB - Last synced at: 1 day ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 1

pranav-kural/ledaa-load-data

AWS Lambda function handling data ingestion in RAG pipeline of LEDAA project.

Language: Python - Size: 14.6 KB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Md-Emon-Hasan/Retrieval-Augmented-Generation-RAG

RAG enhances LLMs by retrieving relevant external knowledge before generating responses, improving accuracy and reducing hallucinations.

Language: Jupyter Notebook - Size: 569 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

FabianGroeger96/deep-embedded-music

Creation of an embedding space using unsupervised triplet loss and Tile2Vec that can be used for a variety of downstream tasks

Language: Jupyter Notebook - Size: 26.4 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 18 - Forks: 2

SINGHxTUSHAR/IMDB-Analysis

IMDB-Analysis is a sentiment analysis project that classifies movie reviews as positive or negative. The model uses a simple RNN architecture with word2vec embeddings and is deployed as a Streamlit web app on an open cloud service.

Language: Jupyter Notebook - Size: 16 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

Scicrop/javaSentenceBertEmbedding

Java ONNX Embedding & Retrieval-Augmented Generation (RAG) Engine

Language: Java - Size: 23.4 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

pngo1997/Retrieval-Augmented-Retrieval-RAG-for-Cleantech-Media

Implements a Retrieval-Augmented Generation (RAG) system.

Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

SubhangiSati/RAG-using-DeepSeek-R1

This repository highlights my learning journey in building Retrieval-Augmented Generation (RAG) pipelines using DeepSeek on Lightning AI, covering document ingestion, retrieval, and integration with generative AI. It showcases fine-tuning, evaluation, and optimization for accurate open-domain QA and knowledge management.

Language: Jupyter Notebook - Size: 1.01 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

itmo-mbss-lab/sr_lectures_book

The project is related to the development of Basics of Voice Biometrics lecture book for the ITMO Speaker Recognition Course.

Language: TeX - Size: 1.28 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

AspadaX/dim

Use LLMs for effective and refined vectorizations.

Language: Rust - Size: 81.1 KB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

SINGHxTUSHAR/NextWordAI

NextWordAI: predicts the next word using LSTM and GRU networks. This project aims to develop a deep learning model for predicting the next word in a given sequence of words. The model is built using Long Short-Term Memory (LSTM) networks, which are well suited to sequence prediction tasks.

Language: Jupyter Notebook - Size: 25.4 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

trustlelab/siteware-backend-v2

Siteware Backend - German Voice AI Agent provider - Deepgram + Twilio + Elevenlabs + OpenAI + Pinecone

Language: TypeScript - Size: 110 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

lgalke/vec4ir

Word Embeddings for Information Retrieval

Language: Python - Size: 965 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 225 - Forks: 42

AstraBert/SenTrEv-demo

Demo for SenTrEv python package

Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

vstep-chatbot/benchmark

Benchmark Vietnamese Embedding models and Tokenizers for RAG

Language: Jupyter Notebook - Size: 23.2 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

chaosgen/awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models

Language: Python - Size: 213 KB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

Zipstack/unstract-adapters

Unstract's interface to LLMs, Embeddings and VectorDBs.

Language: Python - Size: 632 KB - Last synced at: 23 days ago - Pushed at: 11 months ago - Stars: 18 - Forks: 3

davide-abbattista/SciQA

The Scientific Question Answering (SciQA) System is an end-to-end solution designed to provide accurate, contextually relevant, and citation-supported answers to user queries.

Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

huacenxu/Embedding-Models-for-AI-Retrieval

This project develops a domain-specific embedding model to enhance document retrieval in AI-powered search systems. It incorporates techniques like synthetic data generation, model fine-tuning, and vector search using FAISS, evaluated with MRR@5 for performance.

Language: Python - Size: 4.88 KB - Last synced at: 25 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

huacenxu/COVID-Morality

This project builds a novel liberty dictionary to quantify liberty morality—a concept missing from the extended Moral Foundations Dictionary (eMFD)—and leverages it to study the relationship between audience engagement and COVID-related news.

Language: Jupyter Notebook - Size: 8.57 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

natelalor/AI_report_generator

A tool that converts long audio files into a thorough, summarized report. It uses the OpenAI API (the ChatGPT backend), LangChain for text processing, and Pinecone as the vector database.

Language: Python - Size: 15.3 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 2

chatterjeesaurabh/Natural-Language-Processing

Text Preprocessing, Embedding Methods such as BoW, TF-IDF and Word2Vec, Text Classification using LSTM, Topic Modeling with LDA and BERTopic.

Language: Jupyter Notebook - Size: 222 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

bhargav-joshi/Baby-Names-Predictor

Baby Names Prediction

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

pengkenlim/CxNE_plants

Generation of Co-expression Network Embeddings (CxNEs) for plant genes using Graph Attention Networks (GATs)

Language: Python - Size: 178 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

stackmodel/babyagi-autonomous-agents

Demonstrates how to implement BabyAGI by Yohei Nakajima.

Language: Python - Size: 31.3 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

TajaKuzman/pandachat-rag-benchmark

PandaChat-RAG benchmark for evaluation of RAG systems on a non-synthetic Slovenian test dataset.

Language: Python - Size: 842 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

shamspias/langchain-chat

langchain-chat is an AI-driven Q&A system that leverages OpenAI's GPT-4 model and FAISS for efficient document indexing. It loads and splits documents from websites or PDFs, remembers conversations, and provides accurate, context-aware answers based on the indexed data. Easy to set up and extend.

Language: Python - Size: 1.34 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 86 - Forks: 17

thustorage/PetPS

PetPS: Supporting Huge Embedding Models with Tiered Memory

Language: C++ - Size: 32.2 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 30 - Forks: 2

celiason/museum-news

Web app for finding historical details about the museum

Language: Python - Size: 6.38 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

JihoonChung/ML_Sensor_Characterization

This project aims to characterize sensors, especially ultrasonic sensors, using machine learning methods. The results could later be used in various applications, such as detecting defective sensors.

Language: Jupyter Notebook - Size: 88.3 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

hase3b/SCPRAG

This repository implements a Retrieval-Augmented Generation (RAG) system for the Supreme Court of Pakistan, utilizing different LLMs, embedding models, and retrieval and generation enhancement strategies. It processes SCP judgments, applies chunking, and generates legal summaries and answers based on relevant case data.

Language: Jupyter Notebook - Size: 57.4 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ksm26/Embedding-Models-From-Architecture-to-Implementation

Understand and build embedding models, focusing on word and sentence embeddings and dual-encoder architectures. Learn to train embedding models using contrastive loss and to use them in semantic search and RAG systems.

Language: Jupyter Notebook - Size: 2 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 0

ai-lluminator/ai-training

This repository contains all of the AI training and data generation scripts for the AIlluminator project.

Language: Python - Size: 10.3 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

akthammomani/Casual_Conversation_Chatbot

Build a multi-turn conversational chit-chat bot

Language: Jupyter Notebook - Size: 10.8 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

su-park/mteb_ko_leaderboard

Korean text embedding model leaderboard

Size: 2.51 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 17 - Forks: 1

easonlai/chat_with_pdf_table

The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.

Language: Jupyter Notebook - Size: 85.9 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 4

ikergarcia1996/MetaVec

A monolingual and cross-lingual meta-embedding generation and evaluation framework

Language: Python - Size: 69.3 KB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 80 - Forks: 5

easonlai/chatbot_with_pdf_streamlit

This code example shows how to make a chatbot for semantic search over documents using Streamlit, LangChain, and various vector databases. The chatbot lets users ask questions and get answers from a document collection. The code is in Python and can be customized for different scenarios and data.

Language: Jupyter Notebook - Size: 6.57 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 15 - Forks: 5

maxscheurer/cppe

C++ and Python library for Polarizable Embedding

Language: C++ - Size: 4.56 MB - Last synced at: 25 days ago - Pushed at: 10 months ago - Stars: 22 - Forks: 5

rbitr/ferrite

Simple, lightweight transformers in Fortran

Language: Fortran - Size: 28.3 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 1

wuji3/nlpdk

Natural Language Processing (NLP) Toolbox

Language: Python - Size: 324 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 1

Related Keywords
embedding-models (197), machine-learning (33), embeddings (30), rag (27), nlp (26), python (25), vector-database (24), deep-learning (22), openai (19), langchain (19), llm (18), retrieval-augmented-generation (18), vector-search (15), natural-language-processing (13), semantic-search (12), word2vec (12), pinecone (11), tensorflow (11), embedding-vectors (11), fine-tuning (10), pytorch (10), faiss (10), sentence-embeddings (10), huggingface (9), neural-networks (9), large-language-models (8), keras (8), embedding (8), ai (8), generative-ai (8), sentence-transformers (7), bert (7), chatbot (7), retrieval (6), word-embeddings (6), deep-neural-networks (6), recommender-system (6), neural-network (6), nlp-machine-learning (6), prompt-engineering (5), information-retrieval (5), lstm (5), knowledge-graph (5), rnn (5), openai-api (5), vector (5), recommendation-system (5), text-classification (5), llms (5), onnx (5), wordembedding (4), unsupervised-learning (4), flask (4), glove-embeddings (4), chromadb (4), natural-language (4), knowledge-graph-embeddings (4), artificial-intelligence (4), gpt-4 (4), embedding-python (4), langchain-python (4), preprocessing (3), topic-modeling (3), computer-vision (3), rust (3), lora (3), gpt-3 (3), text-mining (3), typescript (3), transformer (3), knowledge-graph-completion (3), roberta (3), llama (3), dotnet (3), gru (3), network-analysis (3), mteb (3), text-analysis (3), data-science (3), classification (3), lstm-neural-networks (3), chroma (3), embeddings-word2vec (3), bert-model (3), clustering (3), streamlit-webapp (3), awesome (3), chatgpt (3), semantic-similarity (3), evaluation (3), speaker-identification (2), qdrant (2), speaker-recognition (2), tensorflow2 (2), speaker-verification (2), voice-activity-detection (2), voice-biometrics (2), tsne (2), natural-language-understanding (2), triplet-loss (2)