GitHub topics: embedding-models
rafay123321/embedding-hallucinations
This repo shows how foundation models hallucinate and how such hallucinations can be fixed by fine-tuning them
Language: Python - Size: 476 KB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 0 - Forks: 0

jonathanfavorite/RAGamuffin
A lightweight, cross-platform .NET library for building RAG (Retrieval-Augmented Generation) pipelines with local embedding models and SQLite vector storage. Perfect for developers who need privacy-focused, offline-capable document search and AI-powered question answering without external API dependencies.
Language: C# - Size: 6.71 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 0 - Forks: 0

yuniko-software/tokenizer-to-onnx-model
Convert Hugging Face tokenizers to ONNX models for cross-language compatibility (.NET, Java, Python) with embedding models
Language: Jupyter Notebook - Size: 43 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 24 - Forks: 2

mangopy/tool-retrieval-benchmark
Official code for ACL2025 "🔍 Retrieval Models Aren’t Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models"
Language: JavaScript - Size: 3.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 176 - Forks: 2

ContextualAI/gritlm
Generative Representational Instruction Tuning
Language: Jupyter Notebook - Size: 11.3 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 654 - Forks: 47

BBC-Esq/VectorDB-Plugin
Plugin that lets you ask questions about your documents including audio and video files.
Language: Python - Size: 34.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 340 - Forks: 44

SnowNation101/NYX
Unified Multimodal Retriever for RAG
Language: Python - Size: 1.88 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 0

Hironsan/awesome-embedding-models
A curated list of awesome embedding models tutorials, projects and communities.
Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: 1 day ago - Pushed at: about 6 years ago - Stars: 1,796 - Forks: 251

Separius/awesome-sentence-embedding 📦
A curated list of pretrained sentence and word embedding models
Language: Python - Size: 282 KB - Last synced at: 2 days ago - Pushed at: about 4 years ago - Stars: 2,260 - Forks: 263

yusufhilmi/client-vector-search
A client-side vector search library that can embed, store, search, and cache vectors. Works in the browser and in Node. It outperforms OpenAI's text-embedding-ada-002 and is way faster than Pinecone and other VectorDBs.
Language: TypeScript - Size: 314 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 210 - Forks: 14

StarlightSearch/EmbedAnything
Production-ready Inference, Ingestion and Indexing built in Rust 🦀
Language: Rust - Size: 30.9 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 626 - Forks: 56

databricks-industry-solutions/product-search
Semantic product search on Databricks
Language: Python - Size: 513 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 32 - Forks: 14

oracle-samples/ai-optimizer
GenAI/RAG Optimizer and Toolkit for experimentation using Oracle Database AI Vector Search
Language: Python - Size: 25.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 43 - Forks: 23

harehimself/pinecone-lab
Experimenting with Pinecone as vector data continues to take center stage in AI-native systems. The purpose of this project is to explore the core capabilities, benchmark performance across different embedding models, and better understand what is possible with vector search in production environments.
Language: Python - Size: 104 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

yuniko-software/power-embeddings
PowerEmbeddings is a C# library that makes embedding generation easier in .NET applications. It is aimed at simplifying the implementation of semantic search, full-text search, RAG, and hybrid search solutions within the .NET ecosystem
Language: Jupyter Notebook - Size: 54.7 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

emapco/chem-mrl
Chem-MRL: SMILES Matryoshka Representation Learning Embedding Model
Language: Python - Size: 31.5 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

Sujit-O/pykg2vec
Python library for knowledge graph embedding and representation learning.
Language: Python - Size: 9.29 MB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 614 - Forks: 113

alisonbma/aiSFX
Representation Learning for the Automatic Indexing of Sound Effects Libraries (ISMIR 2022): Deep audio embeddings pre-trained on UCS & Non-UCS-compliant datasets.
Language: Python - Size: 59.6 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 45 - Forks: 4

lisekarimi/lexo
🗯️ LLM toolkit for RAG, tuning, agents, and more
Language: Jupyter Notebook - Size: 3.16 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

ashutosh1919/data2vec-pytorch
Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.
Language: Python - Size: 116 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 3

mana-ysh/knowledge-graph-embeddings 📦
Implementations of Embedding-based methods for Knowledge Base Completion tasks
Language: Python - Size: 10.2 MB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 259 - Forks: 63

with-caer/curtana
Simplified zero-cost wrapper over llama.cpp powered by the llama-cpp-2 crate.
Language: Rust - Size: 15.6 KB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

marl/openl3
OpenL3: Open-source deep audio and image embeddings
Language: Jupyter Notebook - Size: 687 MB - Last synced at: 30 days ago - Pushed at: about 2 years ago - Stars: 517 - Forks: 60

sharukat/emergency-yt-insights
AI-powered platform that analyzes YouTube transcripts of emergency events to deliver real-time insights using NLP, vector search, and a conversational assistant.
Language: TypeScript - Size: 4.74 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

itmo-mbss-lab/sr_labs_book
The project is related to the development of labs for the ITMO Speaker Recognition Course.
Language: Jupyter Notebook - Size: 3.25 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 8

VenkatRamaraju/polydb
A vector database + embedding model written from scratch in Go
Language: Go - Size: 20.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

newking9088/product_recommendation_nlp_roberta_vader
Sentiment-Enhanced Product Recommendation System for E-Commerce: A Comparative Analysis of RoBERTa and VADER
Language: Jupyter Notebook - Size: 13.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

leokwsw/local-rag
A local RAG demo
Language: Python - Size: 19.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

sovit-123/local_file_search
Local file search using embedding techniques
Language: Python - Size: 113 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 1

DeepLearn1998/My_RAG
My first RAG
Language: Python - Size: 5.86 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Aampjunn/Vector-Store
A minimal project to understand how cosine similarity works in a vector database 🧠📊. It demonstrates semantic search by converting text into embeddings and comparing them using vector math.
Language: TypeScript - Size: 187 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

04bhavyaa/langchain-models
This project explores various LLMs and embedding models using LangChain, integrating OpenAI, Hugging Face, Google Gemini, and Anthropic. It includes chat models, document similarity search, and embeddings with cosine similarity for retrieval. The setup is simple, making it easy to experiment with LLMs and vector search. 🚀 (Big thank-you to CampusX)
Language: Python - Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

newking9088/gpt_llama_rag_fine_tuning_classification
A repository for implementing and evaluating state-of-the-art LLM techniques including fine-tuning, Retrieval-Augmented Generation (RAG), and model evaluation.
Language: Jupyter Notebook - Size: 22.5 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

SayamAlt/Langchain-with-Python-Bootcamp
This repository contains all the code materials covered in Jose Portilla's LangChain with Python Bootcamp on Udemy.
Language: Jupyter Notebook - Size: 15.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

spcl/ncc
Neural Code Comprehension: A Learnable Representation of Code Semantics
Language: Python - Size: 9.16 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 213 - Forks: 51

p768lwy3/torecsys
ToR[e]cSys is a PyTorch Framework to implement recommendation system algorithms, including but not limited to click-through-rate (CTR) prediction, learning-to-rank (LTR), and Matrix/Tensor Embedding. The project objective is to develop an ecosystem to experiment, share, reproduce, and deploy in the real world in a smooth and easy way.
Language: Python - Size: 6.42 MB - Last synced at: 28 days ago - Pushed at: about 3 years ago - Stars: 104 - Forks: 18

doobidoo/AgentNexus
A TypeScript-based autonomous agent framework with modular systems for memory, planning, and tool integration. Features vector-based recall, multi-strategy planning, and extensible tools for AI agent development.
Language: TypeScript - Size: 2.38 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

GatlenCulp/embedding_translation
Alignment across Deep Neural Network Language Models’ Representations
Language: HTML - Size: 328 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

RideneFiras/KagglexGoogle
Language: Jupyter Notebook - Size: 176 KB - Last synced at: 18 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

worldbank/GISTEmbed
GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings
Language: Python - Size: 1.3 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 3

aws-samples/fine-tune-embedding-models-on-sagemaker
This repository contains samples for fine-tuning embedding models using Amazon SageMaker. Embedding models are useful for tasks such as semantic similarity, text clustering, and information retrieval. Fine-tuning these models on your specific domain data can greatly improve their performance.
Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: 24 days ago - Pushed at: 4 months ago - Stars: 12 - Forks: 0

ritesh-modi/embedding-hallucinations
This repo shows how foundation models hallucinate and how such hallucinations can be fixed by fine-tuning them
Language: Python - Size: 474 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

jgraving/selfsne
Self-Supervised Noise Embeddings (Self-SNE)
Language: Jupyter Notebook - Size: 2.79 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 158 - Forks: 13

akutuzov/webvectors
Web-ify your word2vec: framework to serve distributional semantic models online
Language: Python - Size: 4.85 MB - Last synced at: about 7 hours ago - Pushed at: 4 months ago - Stars: 200 - Forks: 47

YeonwooSung/nano-embeddings
The simplest, fastest repository for training/finetuning mini size embedding models like BGE and ModernBERT
Language: Python - Size: 12.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

leducanh95/topic-modeling
Topic modeling and document clustering
Language: Python - Size: 19.5 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

GALA-MDS/Gala-External-Resources
This repository compiles data sources created for the CHIST-ERA 2025 proposal GALA.
Language: Jupyter Notebook - Size: 70.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

shobrook/weightgain
Train an adapter for any embedding model in under a minute
Language: Python - Size: 544 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 98 - Forks: 2

louisbrulenaudet/lemone-embed
All-in-one repo for the Lemone-embed project, a series of fine-tuned embedding models for Tax retrieval augmented generation (RAG).
Language: Python - Size: 3.6 MB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

whw199833/gbiz_torch
A comprehensive toolkit package designed to help you accurately predict key metrics in commercial settings
Language: Python - Size: 242 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

7446Nguyen/COFFEE_RAG
Get personalized coffee recommendations using Retrieval-Augmented Generation (RAG) to match your preferences with expert insights.
Language: Python - Size: 16.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

jargonsdev/ai
The AI-Powered assistant for jargons.dev ecosystem
Language: TypeScript - Size: 135 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 4 - Forks: 1

Johnymonteiiro/ai_school_assistent
This assistant is designed to function as an educational support tool, specifically to assist in analyzing student data and identifying patterns of dropout risk based on information provided by the institution's database.
Language: Python - Size: 39.1 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

HITsz-TMG/KaLM-Embedding
Code for KaLM-Embedding models
Language: Python - Size: 319 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 74 - Forks: 6

KevKibe/docindex 📦
⚡️Framework for fast persistent storage of multiple document embeddings and metadata into Pinecone for source-traceable, production-level RAG.
Language: Python - Size: 816 KB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 13 - Forks: 3

ALucek/QuicKB
Optimize Document Retrieval with Fine-Tuned KnowledgeBases
Language: Python - Size: 1.63 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 107 - Forks: 21

Jiaxi-Huang/HackerLLM
Simple Work
Language: Vue - Size: 75.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sacredvoid/ai_clinical_trial
Developing a system to match eligible patients to ongoing clinical trials using Vector Embeddings and LLMs!
Language: Python - Size: 70.3 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

sathyaseelancr/RAGImplementation
Retrieval Augmented Generation - Buying a car
Language: Python - Size: 18 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

mana-ysh/symmetry-learning-kgc
Python implementation of "Data-dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion [Manabe+. 2018]"
Language: Python - Size: 11.7 MB - Last synced at: 1 day ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 1

pranav-kural/ledaa-load-data
AWS Lambda function handling data ingestion in RAG pipeline of LEDAA project.
Language: Python - Size: 14.6 KB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Md-Emon-Hasan/Retrieval-Augmented-Generation-RAG
RAG enhances LLMs by retrieving relevant external knowledge before generating responses, improving accuracy and reducing hallucinations.
Language: Jupyter Notebook - Size: 569 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

FabianGroeger96/deep-embedded-music
Creation of an embedding space using unsupervised triplet loss and Tile2Vec that can be used for a variety of downstream tasks
Language: Jupyter Notebook - Size: 26.4 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 18 - Forks: 2

SINGHxTUSHAR/IMDB-Analysis
IMDB-Analysis is a sentiment analysis project that classifies movie reviews as positive or negative. The model is designed with a simple RNN architecture and uses word2vec embeddings. Deployed as a Streamlit web app on an open cloud service.
Language: Jupyter Notebook - Size: 16 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

Scicrop/javaSentenceBertEmbedding
Java ONNX Embedding & Retrieval-Augmented Generation (RAG) Engine
Language: Java - Size: 23.4 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

pngo1997/Retrieval-Augmented-Retrieval-RAG-for-Cleantech-Media
Implements a Retrieval-Augmented Generation (RAG) system.
Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

SubhangiSati/RAG-using-DeepSeek-R1
This repository highlights my learning journey in building Retrieval-Augmented Generation (RAG) pipelines using DeepSeek on Lightning AI, covering document ingestion, retrieval, and integration with generative AI. It showcases fine-tuning, evaluation, and optimization for accurate open-domain QA and knowledge management.
Language: Jupyter Notebook - Size: 1.01 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

itmo-mbss-lab/sr_lectures_book
The project is related to the development of the Basics of Voice Biometrics lecture book for the ITMO Speaker Recognition Course.
Language: TeX - Size: 1.28 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

AspadaX/dim
Use LLMs for effective and refined vectorizations.
Language: Rust - Size: 81.1 KB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

SINGHxTUSHAR/NextWordAI
NextWordAI: predict the next word using LSTM and GRU networks. This project aims to develop a deep learning model for predicting the next word in a given sequence of words. The model is built using Long Short-Term Memory (LSTM) networks, which are well-suited for sequence prediction tasks.
Language: Jupyter Notebook - Size: 25.4 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

trustlelab/siteware-backend-v2
Siteware Backend - German Voice AI Agent provider - Deepgram + Twilio + ElevenLabs + OpenAI + Pinecone
Language: TypeScript - Size: 110 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

lgalke/vec4ir
Word Embeddings for Information Retrieval
Language: Python - Size: 965 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 225 - Forks: 42

AstraBert/SenTrEv-demo
Demo for SenTrEv python package
Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

vstep-chatbot/benchmark
Benchmark Vietnamese Embedding models and Tokenizers for RAG
Language: Jupyter Notebook - Size: 23.2 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

chaosgen/awesome-sentence-embedding
A curated list of pretrained sentence and word embedding models
Language: Python - Size: 213 KB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

Zipstack/unstract-adapters
Unstract's interface to LLMs, Embeddings and VectorDBs.
Language: Python - Size: 632 KB - Last synced at: 23 days ago - Pushed at: 11 months ago - Stars: 18 - Forks: 3

davide-abbattista/SciQA
The Scientific Question Answering (SciQA) System is an end-to-end solution designed to provide accurate, contextually relevant, and citation-supported answers to user queries.
Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

huacenxu/Embedding-Models-for-AI-Retrieval
This project develops a domain-specific embedding model to enhance document retrieval in AI-powered search systems. It incorporates techniques like synthetic data generation, model fine-tuning, and vector search using FAISS, evaluated with MRR@5 for performance.
Language: Python - Size: 4.88 KB - Last synced at: 25 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

huacenxu/COVID-Morality
This project builds a novel liberty dictionary to quantify liberty morality—a concept missing from the extended Moral Foundations Dictionary (eMFD)—and leverages it to study the relationship between audience engagement and COVID-related news.
Language: Jupyter Notebook - Size: 8.57 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

natelalor/AI_report_generator
A tool that converts long audio files into a thorough, summarized report. Leverages OpenAI and its API (ChatGPT backend), LangChain for text processing, and Pinecone as the vector database.
Language: Python - Size: 15.3 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 2

chatterjeesaurabh/Natural-Language-Processing
Text Preprocessing, Embedding Methods such as BoW, TF-IDF and Word2Vec, Text Classification using LSTM, Topic Modeling with LDA and BERTopic.
Language: Jupyter Notebook - Size: 222 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

bhargav-joshi/Baby-Names-Predictor
Baby Names Prediction
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

pengkenlim/CxNE_plants
Generation of Co-expression Network Embeddings (CxNEs) for plant genes using Graph Attention Networks (GATs)
Language: Python - Size: 178 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

stackmodel/babyagi-autonomous-agents
Demonstrates how to implement BabyAGI by Yohei Nakajima.
Language: Python - Size: 31.3 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

TajaKuzman/pandachat-rag-benchmark
PandaChat-RAG benchmark for evaluation of RAG systems on a non-synthetic Slovenian test dataset.
Language: Python - Size: 842 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

shamspias/langchain-chat
langchain-chat is an AI-driven Q&A system that leverages OpenAI's GPT-4 model and FAISS for efficient document indexing. It loads and splits documents from websites or PDFs, remembers conversations, and provides accurate, context-aware answers based on the indexed data. Easy to set up and extend.
Language: Python - Size: 1.34 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 86 - Forks: 17

thustorage/PetPS
PetPS: Supporting Huge Embedding Models with Tiered Memory
Language: C++ - Size: 32.2 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 30 - Forks: 2

celiason/museum-news
Web app for finding historical details about the museum
Language: Python - Size: 6.38 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

JihoonChung/ML_Sensor_Characterization
This project is an effort to characterize sensors, especially ultrasonic sensors, using machine learning methods. The results could later be used in various applications such as defective sensor detection.
Language: Jupyter Notebook - Size: 88.3 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

hase3b/SCPRAG
This repository implements a Retrieval-Augmented Generation (RAG) system for the Supreme Court of Pakistan, utilizing different LLMs, embedding models, and retrieval and generation enhancement strategies. It processes SCP judgments, applies chunking, and generates legal summaries and answers based on relevant case data.
Language: Jupyter Notebook - Size: 57.4 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ksm26/Embedding-Models-From-Architecture-to-Implementation
Understand and build embedding models, focusing on word and sentence embeddings and dual encoder architectures. Learn to train embedding models using contrastive loss and implement them in semantic search and RAG systems.
Language: Jupyter Notebook - Size: 2 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 0

ai-lluminator/ai-training
This repository contains all of the AI training and data generation scripts for the AIlluminator project.
Language: Python - Size: 10.3 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

akthammomani/Casual_Conversation_Chatbot
Build a Multi-turn Conversations Chit-Chat Bot
Language: Jupyter Notebook - Size: 10.8 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

su-park/mteb_ko_leaderboard
Korean text embedding model leaderboard
Size: 2.51 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 17 - Forks: 1

easonlai/chat_with_pdf_table
The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.
Language: Jupyter Notebook - Size: 85.9 KB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 4

ikergarcia1996/MetaVec
A monolingual and cross-lingual meta-embedding generation and evaluation framework
Language: Python - Size: 69.3 KB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 80 - Forks: 5

easonlai/chatbot_with_pdf_streamlit
This code example shows how to make a chatbot for semantic search over documents using Streamlit, LangChain, and various vector databases. The chatbot lets users ask questions and get answers from a document collection. The code is in Python and can be customized for different scenarios and data.
Language: Jupyter Notebook - Size: 6.57 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 15 - Forks: 5

maxscheurer/cppe
C++ and Python library for Polarizable Embedding
Language: C++ - Size: 4.56 MB - Last synced at: 25 days ago - Pushed at: 10 months ago - Stars: 22 - Forks: 5

rbitr/ferrite
Simple, lightweight transformers in Fortran
Language: Fortran - Size: 28.3 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 1

wuji3/nlpdk
Natural Language Processing(NLP) Toolbox
Language: Python - Size: 324 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 1
