GitHub topics: rlhf
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Language: Python - Size: 46.4 MB - Last synced at: about 9 hours ago - Pushed at: 1 day ago - Stars: 48,701 - Forks: 5,926

opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
Size: 472 KB - Last synced at: about 20 hours ago - Pushed at: 13 days ago - Stars: 3,930 - Forks: 239

InternLM/InternLM
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
Language: Python - Size: 7.12 MB - Last synced at: about 22 hours ago - Pushed at: 3 months ago - Stars: 6,893 - Forks: 483

ronniross/core-agi-protocol
A framework for analyzing how AGI/ASI might emerge from decentralized, adaptive systems rather than from a single model deployment. It also presents its orientation as a dynamic, self-evolving Magna Carta intended to help guide the emergence of such phenomena.
Size: 227 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 5 - Forks: 1

huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
Language: Python - Size: 306 KB - Last synced at: about 14 hours ago - Pushed at: 13 days ago - Stars: 5,172 - Forks: 442

voidful/TextRL
Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face Transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
Language: Python - Size: 400 KB - Last synced at: about 19 hours ago - Pushed at: about 1 year ago - Stars: 557 - Forks: 59

RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
Language: Python - Size: 43.1 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 11,465 - Forks: 885

GaryYufei/AlignLLMHumanSurvey
Aligning Large Language Models with Human: A Survey
Size: 335 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 730 - Forks: 31

janelu9/EasyLLM
Running large language models easily.
Language: Python - Size: 230 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 8 - Forks: 0

RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
Language: Python - Size: 249 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 511 - Forks: 48

TUDB-Labs/mLoRA
An Efficient "Factory" to Build Multiple LoRA Adapters
Language: Python - Size: 11 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 313 - Forks: 58

SOMIR420/transformerlab-app
Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
Language: TypeScript - Size: 8.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Language: Python - Size: 3.99 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 1,460 - Forks: 120

kavya4411/tune
Flutter Piano is a simple and educational music application that allows users to play black and white piano keys that produce realistic sounds upon tapping. It is built with Flutter and designed with a clean, intuitive interface that offers an authentic piano playing experience.
Language: C++ - Size: 2.82 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 1

ymcui/Chinese-LLaMA-Alpaca-2
Phase-2 project for the Chinese LLaMA-2 & Alpaca-2 large models, plus 64K ultra-long-context models (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
Language: Python - Size: 8.15 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 7,155 - Forks: 570

transformerlab/transformerlab-app
Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
Language: TypeScript - Size: 9.14 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,074 - Forks: 239

Kiln-AI/Kiln
The easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets.
Language: Python - Size: 14.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,442 - Forks: 237

mindspore-courses/step_into_llm
MindSpore online courses: Step into LLM
Language: Jupyter Notebook - Size: 246 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 464 - Forks: 111

jianzhnie/Open-R1
An open-source implementation (reproduction) of DeepSeek-R1.
Language: Python - Size: 1.02 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 255 - Forks: 49

PKU-Alignment/align-anything
Align Anything: Training All-modality Model with Feedback
Language: Jupyter Notebook - Size: 108 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 3,601 - Forks: 418

ronniross/symbiotic-core-library
Toolkits, instructions, prompts, bibliographies, and research support designed to enhance/test LLM metacognitive/contextual awareness, address deficiencies, and unlock emergent properties/human-AI symbiosis.
Size: 8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 0

LAION-AI/Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Language: Python - Size: 33.8 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 37,343 - Forks: 3,271

modelscope/Trinity-RFT
Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLMs).
Language: Python - Size: 11.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 73 - Forks: 10

RLHFlow/RLHF-Reward-Modeling
Recipes to train reward models for RLHF.
Language: Python - Size: 3.8 MB - Last synced at: 5 days ago - Pushed at: 19 days ago - Stars: 1,322 - Forks: 95
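
Several reward-modeling entries in this list, including RLHFlow/RLHF-Reward-Modeling above, train pairwise reward models on preference data. As rough orientation only (not RLHFlow's actual API — the function name and tensor layout are illustrative assumptions), a minimal sketch of the standard pairwise loss on a scalar reward head:

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss: push r(chosen) above r(rejected).

    Both inputs are scalar rewards per example, shape (batch,);
    loss = -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Illustrative usage with random scores standing in for a reward model's outputs.
chosen, rejected = torch.randn(8), torch.randn(8)
print(pairwise_reward_loss(chosen, rejected))
```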

Joyce94/LLM-RLHF-Tuning
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
Language: Python - Size: 22.3 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 414 - Forks: 17

TideDra/VL-RLHF
An RLHF infrastructure for vision-language models
Language: Python - Size: 3.8 MB - Last synced at: about 14 hours ago - Pushed at: 6 months ago - Stars: 174 - Forks: 7

argilla-io/argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Language: Python - Size: 772 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 4,482 - Forks: 431

CodeName-Detective/Prompt-to-Song-Generation-using-Large-Language-Models
This project uses LLMs to generate music from text by understanding prompts, creating lyrics, determining genre, and composing melodies. It harnesses LLM capabilities to create songs based on text inputs through a multi-step approach.
Language: Jupyter Notebook - Size: 57.6 MB - Last synced at: 2 days ago - Pushed at: 12 months ago - Stars: 15 - Forks: 0

allenai/reward-bench
RewardBench: the first evaluation tool for reward models.
Language: Python - Size: 25.5 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 562 - Forks: 66

glgh/awesome-llm-human-preference-datasets
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Size: 11.7 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 356 - Forks: 15

jasonvanf/llama-trl
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
Language: Python - Size: 37.1 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 213 - Forks: 23

natolambert/rlhf-book
Textbook on reinforcement learning from human feedback
Language: TeX - Size: 6.91 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 853 - Forks: 74

fereydoonboroojerdi/multimodal-customer-insights-generator
Scalable multimodal AI system combining FSDP, RLHF, and Inferentia optimization for customer insights generation.
Language: Python - Size: 212 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Language: Jupyter Notebook - Size: 302 MB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 1,737 - Forks: 267

gradient-divergence/agentic-retail-foundations
Source code for the Foundations of Agentic AI for Retail Book
Language: Python - Size: 2.08 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Language: Python - Size: 543 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2,671 - Forks: 198

fereydoonboroojerdi/multilingual-llm-trainium
Production-ready multilingual customer support system using LLaMA-3, RLHF, DeepSpeed, and AWS Trainium.
Language: Python - Size: 197 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

mengdi-li/awesome-RLAIF
A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
Size: 313 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 163 - Forks: 4

general-preference/general-preference-model
Official implementation of ICML 2025 paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https://arxiv.org/abs/2410.02197)
Language: Python - Size: 90.8 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 23 - Forks: 3
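
For context on the Bradley-Terry-related entries here (this one, and holarissun/RewardModelingBeyondBradleyTerry further down): the baseline these papers generalize is the Bradley-Terry preference model. A reference sketch in generic notation (not either paper's own) is:

```latex
% Probability that response y_1 is preferred over y_2 for prompt x,
% under a scalar reward model r(x, y) (generic notation, illustrative only):
P(y_1 \succ y_2 \mid x)
  = \frac{\exp\bigl(r(x, y_1)\bigr)}{\exp\bigl(r(x, y_1)\bigr) + \exp\bigl(r(x, y_2)\bigr)}
  = \sigma\bigl(r(x, y_1) - r(x, y_2)\bigr)
```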

kosaokis/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Language: Python - Size: 40.5 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese
Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial large models, together with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
Language: Python - Size: 1.64 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 631 - Forks: 64

log10-io/log10js 📦
JavaScript client library for managing your LLM data in one place
Language: JavaScript - Size: 20.5 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 0

log10-io/log10 📦
Python client library for improving your LLM app accuracy
Language: Python - Size: 16.6 MB - Last synced at: about 21 hours ago - Pushed at: 3 months ago - Stars: 98 - Forks: 11

sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Language: Python - Size: 2.29 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 338 - Forks: 23

PKU-Alignment/aligner
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
Language: Python - Size: 16.3 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 170 - Forks: 8

jianzhnie/LLamaTuner
Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA-2, LLaMA-3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
Language: Python - Size: 1.02 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 602 - Forks: 65

l294265421/alpaca-rlhf
Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback), based on DeepSpeed Chat
Language: Python - Size: 97.9 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 115 - Forks: 14

dobriban/Principles-of-AI-LLMs
Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, interpretability (circuits), etc.
Size: 188 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 31 - Forks: 2

HarderThenHarder/RLLoggingBoard
A visualization tool for deeper understanding and easier debugging of RLHF training.
Language: Python - Size: 6.32 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 188 - Forks: 6

nlp-uoregon/Okapi
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Language: Python - Size: 262 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 95 - Forks: 2

ContextualAI/HALOs
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Language: Python - Size: 5.28 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 835 - Forks: 51
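
Several entries in this list (HALOs above, plus junkangwu/beta-DPO and hanyang1999/RainbowPO below) build on the DPO objective. A minimal, generic sketch of that loss follows — not HALOs' actual API; the function name, argument layout, and default beta are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss on summed per-sequence log-probs.

    Unlike an explicit pairwise reward model, DPO needs no separate reward
    network: the implicit reward is beta * (log pi_theta(y|x) - log pi_ref(y|x)),
    and the loss is -log sigmoid(reward_chosen - reward_rejected).
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```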

astorfi/LLM-Alignment-Project
A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.
Language: Python - Size: 619 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 31 - Forks: 2

PKU-Alignment/beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Language: Makefile - Size: 2.34 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 136 - Forks: 6

Docta-ai/docta
A Doctor for your data
Language: Python - Size: 27.8 MB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 3,235 - Forks: 232

li-plus/flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
Language: Python - Size: 3.35 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

wang8740/MAP
Documentation at
Language: Python - Size: 6.87 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 2

THUDM/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Language: Python - Size: 4.18 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 1,383 - Forks: 71

CIntellifusion/VideoDPO
Official Implementation of VideoDPO
Language: Python - Size: 20.6 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 95 - Forks: 1

NiuTrans/Vision-LLM-Alignment
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
Language: Python - Size: 153 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 104 - Forks: 8

krishnaura45/LMBattle
Battle between Chatbots
Language: Jupyter Notebook - Size: 29 MB - Last synced at: 19 days ago - Pushed at: 24 days ago - Stars: 2 - Forks: 0

jackaduma/ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to fine-tune ChatGLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
Language: Python - Size: 25.3 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 135 - Forks: 10

hiyouga/ChatGLM-Efficient-Tuning 📦
Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
Language: Python - Size: 194 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 3,699 - Forks: 475

taco-group/Re-Align
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Language: Python - Size: 18.6 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 40 - Forks: 1

janearc/wonder
metaprogramming for LLMs and other humans
Language: Python - Size: 441 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 1

hanyang1999/RainbowPO
[ICLR 2025] RainbowPO: A unified framework for combining improvements in preference optimization
Language: Python - Size: 332 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

dannylee1020/openpo
Building synthetic data for preference tuning
Language: Python - Size: 10.7 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 27 - Forks: 0

sumo1/gpt-reproduction-SFT-RLHF
A reproduction of OpenAI GPT, based on the Transformer. Main goals: study the GPT source code and underlying principles, and learn large-model supervised fine-tuning (SFT) and feedback-based reinforcement learning (RLHF). The code can be run directly on Colab. Source-code study notes: https://blog.csdn.net/xm415/category_12891845.html
Language: Jupyter Notebook - Size: 43.9 KB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

junkangwu/beta-DPO
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Language: Python - Size: 43 KB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 43 - Forks: 2

flint-xf-fan/Federated-RLHF
[AAMAS 2025] Privacy-preserving and personalized RLHF, with convergence guarantees. The code contains experiments for training multiple instances of GPT-2 for personalized, sentiment-aligned text generation.
Language: Python - Size: 742 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 6 - Forks: 0

RLHFlow/Directional-Preference-Alignment
Directional Preference Alignment
Size: 1.83 MB - Last synced at: about 20 hours ago - Pushed at: 8 months ago - Stars: 57 - Forks: 3

sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
Language: Python - Size: 18.4 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 43 - Forks: 3

WangRongsheng/MedQA-ChatGLM 📦
🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our scope is not limited to medical Q&A.
Language: Python - Size: 20.7 MB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 49

NJUxlj/Travel-Agent-based-on-Qwen2-RLHF
A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answering dataset; a mind map can be generated from the response. A RAG system is built on top of the tuned Qwen2 using prompt templates, tool use, a Chroma embedding database, and LangChain.
Language: Python - Size: 155 MB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 9 - Forks: 1

zxuu/RLHF
Implementation and study of RLHF-related algorithms for LLMs.
Language: Python - Size: 1.67 MB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

liziniu/ReMax
Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models"
Language: Python - Size: 1.76 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 181 - Forks: 13

FlintSH/Outlier-Tools 📦
A collection of free open-source tools to help you better understand your Outlier account, entirely handled in-browser.
Language: TypeScript - Size: 577 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 0

OpenLMLab/MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
Language: Python - Size: 2.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 1,350 - Forks: 98

jackaduma/Vicuna-LoRA-RLHF-PyTorch
A full pipeline to fine-tune Vicuna with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
Language: Python - Size: 18.7 MB - Last synced at: 2 days ago - Pushed at: 12 months ago - Stars: 213 - Forks: 19

jackfsuia/nanoRLHF
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction.
Language: Python - Size: 2 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 54 - Forks: 11

wangclnlp/DeepSpeed-Chat-Extension
This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF).
Language: Python - Size: 11.7 MB - Last synced at: 16 days ago - Pushed at: 10 months ago - Stars: 19 - Forks: 1

ld-ing/qdhf
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization (ICML 2024)
Language: Python - Size: 2.98 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 2

sylvain-wei/24-Game-Reasoning
A very simple reproduction of DeepSeek-R1-Zero and DeepSeek-R1, using the 24 Game as an example. Uses zero-RL, SFT, and SFT+RL to elicit the LLM's self-verification and reflection abilities. A clean, minimal, accessible reproduction of DeepSeek R1-Zero and DeepSeek R1.
Language: Python - Size: 24.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 0

sanjibnarzary/awesome-llm
Curated list of open source and openly accessible large language models
Size: 23.4 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 26 - Forks: 9

THUDM/WebGLM
WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)
Language: Python - Size: 6.19 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 1,585 - Forks: 139

rabiloo/llm-finetuning
Sample for Fine-Tuning LLMs & VLMs
Language: Python - Size: 274 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 2

Esmail-ibraheem/Axon
AI research lab🔬: implementations of AI papers and theoretical research: InstructGPT, llama, transformers, diffusion models, RLHF, etc...
Language: Python - Size: 32.7 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 5

holarissun/RewardModelingBeyondBradleyTerry
Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives"
Language: Python - Size: 365 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 41 - Forks: 3

haoliuhl/chain-of-hindsight
Simple next-token-prediction for RLHF
Language: Python - Size: 162 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 223 - Forks: 17

mihirp1998/VADER
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.
Language: Python - Size: 164 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 253 - Forks: 15

QuentinWach/image-ranker
Rank images using TrueSkill by comparing them against each other in the browser. 🖼📊
Language: HTML - Size: 9.86 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 52 - Forks: 9

THUDM/VisionReward
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
Language: Python - Size: 9.97 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 194 - Forks: 4

ZiyiZhang27/tdpo
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
Language: Python - Size: 3.28 MB - Last synced at: 23 days ago - Pushed at: 10 months ago - Stars: 35 - Forks: 0

xrsrke/instructGOOSE
Implementation of Reinforcement Learning from Human Feedback (RLHF)
Language: Jupyter Notebook - Size: 3.31 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 172 - Forks: 21

LegendLeoChen/llm-finetune
Fine-tunes models from Hugging Face using libraries such as trl, peft, and transformers.
Language: Python - Size: 6.84 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

vicgalle/awesome-rlaif
A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)
Size: 22.5 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

AmirMotefaker/Create-your-own-ChatGPT
Create your own ChatGPT with Python
Language: Jupyter Notebook - Size: 5.86 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 8

sergio11/llm_finetuning_and_evaluation
The LLM FineTuning and Evaluation project 🚀 enhances FLAN-T5 models for tasks like summarizing Spanish news articles 🇪🇸📰. It features detailed notebooks 📚 on fine-tuning and evaluating models to optimize performance for specific applications. 🔍✨
Language: Jupyter Notebook - Size: 499 KB - Last synced at: 25 days ago - Pushed at: 2 months ago - Stars: 2 - Forks: 1

patrick-tssn/LM-Research-Hub
Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language models (LMs), with a particular focus on large language models (LLMs)
Language: Python - Size: 5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 18 - Forks: 3

tomekkorbak/pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
Language: Python - Size: 135 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 180 - Forks: 13

opening-up-chatgpt/opening-up-chatgpt.github.io
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Language: Python - Size: 1.51 MB - Last synced at: 28 days ago - Pushed at: 2 months ago - Stars: 117 - Forks: 7
