Topic: "rlhf"
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Language: Python - Size: 44 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 46,849 - Forks: 5,727

LAION-AI/Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Language: Python - Size: 33.8 MB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 37,294 - Forks: 3,268

RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
Language: Python - Size: 43.1 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 11,383 - Forks: 882

ymcui/Chinese-LLaMA-Alpaca-2
Phase 2 of the Chinese LLaMA-2 & Alpaca-2 large-model project, with 64K ultra-long-context models (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
Language: Python - Size: 8.15 MB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 7,159 - Forks: 572

InternLM/InternLM
Official release of InternLM2 7B and 20B base and chat models. 200K context support
Language: Python - Size: 4.52 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 5,395 - Forks: 383

huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
Language: Python - Size: 255 KB - Last synced at: about 17 hours ago - Pushed at: 5 months ago - Stars: 5,134 - Forks: 442

argilla-io/argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Language: Python - Size: 772 MB - Last synced at: 10 days ago - Pushed at: 14 days ago - Stars: 4,438 - Forks: 420

opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
Size: 471 KB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 3,873 - Forks: 237

hiyouga/ChatGLM-Efficient-Tuning 📦
Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
Language: Python - Size: 194 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 3,693 - Forks: 474

Kiln-AI/Kiln
The easiest tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets.
Language: Python - Size: 14.3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3,391 - Forks: 235

PKU-Alignment/align-anything
Align Anything: Training All-modality Model with Feedback
Language: Jupyter Notebook - Size: 108 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3,340 - Forks: 398

Docta-ai/docta
A Doctor for your data
Language: Python - Size: 27.8 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 3,098 - Forks: 231

argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Language: Python - Size: 543 MB - Last synced at: 2 days ago - Pushed at: 7 days ago - Stars: 2,640 - Forks: 193

transformerlab/transformerlab-app
Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
Language: TypeScript - Size: 9.37 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1,964 - Forks: 109

tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Language: Jupyter Notebook - Size: 302 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 1,716 - Forks: 266

THUDM/WebGLM
WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)
Language: Python - Size: 6.19 MB - Last synced at: 8 days ago - Pushed at: 27 days ago - Stars: 1,585 - Forks: 139

PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Language: Python - Size: 3.99 MB - Last synced at: 2 days ago - Pushed at: 10 months ago - Stars: 1,448 - Forks: 120

THUDM/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Language: Python - Size: 4.18 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 1,369 - Forks: 71

OpenLMLab/MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
Language: Python - Size: 2.5 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 1,350 - Forks: 98

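MOSS-RLHF above dissects PPO for RLHF. For a single token/action, the clipped PPO surrogate the paper studies reduces to a one-line expression; a minimal sketch in plain Python (function name and inputs are illustrative, not this repo's API):

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped PPO surrogate for one action.
    ratio = pi_new / pi_old; clipping keeps the update near the old policy."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)
```

With `logp_new == logp_old` the ratio is 1 and the objective is just the advantage; large ratio changes are capped at `(1 ± eps) * advantage`.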
ContextualAI/HALOs
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Language: Python - Size: 5.25 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 830 - Forks: 51

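The human-aware losses HALOs implements share a common shape; for a single preference pair, the DPO loss in particular is a sigmoid over policy-vs-reference log-probability margins. A minimal sketch in plain Python (names and scalar inputs are illustrative, not the library's API):

```python
import math

def dpo_loss(logp_chosen_policy, logp_rejected_policy,
             logp_chosen_ref, logp_rejected_ref, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares how much more the policy (vs. the frozen
    reference) prefers the chosen response over the rejected one."""
    margin = (logp_chosen_policy - logp_chosen_ref) \
             - (logp_rejected_policy - logp_rejected_ref)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When policy and reference agree, the margin is zero and the loss is log 2; the loss falls as the policy shifts probability toward the chosen response.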
xtreme1-io/xtreme1
Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.
Language: TypeScript - Size: 49.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 743 - Forks: 117

GaryYufei/AlignLLMHumanSurvey
Aligning Large Language Models with Human: A Survey
Size: 335 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 727 - Forks: 32

natolambert/rlhf-book
Textbook on reinforcement learning from human feedback
Language: TeX - Size: 6.88 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 658 - Forks: 59

jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese
Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial-domain LLMs, plus an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
Language: Python - Size: 1.64 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 626 - Forks: 64

jianzhnie/LLamaTuner
Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA-2, LLaMA-3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
Language: Python - Size: 1.02 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 600 - Forks: 63

voidful/TextRL
Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
Language: Python - Size: 400 KB - Last synced at: 12 days ago - Pushed at: 12 months ago - Stars: 556 - Forks: 59

RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
Language: Python - Size: 249 KB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 507 - Forks: 46

uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
Language: Python - Size: 3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 463 - Forks: 47

mindspore-courses/step_into_llm
MindSpore online courses: Step into LLM
Language: Jupyter Notebook - Size: 246 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 457 - Forks: 111

CambioML/pykoi-rlhf-finetuned-transformers
pykoi: Active learning in one unified interface
Language: Jupyter Notebook - Size: 54.1 MB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 410 - Forks: 44

Joyce94/LLM-RLHF-Tuning
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
Language: Python - Size: 22.3 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 409 - Forks: 17

glgh/awesome-llm-human-preference-datasets
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Size: 11.7 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 353 - Forks: 15

sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Language: Python - Size: 2.27 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 325 - Forks: 21

WangRongsheng/MedQA-ChatGLM 📦
🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our ambitions go beyond medical Q&A
Language: Python - Size: 20.7 MB - Last synced at: about 24 hours ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 49

TUDB-Labs/mLoRA
An Efficient "Factory" to Build Multiple LoRA Adapters
Language: Python - Size: 11 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 307 - Forks: 58

mihirp1998/VADER
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.
Language: Python - Size: 164 MB - Last synced at: 24 days ago - Pushed at: about 1 month ago - Stars: 253 - Forks: 15

jianzhnie/Open-R1
An open-source reproduction of DeepSeek-R1.
Language: Python - Size: 1.02 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 253 - Forks: 47

haoliuhl/chain-of-hindsight
Simple next-token-prediction for RLHF
Language: Python - Size: 162 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 223 - Forks: 17

jackaduma/Vicuna-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
Language: Python - Size: 18.7 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 213 - Forks: 19

jasonvanf/llama-trl
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
Language: Python - Size: 37.1 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 211 - Forks: 23

THUDM/VisionReward
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
Language: Python - Size: 9.97 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 194 - Forks: 4

tomekkorbak/pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
Language: Python - Size: 135 KB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 180 - Forks: 14

xrsrke/instructGOOSE
Implementation of Reinforcement Learning from Human Feedback (RLHF)
Language: Jupyter Notebook - Size: 3.31 MB - Last synced at: 12 days ago - Pushed at: about 2 years ago - Stars: 172 - Forks: 21

TideDra/VL-RLHF
A RLHF Infrastructure for Vision-Language Models
Language: Python - Size: 3.8 MB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 171 - Forks: 7

PKU-Alignment/aligner
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
Language: Python - Size: 16.3 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 169 - Forks: 8

YangLing0818/IterComp
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Language: Python - Size: 32.8 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 161 - Forks: 10

mengdi-li/awesome-RLAIF
A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
Size: 313 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 160 - Forks: 4

allenai/reward-bench
RewardBench: the first evaluation tool for reward models.
Language: Python - Size: 3.29 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 157 - Forks: 15

liziniu/ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
Language: Python - Size: 1.76 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 151 - Forks: 13

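The ReMax entry above replaces PPO's learned value function with a much cheaper baseline: the reward of the greedy decode. The resulting REINFORCE-style update, for a single sampled response, can be sketched in plain Python (names and scalar inputs are illustrative, not the paper's code):

```python
def remax_advantage(reward_sampled, reward_greedy):
    """ReMax baseline: subtract the reward of the greedy decode,
    replacing PPO's learned value network."""
    return reward_sampled - reward_greedy

def reinforce_loss(logp_sampled, reward_sampled, reward_greedy):
    """REINFORCE with the ReMax baseline:
    -(r_sample - r_greedy) * log pi(sample)."""
    return -remax_advantage(reward_sampled, reward_greedy) * logp_sampled
```

Minimizing this loss raises the log-probability of samples that beat the greedy decode and lowers it for samples that fall short.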
HarderThenHarder/RLLoggingBoard
A visualization tool for deeper understanding and easier debugging of RLHF training.
Language: Python - Size: 6.26 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 147 - Forks: 5

RLHFlow/RLHF-Reward-Modeling
A recipe to train reward models for RLHF.
Language: Python - Size: 143 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 143 - Forks: 7

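Reward-model recipes like the one above are typically trained on preference pairs with a Bradley-Terry objective: the model's scalar rewards are compared through a sigmoid. A minimal sketch of that pairwise loss in plain Python (names are illustrative, not this repo's API):

```python
import math

def bradley_terry_loss(r_chosen, r_rejected):
    """Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected).
    Under the Bradley-Terry model,
    P(chosen beats rejected) = sigmoid(r_chosen - r_rejected)."""
    diff = r_chosen - r_rejected
    # Numerically stable form of -log(sigmoid(diff)).
    return math.log1p(math.exp(-diff))
```

Equal rewards give a loss of log 2; the loss shrinks as the model scores the chosen response above the rejected one.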
PKU-Alignment/beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Language: Makefile - Size: 2.34 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 134 - Forks: 6

jackaduma/ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
Language: Python - Size: 25.3 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 134 - Forks: 10

opening-up-chatgpt/opening-up-chatgpt.github.io
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Language: Python - Size: 1.51 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 117 - Forks: 7

l294265421/alpaca-rlhf
Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat
Language: Python - Size: 97.9 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 115 - Forks: 14

csmile-1006/PreferenceTransformer
Preference Transformer: Modeling Human Preferences using Transformers for RL (ICLR2023 Accepted)
Language: Python - Size: 25.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 107 - Forks: 13

NiuTrans/Vision-LLM-Alignment
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
Language: Python - Size: 153 MB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 104 - Forks: 8

log10-io/log10
Python client library for improving your LLM app accuracy
Language: Python - Size: 16.6 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 97 - Forks: 11

yihedeng9/rlhf-summary-notes
A brief and partial summary of RLHF algorithms.
Size: 4.02 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 96 - Forks: 2

nlp-uoregon/Okapi
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Language: Python - Size: 262 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 94 - Forks: 2

Miraclemarvel55/ChatGLM-RLHF
Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF
Language: Python - Size: 932 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 87 - Forks: 13

cogment/cogment-verse
Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)
Language: Python - Size: 19.7 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 81 - Forks: 17

jackaduma/Alpaca-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
Language: Python - Size: 18.7 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 58 - Forks: 6

RLHFlow/Directional-Preference-Alignment
Directional Preference Alignment
Size: 1.83 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 57 - Forks: 3

jackfsuia/nanoRLHF
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction.
Language: Python - Size: 2 MB - Last synced at: 12 days ago - Pushed at: 2 months ago - Stars: 54 - Forks: 11

QuentinWach/image-ranker
Rank images using TrueSkill by comparing them against each other in the browser. 🖼📊
Language: HTML - Size: 9.86 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 52 - Forks: 9

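Ranking images by repeated pairwise comparison, as image-ranker does with TrueSkill, follows a familiar pattern: each comparison nudges the winner's rating up and the loser's down by an amount that depends on how expected the outcome was. A sketch using a simpler Elo-style update as a stand-in for TrueSkill (which additionally tracks per-item uncertainty); all names here are illustrative:

```python
def elo_update(rating_winner, rating_loser, k=32.0):
    """One pairwise-comparison update, Elo-style.
    The less expected the win, the larger the rating change."""
    expected_win = 1.0 / (1.0 + 10 ** ((rating_loser - rating_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return rating_winner + delta, rating_loser - delta
```

Iterating this over many comparisons converges toward a total order, which is exactly how browser-based A/B judgments become a ranked image set.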
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
Language: Python - Size: 18.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 43 - Forks: 3

holarissun/RewardModelingBeyondBradleyTerry
Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives"
Language: Python - Size: 365 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 41 - Forks: 3

snu-mllab/DPPO
Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)
Language: Python - Size: 26.5 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 35 - Forks: 1

ZiyiZhang27/tdpo
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
Language: Python - Size: 3.28 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 35 - Forks: 0

haozheji/exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
Language: Python - Size: 188 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 35 - Forks: 1

CIntellifusion/VideoDPO
Official Implementation of VideoDPO
Language: Python - Size: 20.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 34 - Forks: 0

ZhenbangDu/Reliable_AD
[ECCV2024] Towards Reliable Advertising Image Generation Using Human Feedback
Language: Python - Size: 3.92 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 34 - Forks: 0

vicgalle/zero-shot-reward-models
ZYN: Zero-Shot Reward Models with Yes-No Questions
Language: Python - Size: 1.3 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 33 - Forks: 8

dobriban/Principles-of-AI-LLMs
Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, interpretability (circuits), etc.
Size: 157 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 30 - Forks: 0

astorfi/LLM-Alignment-Project
A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.
Language: Python - Size: 619 KB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 30 - Forks: 2

l294265421/my-alpaca
Reproduce alpaca
Language: Jupyter Notebook - Size: 67 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 30 - Forks: 5

wschella/llm-reliability
Code for the paper "Larger and more instructable language models become less reliable"
Language: Jupyter Notebook - Size: 477 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 29 - Forks: 0

ssbuild/chatglm_rlhf
chatglm_rlhf_finetuning
Language: Python - Size: 149 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 1

dannylee1020/openpo
Language: Python - Size: 10.7 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 27 - Forks: 0

ssbuild/llm_rlhf
RLHF training for LLMs such as GPT-2, LLaMA, and BLOOM
Language: Python - Size: 388 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 26 - Forks: 2

sanjibnarzary/awesome-llm
Curated list of open source and openly accessible large language models
Size: 23.4 KB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 26 - Forks: 9

halfrot/ALaRM
[ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"
Language: Python - Size: 6.72 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 3

alexrame/rewardedsoups
Rewarded soups official implementation
Language: HTML - Size: 151 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 1

sahsaeedi/TPO
Triple Preference Optimization
Language: Python - Size: 19 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 21 - Forks: 0

general-preference/general-preference-model
Official implementation of paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https://arxiv.org/abs/2410.02197)
Language: Python - Size: 87.9 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 21 - Forks: 2

Miraclemarvel55/LLaMA-MOSS-RLHF-LoRA
Training LLaMA or MOSS with RLHF, optionally with LoRA | Training LLaMA or MOSS with RLHF [LoRA]
Language: Python - Size: 33.2 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 1

Esmail-ibraheem/Axon
AI research lab🔬: implementations of AI papers and theoretical research: InstructGPT, llama, transformers, diffusion models, RLHF, etc...
Language: Python - Size: 32.7 MB - Last synced at: 2 days ago - Pushed at: 19 days ago - Stars: 18 - Forks: 5

patrick-tssn/LM-Research-Hub
Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language models (LMs), with a particular focus on large language models (LLMs)
Language: Python - Size: 5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 3

allenai/hybrid-preferences
Learning to route instances for Human vs AI Feedback
Language: Python - Size: 273 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 18 - Forks: 2

wangclnlp/DeepSpeed-Chat-Extension
This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF).
Language: Python - Size: 11.7 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 18 - Forks: 1

holarissun/Prompt-OIRL
Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
Language: Python - Size: 186 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 18 - Forks: 3

ld-ing/qdhf
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization (ICML 2024)
Language: Python - Size: 2.98 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 17 - Forks: 2

AmirMotefaker/Create-your-own-ChatGPT
Create your own ChatGPT with Python
Language: Jupyter Notebook - Size: 5.86 MB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 8

sylvain-wei/24-Game-Reasoning
A clean, minimal, accessible reproduction of DeepSeek-R1-Zero and DeepSeek-R1, using the 24 Game as an example: zero-RL, SFT, and SFT+RL are used to elicit the LLM's self-verification and reflection abilities.
Language: Python - Size: 24.9 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 14 - Forks: 0

CodeName-Detective/Prompt-to-Song-Generation-using-Large-Language-Models
This project uses LLMs to generate music from text by understanding prompts, creating lyrics, determining genre, and composing melodies. It harnesses LLM capabilities to create songs based on text inputs through a multi-step approach.
Language: Jupyter Notebook - Size: 57.6 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 13 - Forks: 0

sugarandgugu/Simple-Trl-Training
Fine-tuning large language models with the DPO algorithm; simple and easy to get started with.
Language: Python - Size: 43.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 0

vicgalle/awesome-rlaif
A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)
Size: 22.5 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 0

arunprsh/ChatGPT-Decoded-GPT2-FAQ-Bot-RLHF-PPO
A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback using GPT-2 on AWS
Language: Jupyter Notebook - Size: 20.9 MB - Last synced at: 23 days ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 4

YJiangcm/BMC
Code for "Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization (ICLR 2025)"
Language: Python - Size: 180 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 1
