GitHub topics: reward-models
jackaduma/Vicuna-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
Language: Python - Size: 18.7 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 214 - Forks: 19

RLHFlow/RLHF-Reward-Modeling
Recipes for training reward models for RLHF.
Language: Python - Size: 3.8 MB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 1,322 - Forks: 95

jackaduma/ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
Language: Python - Size: 25.3 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 135 - Forks: 10

csmile-1006/REDS_agent
Subtask-Aware Visual Reward Learning from Segmented Demonstrations (accepted at ICLR 2025)
Language: Python - Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

holarissun/RewardModelingBeyondBradleyTerry
Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives"
Language: Python - Size: 365 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 41 - Forks: 3

MJ-Bench/MJ-Bench
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
Language: Jupyter Notebook - Size: 218 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 43 - Forks: 5

tlc4418/llm_optimization
A repo for RLHF training and best-of-n (BoN) sampling over LLMs, with support for reward model ensembles.
Language: Python - Size: 62.5 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 34 - Forks: 3

ExplainableML/ReNO
[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Language: Python - Size: 7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 113 - Forks: 9

jackaduma/Alpaca-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
Language: Python - Size: 18.7 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 58 - Forks: 6

BillChan226/MJ-Bench
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge?"
Language: Jupyter Notebook - Size: 2.56 GB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 8 - Forks: 0

genrm-star/genrm-critiques
GenRM-CoT: Data release for verification rationales
Size: 426 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 4 - Forks: 0

chrisliu298/Skywork-Reward (fork of SkyworkAI/Skywork-Reward)
Rank 1 and 3 reward models on RewardBench
Size: 551 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

vicgalle/zero-shot-reward-models
ZYN: Zero-Shot Reward Models with Yes-No Questions
Language: Python - Size: 1.3 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 33 - Forks: 8
