GitHub topics: reward-models

Repositories

jackaduma/Vicuna-LoRA-RLHF-PyTorch

A full pipeline for fine-tuning the Vicuna LLM with LoRA and RLHF on consumer hardware. Implements RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.

Language: Python - Size: 18.7 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 214 - Forks: 19

RLHFlow/RLHF-Reward-Modeling

Recipes for training reward models for RLHF.

Language: Python - Size: 3.8 MB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 1,322 - Forks: 95

jackaduma/ChatGLM-LoRA-RLHF-PyTorch

A full pipeline for fine-tuning the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implements RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.

Language: Python - Size: 25.3 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 135 - Forks: 10

csmile-1006/REDS_agent

Subtask-Aware Visual Reward Learning from Segmented Demonstrations (accepted at ICLR 2025)

Language: Python - Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

holarissun/RewardModelingBeyondBradleyTerry

Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives"

Language: Python - Size: 365 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 41 - Forks: 3

MJ-Bench/MJ-Bench

Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"

Language: Jupyter Notebook - Size: 218 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 43 - Forks: 5

tlc4418/llm_optimization

A repo for RLHF training and best-of-N (BoN) sampling over LLMs, with support for reward model ensembles.

Language: Python - Size: 62.5 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 34 - Forks: 3

ExplainableML/ReNO

[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

Language: Python - Size: 7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 113 - Forks: 9

jackaduma/Alpaca-LoRA-RLHF-PyTorch

A full pipeline for fine-tuning the Alpaca LLM with LoRA and RLHF on consumer hardware. Implements RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.

Language: Python - Size: 18.7 MB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 58 - Forks: 6