Topic: "reward-model"
Westlake-AI/SemiReward
[ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning
Language: Python - Size: 1.13 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 50 - Forks: 2

itaychachy/RewardSDS
Official PyTorch Implementation for the "RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling" paper!
Language: Python - Size: 22.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0

taishan1994/Reward-Model-Finetuning
A repository dedicated to training reward models.
Language: Python - Size: 15.6 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 3 - Forks: 1

techandy42/LLM_Reward_Model
Develops an LLM response-ranking reward model using RLHF-style training, except the feedback comes from GPT-3.5 instead of humans.
Language: Jupyter Notebook - Size: 2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0
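Several of the repos above train reward models on ranked response pairs. As a reference point, this is a minimal sketch of the standard Bradley-Terry pairwise loss such training typically optimizes (the function name is illustrative, not taken from any of these repos):

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss for reward-model training:
    -log(sigmoid(r_chosen - r_rejected)). The loss shrinks as the
    model scores the preferred response above the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal scores the loss is log 2; it falls toward zero as the chosen response is scored increasingly higher than the rejected one.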

hlp-ai/miniChatGPT
Mini ChatGPT
Language: Python - Size: 317 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

BobXWu/learning-from-rewards-llm-papers
This repository collects research papers on learning from rewards in the context of post-training and test-time scaling of large language models (LLMs).
Size: 433 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

yeyimilk/CrowdVLM-R1
Proposes a fuzzy reward model with GRPO to improve VLMs' abilities on the crowd-counting task.
Language: Python - Size: 1000 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

rochitasundar/Generative-AI-with-Large-Language-Models
This repository contains the lab work for Coursera course on "Generative AI with Large Language Models".
Language: Jupyter Notebook - Size: 218 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

RuvenGuna94/Dialogue-Summary-remove-toxic-text-PPO
Fine-tuning FLAN-T5 with PPO and PEFT to generate less toxic text summaries. This notebook leverages Meta AI's hate speech reward model and utilizes RLHF techniques for improved safety.
Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
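PPO-based detoxification pipelines like the one above usually do not feed the reward-model score to the policy raw; a common RLHF shaping subtracts a KL penalty that keeps the fine-tuned policy close to the reference model. A minimal per-token sketch of that shaping (names and the default beta are illustrative):

```python
def shaped_reward(rm_score: float,
                  logprob_policy: float,
                  logprob_ref: float,
                  beta: float = 0.1) -> float:
    """Common RLHF reward shaping: reward-model score minus a
    KL-style penalty (policy log-prob minus reference log-prob),
    discouraging the policy from drifting far from the reference."""
    return rm_score - beta * (logprob_policy - logprob_ref)
```

When the policy matches the reference, the penalty vanishes and the shaped reward equals the reward-model score.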

jddunn/rlhf-nlp
Proof-of-concept library built on TextRL for easy training and use of fine-tuned models with RLHF, a reward model, and PPO.
Language: Python - Size: 21.5 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
