An open API service providing repository metadata for many open source software ecosystems.

Topic: "reward-model"

Westlake-AI/SemiReward

[ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning

Language: Python - Size: 1.13 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 50 - Forks: 2
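SemiReward's core idea is to train a reward model that scores pseudo-labels in semi-supervised learning and keeps only high-quality ones. Below is a minimal, hypothetical sketch of that idea in PyTorch; the class name, dimensions, and threshold are illustrative assumptions, not the paper's actual code.

```python
# Illustrative sketch (not the authors' implementation): score
# (feature, pseudo-label) pairs with a learned reward model and
# keep only pseudo-labels whose reward clears a threshold.
import torch
import torch.nn as nn

class PseudoLabelRewarder(nn.Module):
    """Scores a (feature, pseudo-label) pair with a reward in [0, 1]."""
    def __init__(self, feat_dim: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor, pseudo_onehot: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([feats, pseudo_onehot], dim=-1)).squeeze(-1)

rewarder = PseudoLabelRewarder(feat_dim=64, num_classes=10)
feats = torch.randn(32, 64)
pseudo = torch.nn.functional.one_hot(torch.randint(0, 10, (32,)), 10).float()
scores = rewarder(feats, pseudo)
keep = scores > 0.7  # boolean mask selecting high-reward pseudo-labels
```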

itaychachy/RewardSDS

Official PyTorch implementation of the paper "RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling".

Language: Python - Size: 22.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0
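The phrase "reward-weighted sampling" suggests drawing several noise candidates, scoring each with a reward model, and weighting the per-candidate distillation losses accordingly. The following is only a schematic sketch of that pattern under those assumptions, not the RewardSDS authors' method.

```python
# Schematic sketch of reward-weighted sampling: weight K per-candidate
# losses by a softmax over their reward scores, so higher-reward
# candidates dominate the gradient. All names are illustrative.
import torch

def reward_weighted_loss(per_candidate_losses: torch.Tensor,
                         rewards: torch.Tensor,
                         temperature: float = 1.0) -> torch.Tensor:
    """per_candidate_losses, rewards: shape (K,). Returns a scalar loss."""
    weights = torch.softmax(rewards / temperature, dim=0)  # higher reward -> more weight
    return (weights.detach() * per_candidate_losses).sum()

losses = torch.tensor([0.9, 0.4, 0.7], requires_grad=True)
rewards = torch.tensor([0.1, 0.8, 0.3])
loss = reward_weighted_loss(losses, rewards)
loss.backward()  # gradient flows mostly through the high-reward candidate
```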

taishan1994/Reward-Model-Finetuning

A repository dedicated to training reward models.

Language: Python - Size: 15.6 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 3 - Forks: 1
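Reward-model fine-tuning of this kind typically uses the standard pairwise (Bradley-Terry style) objective: maximize the log-sigmoid of the score gap between the preferred and rejected response. A minimal sketch of that loss, with illustrative tensor names:

```python
# Standard pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected),
# which pushes the model to score chosen responses above rejected ones.
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """r_chosen, r_rejected: scalar reward per preference pair, shape (batch,)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

r_chosen = torch.tensor([1.2, 0.3, 0.8])
r_rejected = torch.tensor([0.4, 0.5, -0.1])
print(pairwise_reward_loss(r_chosen, r_rejected))  # lower when chosen outscores rejected
```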

techandy42/LLM_Reward_Model

Develops an LLM response-ranking reward model using RLHF-style training, with GPT-3.5 providing the feedback instead of humans.

Language: Jupyter Notebook - Size: 2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0
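With a model judge (RLAIF-style) the rankings still need to be converted into chosen/rejected pairs before fitting the pairwise loss above. A hedged, purely illustrative sketch of that conversion, assuming the judge returns a full ranking per prompt:

```python
# Turn one ranking over K responses into chosen/rejected preference pairs.
from itertools import combinations

def ranking_to_pairs(responses, ranking):
    """ranking[i] is the rank of responses[i] (0 = best).
    Returns (chosen, rejected) pairs for every unordered combination."""
    pairs = []
    for i, j in combinations(range(len(responses)), 2):
        if ranking[i] < ranking[j]:
            pairs.append((responses[i], responses[j]))
        else:
            pairs.append((responses[j], responses[i]))
    return pairs

print(ranking_to_pairs(["a", "b", "c"], [1, 0, 2]))
# [('b', 'a'), ('a', 'c'), ('b', 'c')]
```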

hlp-ai/miniChatGPT

Mini ChatGPT

Language: Python - Size: 317 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

BobXWu/learning-from-rewards-llm-papers

This repository collects research papers on learning from rewards in the context of post-training and test-time scaling of large language models (LLMs).

Size: 433 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

yeyimilk/CrowdVLM-R1

Proposes a fuzzy reward model with GRPO to improve VLMs' abilities on the crowd-counting task.

Language: Python - Size: 1000 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0
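GRPO's defining step is the group-relative advantage: sample a group of responses per prompt and normalize each response's reward against the group mean and standard deviation. The fuzzy count reward below is an assumption (partial credit decaying with relative counting error), not necessarily the repo's exact formula:

```python
# GRPO group-normalized advantages, with an assumed "fuzzy" counting
# reward that gives partial credit for near-correct counts.
import torch

def fuzzy_count_reward(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Reward in (0, 1], highest when the predicted count matches the target."""
    rel_err = (pred - target).abs() / target.clamp(min=1)
    return torch.exp(-rel_err)

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: shape (group_size,). Normalize within the sampled group."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

preds = torch.tensor([98.0, 120.0, 60.0, 101.0])  # one group of sampled counts
rewards = fuzzy_count_reward(preds, torch.tensor(100.0))
print(grpo_advantages(rewards))
```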

rochitasundar/Generative-AI-with-Large-Language-Models

This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".

Language: Jupyter Notebook - Size: 218 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

RuvenGuna94/Dialogue-Summary-remove-toxic-text-PPO

Fine-tuning FLAN-T5 with PPO and PEFT to generate less toxic text summaries. This notebook leverages Meta AI's hate speech reward model and utilizes RLHF techniques for improved safety.

Language: Jupyter Notebook - Size: 12.9 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
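In this kind of detox RLHF setup, the PPO reward is usually the "not hate" score from a hate-speech classifier applied to each generated summary. A hedged sketch of that reward signal; the model id below is the Meta AI classifier commonly used with this course material, but verify it matches your setup:

```python
# Reward signal sketch: use a hate-speech classifier's "not hate" logit
# as the per-summary reward, so PPO steers generations away from toxicity.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "facebook/roberta-hate-speech-dynabench-r4-target"  # assumed classifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = AutoModelForSequenceClassification.from_pretrained(model_id)

def detox_reward(texts: list[str]) -> torch.Tensor:
    """Higher reward for summaries the classifier deems non-hateful."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = classifier(**batch).logits  # assumed columns: [not hate, hate]
    return logits[:, 0]  # "not hate" logit as the scalar reward per text

print(detox_reward(["A concise, neutral summary of the dialogue."]))
```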

jddunn/rlhf-nlp

Proof-of-concept library built on TextRL for easily training and using fine-tuned models with RLHF, a reward model, and PPO.

Language: Python - Size: 21.5 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0