GitHub topics: rlhf
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Language: Python - Size: 46.4 MB - Last synced at: about 9 hours ago - Pushed at: 1 day ago - Stars: 48,701 - Forks: 5,926

opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
Size: 472 KB - Last synced at: about 20 hours ago - Pushed at: 13 days ago - Stars: 3,930 - Forks: 239

InternLM/InternLM
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
Language: Python - Size: 7.12 MB - Last synced at: about 22 hours ago - Pushed at: 3 months ago - Stars: 6,893 - Forks: 483

ronniross/core-agi-protocol
A framework for analyzing how AGI/ASI might emerge from decentralized, adaptive systems rather than from a single model deployment. It also presents its orientation as a dynamic, self-evolving Magna Carta intended to help guide the emergence of such phenomena.
Size: 227 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 5 - Forks: 1

huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
Language: Python - Size: 306 KB - Last synced at: about 14 hours ago - Pushed at: 13 days ago - Stars: 5,172 - Forks: 442

voidful/TextRL
Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face Transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
Language: Python - Size: 400 KB - Last synced at: about 19 hours ago - Pushed at: about 1 year ago - Stars: 557 - Forks: 59

RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
Language: Python - Size: 43.1 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 11,465 - Forks: 885

GaryYufei/AlignLLMHumanSurvey
Aligning Large Language Models with Human: A Survey
Size: 335 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 730 - Forks: 31

janelu9/EasyLLM
Running large language models easily.
Language: Python - Size: 230 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 8 - Forks: 0

RLHFlow/Online-RLHF
A recipe for online RLHF and online iterative DPO.
Language: Python - Size: 249 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 511 - Forks: 48

TUDB-Labs/mLoRA
An Efficient "Factory" to Build Multiple LoRA Adapters
Language: Python - Size: 11 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 313 - Forks: 58

SOMIR420/transformerlab-app
Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
Language: TypeScript - Size: 8.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Language: Python - Size: 3.99 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 1,460 - Forks: 120

kavya4411/tune
Flutter Piano is a simple and educational music application that allows users to play black and white piano keys that produce realistic sounds upon tapping. It is built with Flutter and designed with a clean, intuitive interface that offers an authentic piano playing experience.
Language: C++ - Size: 2.82 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 1

ymcui/Chinese-LLaMA-Alpaca-2
Phase-2 project for the Chinese LLaMA-2 & Alpaca-2 large models, plus 64K ultra-long-context models (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
Language: Python - Size: 8.15 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 7,155 - Forks: 570

transformerlab/transformerlab-app
Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
Language: TypeScript - Size: 9.14 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,074 - Forks: 239

Kiln-AI/Kiln
The easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets.
Language: Python - Size: 14.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,442 - Forks: 237

mindspore-courses/step_into_llm
MindSpore online courses: Step into LLM
Language: Jupyter Notebook - Size: 246 MB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 464 - Forks: 111

jianzhnie/Open-R1
An open-source implementation (reproduction) of DeepSeek-R1.
Language: Python - Size: 1.02 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 255 - Forks: 49

PKU-Alignment/align-anything
Align Anything: Training All-modality Model with Feedback
Language: Jupyter Notebook - Size: 108 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 3,601 - Forks: 418

ronniross/symbiotic-core-library
Toolkits, instructions, prompts, bibliographies, and research support designed to enhance/test LLM metacognitive/contextual awareness, address deficiencies, and unlock emergent properties/human-AI symbiosis.
Size: 8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 0

LAION-AI/Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Language: Python - Size: 33.8 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 37,343 - Forks: 3,271

modelscope/Trinity-RFT
Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLMs).
Language: Python - Size: 11.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 73 - Forks: 10

RLHFlow/RLHF-Reward-Modeling
Recipes to train reward models for RLHF.
Language: Python - Size: 3.8 MB - Last synced at: 5 days ago - Pushed at: 19 days ago - Stars: 1,322 - Forks: 95
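
Several reward-modeling entries in this list, including RLHFlow/RLHF-Reward-Modeling above, train pairwise reward models on preference data. As rough orientation only (not RLHFlow's actual API — the function name and tensor layout are illustrative assumptions), a minimal sketch of the standard pairwise loss on a scalar reward head:

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss: push r(chosen) above r(rejected).

    Both inputs are scalar rewards per example, shape (batch,);
    loss = -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Illustrative usage with random scores standing in for a reward model's outputs.
chosen, rejected = torch.randn(8), torch.randn(8)
print(pairwise_reward_loss(chosen, rejected))
```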

Joyce94/LLM-RLHF-Tuning
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
Language: Python - Size: 22.3 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 414 - Forks: 17

TideDra/VL-RLHF
An RLHF infrastructure for vision-language models
Language: Python - Size: 3.8 MB - Last synced at: about 14 hours ago - Pushed at: 6 months ago - Stars: 174 - Forks: 7

argilla-io/argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Language: Python - Size: 772 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 4,482 - Forks: 431

CodeName-Detective/Prompt-to-Song-Generation-using-Large-Language-Models
This project uses LLMs to generate music from text by understanding prompts, creating lyrics, determining genre, and composing melodies. It harnesses LLM capabilities to create songs based on text inputs through a multi-step approach.
Language: Jupyter Notebook - Size: 57.6 MB - Last synced at: 2 days ago - Pushed at: 12 months ago - Stars: 15 - Forks: 0

allenai/reward-bench
RewardBench: the first evaluation tool for reward models.
Language: Python - Size: 25.5 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 562 - Forks: 66

glgh/awesome-llm-human-preference-datasets
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Size: 11.7 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 356 - Forks: 15

jasonvanf/llama-trl
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
Language: Python - Size: 37.1 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 213 - Forks: 23

natolambert/rlhf-book
Textbook on reinforcement learning from human feedback
Language: TeX - Size: 6.91 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 853 - Forks: 74

fereydoonboroojerdi/multimodal-customer-insights-generator
Scalable multimodal AI system combining FSDP, RLHF, and Inferentia optimization for customer insights generation.
Language: Python - Size: 212 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Language: Jupyter Notebook - Size: 302 MB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 1,737 - Forks: 267

gradient-divergence/agentic-retail-foundations
Source code for the Foundations of Agentic AI for Retail Book
Language: Python - Size: 2.08 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Language: Python - Size: 543 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2,671 - Forks: 198

fereydoonboroojerdi/multilingual-llm-trainium
Production-ready multilingual customer support system using LLaMA-3, RLHF, DeepSpeed, and AWS Trainium.
Language: Python - Size: 197 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

mengdi-li/awesome-RLAIF
A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
Size: 313 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 163 - Forks: 4

general-preference/general-preference-model
Official implementation of ICML 2025 paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https://arxiv.org/abs/2410.02197)
Language: Python - Size: 90.8 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 23 - Forks: 3
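
For context on the Bradley-Terry-related entries here (this one, and holarissun/RewardModelingBeyondBradleyTerry further down): the baseline these papers generalize is the Bradley-Terry preference model. A reference sketch in generic notation (not either paper's own) is:

```latex
% Probability that response y_1 is preferred over y_2 for prompt x,
% under a scalar reward model r(x, y) (generic notation, illustrative only):
P(y_1 \succ y_2 \mid x)
  = \frac{\exp\bigl(r(x, y_1)\bigr)}{\exp\bigl(r(x, y_1)\bigr) + \exp\bigl(r(x, y_2)\bigr)}
  = \sigma\bigl(r(x, y_1) - r(x, y_2)\bigr)
```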

kosaokis/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Language: Python - Size: 40.5 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese
Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial large models, together with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
Language: Python - Size: 1.64 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 631 - Forks: 64

log10-io/log10js 📦
JavaScript client library for managing your LLM data in one place
Language: JavaScript - Size: 20.5 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 0

log10-io/log10 📦
Python client library for improving your LLM app accuracy
Language: Python - Size: 16.6 MB - Last synced at: about 21 hours ago - Pushed at: 3 months ago - Stars: 98 - Forks: 11

sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Language: Python - Size: 2.29 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 338 - Forks: 23

PKU-Alignment/aligner
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
Language: Python - Size: 16.3 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 170 - Forks: 8

jianzhnie/LLamaTuner
Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA-2, LLaMA-3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
Language: Python - Size: 1.02 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 602 - Forks: 65

l294265421/alpaca-rlhf
Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback), based on DeepSpeed Chat
Language: Python - Size: 97.9 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 115 - Forks: 14

dobriban/Principles-of-AI-LLMs
Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-training, alignment), test-time computation, reasoning, safety and robustness (jailbreaking, oversight, uncertainty), representations, interpretability (circuits), etc.
Size: 188 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 31 - Forks: 2

HarderThenHarder/RLLoggingBoard
A visualization tool for deeper understanding and easier debugging of RLHF training.
Language: Python - Size: 6.32 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 188 - Forks: 6

nlp-uoregon/Okapi
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Language: Python - Size: 262 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 95 - Forks: 2

ContextualAI/HALOs
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Language: Python - Size: 5.28 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 835 - Forks: 51
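
Several entries in this list (HALOs above, plus junkangwu/beta-DPO and hanyang1999/RainbowPO below) build on the DPO objective. A minimal, generic sketch of that loss follows — not HALOs' actual API; the function name, argument layout, and default beta are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss on summed per-sequence log-probs.

    Unlike an explicit pairwise reward model, DPO needs no separate reward
    network: the implicit reward is beta * (log pi_theta(y|x) - log pi_ref(y|x)),
    and the loss is -log sigmoid(reward_chosen - reward_rejected).
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```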

astorfi/LLM-Alignment-Project
A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer learning, and more. Build your own customizable LLM alignment solution with ease.
Language: Python - Size: 619 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 31 - Forks: 2

PKU-Alignment/beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Language: Makefile - Size: 2.34 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 136 - Forks: 6

Docta-ai/docta
A Doctor for your data
Language: Python - Size: 27.8 MB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 3,235 - Forks: 232

li-plus/flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
Language: Python - Size: 3.35 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

wang8740/MAP
Documentation at
Language: Python - Size: 6.87 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 2

THUDM/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Language: Python - Size: 4.18 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 1,383 - Forks: 71

CIntellifusion/VideoDPO
Official Implementation of VideoDPO
Language: Python - Size: 20.6 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 95 - Forks: 1

NiuTrans/Vision-LLM-Alignment
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
Language: Python - Size: 153 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 104 - Forks: 8

krishnaura45/LMBattle
Battle between Chatbots
Language: Jupyter Notebook - Size: 29 MB - Last synced at: 19 days ago - Pushed at: 24 days ago - Stars: 2 - Forks: 0

jackaduma/ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to fine-tune ChatGLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
Language: Python - Size: 25.3 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 135 - Forks: 10

hiyouga/ChatGLM-Efficient-Tuning 📦
Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
Language: Python - Size: 194 MB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 3,699 - Forks: 475

taco-group/Re-Align
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Language: Python - Size: 18.6 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 40 - Forks: 1

janearc/wonder
metaprogramming for LLMs and other humans
Language: Python - Size: 441 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 1

hanyang1999/RainbowPO
[ICLR 2025] RainbowPO: A unified framework for combining improvements in preference optimization
Language: Python - Size: 332 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

dannylee1020/openpo
Building synthetic data for preference tuning
Language: Python - Size: 10.7 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 27 - Forks: 0

sumo1/gpt-reproduction-SFT-RLHF
A reproduction of OpenAI GPT, based on the Transformer. Main goals: study the GPT source code and underlying principles, and learn large-model supervised fine-tuning (SFT) and feedback-based reinforcement learning (RLHF). The code can be run directly on Colab. Source-code study notes: https://blog.csdn.net/xm415/category_12891845.html
Language: Jupyter Notebook - Size: 43.9 KB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

junkangwu/beta-DPO
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Language: Python - Size: 43 KB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 43 - Forks: 2

flint-xf-fan/Federated-RLHF
[AAMAS 2025] Privacy-preserving and personalized RLHF, with convergence guarantees. The code contains experiments for training multiple instances of GPT-2 for personalized, sentiment-aligned text generation.
Language: Python - Size: 742 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 6 - Forks: 0

RLHFlow/Directional-Preference-Alignment
Directional Preference Alignment
Size: 1.83 MB - Last synced at: about 20 hours ago - Pushed at: 8 months ago - Stars: 57 - Forks: 3

sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
Language: Python - Size: 18.4 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 43 - Forks: 3

WangRongsheng/MedQA-ChatGLM 📦
🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our scope is not limited to medical Q&A.
Language: Python - Size: 20.7 MB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 49

NJUxlj/Travel-Agent-based-on-Qwen2-RLHF
A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answering dataset; a mind map can be generated from the response. A RAG system is built on top of the tuned Qwen2 using prompt templates, tool use, a Chroma embedding database, and LangChain.
Language: Python - Size: 155 MB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 9 - Forks: 1

zxuu/RLHF
Implementation and study of RLHF-related algorithms for LLMs.
Language: Python - Size: 1.67 MB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

liziniu/ReMax
Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models"
Language: Python - Size: 1.76 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 181 - Forks: 13

FlintSH/Outlier-Tools 📦
A collection of free open-source tools to help you better understand your Outlier account, entirely handled in-browser.
Language: TypeScript - Size: 577 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 0

OpenLMLab/MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
Language: Python - Size: 2.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 1,350 - Forks: 98

jackaduma/Vicuna-LoRA-RLHF-PyTorch
A full pipeline to fine-tune Vicuna with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
Language: Python - Size: 18.7 MB - Last synced at: 2 days ago - Pushed at: 12 months ago - Stars: 213 - Forks: 19

jackfsuia/nanoRLHF
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction.
Language: Python - Size: 2 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 54 - Forks: 11

wangclnlp/DeepSpeed-Chat-Extension
This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF).
Language: Python - Size: 11.7 MB - Last synced at: 16 days ago - Pushed at: 10 months ago - Stars: 19 - Forks: 1

ld-ing/qdhf
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization (ICML 2024)
Language: Python - Size: 2.98 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 2

sylvain-wei/24-Game-Reasoning
A very simple reproduction of DeepSeek-R1-Zero and DeepSeek-R1, using the 24 Game as an example. Uses zero-RL, SFT, and SFT+RL to elicit the LLM's self-verification and reflection abilities. A clean, minimal, accessible reproduction of DeepSeek R1-Zero and DeepSeek R1.
Language: Python - Size: 24.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 0

sanjibnarzary/awesome-llm
Curated list of open source and openly accessible large language models
Size: 23.4 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 26 - Forks: 9

THUDM/WebGLM
WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)
Language: Python - Size: 6.19 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 1,585 - Forks: 139

rabiloo/llm-finetuning
Sample for Fine-Tuning LLMs & VLMs
Language: Python - Size: 274 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 2

Esmail-ibraheem/Axon
AI research lab🔬: implementations of AI papers and theoretical research: InstructGPT, llama, transformers, diffusion models, RLHF, etc...
Language: Python - Size: 32.7 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 5

holarissun/RewardModelingBeyondBradleyTerry
Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives"
Language: Python - Size: 365 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 41 - Forks: 3

haoliuhl/chain-of-hindsight
Simple next-token-prediction for RLHF
Language: Python - Size: 162 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 223 - Forks: 17

mihirp1998/VADER
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.
Language: Python - Size: 164 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 253 - Forks: 15

QuentinWach/image-ranker
Rank images using TrueSkill by comparing them against each other in the browser. 🖼📊
Language: HTML - Size: 9.86 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 52 - Forks: 9

THUDM/VisionReward
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
Language: Python - Size: 9.97 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 194 - Forks: 4

ZiyiZhang27/tdpo
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
Language: Python - Size: 3.28 MB - Last synced at: 23 days ago - Pushed at: 10 months ago - Stars: 35 - Forks: 0

xrsrke/instructGOOSE
Implementation of Reinforcement Learning from Human Feedback (RLHF)
Language: Jupyter Notebook - Size: 3.31 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 172 - Forks: 21

LegendLeoChen/llm-finetune
Fine-tunes models from Hugging Face using libraries such as trl, peft, and transformers.
Language: Python - Size: 6.84 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

vicgalle/awesome-rlaif
A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)
Size: 22.5 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

AmirMotefaker/Create-your-own-ChatGPT
Create your own ChatGPT with Python
Language: Jupyter Notebook - Size: 5.86 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 8

sergio11/llm_finetuning_and_evaluation
The LLM FineTuning and Evaluation project 🚀 enhances FLAN-T5 models for tasks like summarizing Spanish news articles 🇪🇸📰. It features detailed notebooks 📚 on fine-tuning and evaluating models to optimize performance for specific applications. 🔍✨
Language: Jupyter Notebook - Size: 499 KB - Last synced at: 25 days ago - Pushed at: 2 months ago - Stars: 2 - Forks: 1

patrick-tssn/LM-Research-Hub
Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language models (LMs), with a particular focus on large language models (LLMs)
Language: Python - Size: 5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 18 - Forks: 3

tomekkorbak/pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
Language: Python - Size: 135 KB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 180 - Forks: 13

opening-up-chatgpt/opening-up-chatgpt.github.io
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Language: Python - Size: 1.51 MB - Last synced at: 28 days ago - Pushed at: 2 months ago - Stars: 117 - Forks: 7
