GitHub topics: llm-aligment
sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Language: Python - Size: 2.29 MB - Last synced at: about 22 hours ago - Pushed at: about 23 hours ago - Stars: 338 - Forks: 23

ZFancy/awesome-activation-engineering
A curated list of resources for activation engineering
Size: 154 KB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 56 - Forks: 1

Dicklesworthstone/some_thoughts_on_ai_alignment
Some Thoughts on AI Alignment: Using AI to Control AI
Size: 1.08 MB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 7 - Forks: 0

holarissun/RewardModelingBeyondBradleyTerry
Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives"
Language: Python - Size: 365 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 41 - Forks: 3
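For context on the entry above: the Bradley-Terry model underlying most preference-based reward modeling scores a "chosen vs. rejected" pair by the sigmoid of the reward difference. A minimal sketch (the function name `bradley_terry_nll` is my own, not from the repo):

```python
import math

def bradley_terry_nll(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of one preference pair under Bradley-Terry:
    P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    Reward models are typically trained by minimizing this over labeled pairs.
    """
    p_chosen = 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))
    return -math.log(p_chosen)

# Equal rewards mean the model is indifferent: P = 0.5, so NLL = ln 2.
print(bradley_terry_nll(1.0, 1.0))  # → 0.693...
```

The paper listed above examines when this pairwise formulation is and is not the right foundation, and proposes alternatives.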

Zanette-Labs/SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
Language: Python - Size: 2.24 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0
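The entry above speeds up Best-of-N decoding; for reference, plain Best-of-N draws N full completions and keeps the one a reward model scores highest. A minimal sketch of that baseline only (function names are illustrative, not the repo's API; Speculative Rejection's contribution is pruning low-reward partial generations early instead of scoring N full completions):

```python
def best_of_n(prompt, generate, reward, n=4):
    """Plain Best-of-N decoding: sample n candidate completions for the
    prompt and return the one with the highest reward-model score."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy usage with stand-in generate/reward functions:
outputs = iter(["short", "a much longer answer", "mid-length one"])
best = best_of_n("q", lambda p: next(outputs), reward=len, n=3)
print(best)  # → "a much longer answer"
```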
