GitHub topics: preference-learning
haoxian-chen/MallowsPO
[ICLR 2025] MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Language: Python - Size: 37.1 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 1

tournesol-app/tournesol
Free and open-source code for the https://tournesol.app platform. Meet the community on Discord: https://discord.gg/WvcSG55Bf3
Language: Python - Size: 29.1 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 358 - Forks: 51

IAAR-Shanghai/ICSFSurvey
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
Language: Jupyter Notebook - Size: 5.02 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 169 - Forks: 4

fatemehpesaran310/lpoi
Official PyTorch implementation of "LPOI: Listwise Preference Optimization for Vision Language Models" (ACL 2025 Main)
Language: Python - Size: 642 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 7 - Forks: 0

raunakkr1234/article-to-jsonl
Lightweight desktop app to collect articles and opinions, summarizing content with OpenAI. Ideal for personal journaling and fine-tuning data. 📝✨
Language: Python - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

TianJieHeng/article-to-jsonl
Desktop app that scrapes an article, autogenerates a GPT summary, lets me rate & respond, and saves each entry to JSONL for LLM fine‑tuning / preference training.
Language: Python - Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

liushunyu/awesome-direct-preference-optimization
A Survey of Direct Preference Optimization (DPO)
Size: 3.12 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 43 - Forks: 0

wassname/open_pref_eval
Hackable, simple LLM evals on preference datasets
Language: Python - Size: 15.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

allenai/reward-bench
RewardBench: the first evaluation tool for reward models.
Language: Python - Size: 26.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 597 - Forks: 83

Wang-Xiaodong1899/LeanPO
The official repo for "LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs"
Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

ethanvillalovoz/clarifybot
An interactive system that uses large language models to generate clarification questions for ambiguous human feedback, improving reward learning accuracy.
Language: Jupyter Notebook - Size: 283 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Dev1nW/Simplified-Rating-and-Preference-RL
Simplified, modern implementation of Rating and Preference-based Reinforcement Learning.
Language: Python - Size: 21.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SMARTlab-Purdue/SAN-NaviSTAR
This repository contains the source code for our paper: "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning". For more details, please refer to our project website at https://sites.google.com/view/san-navistar.
Language: Python - Size: 125 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 56 - Forks: 5

osorensen/BayesMallowsSMC2
Sequential Monte Carlo algorithms for the Bayesian Mallows model.
Language: C++ - Size: 509 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

li-plus/flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
Language: Python - Size: 269 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 41 - Forks: 0

sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
Language: Python - Size: 18.4 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 44 - Forks: 3

PeymanMorteza/Metric-Preference-Learning-RKHS
Metric and Preference Learning in Reproducing Kernel Hilbert Spaces
Language: Python - Size: 4.45 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

zwhong714/weak-to-strong-preference-optimization
[ICLR 2025 Spotlight] Weak-to-strong preference optimization: stealing reward from weak aligned model
Language: Python - Size: 1.41 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

qxcv/magical
The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)
Language: Python - Size: 52.3 MB - Last synced at: 24 days ago - Pushed at: almost 2 years ago - Stars: 77 - Forks: 11

SMARTlab-Purdue/SAN-FAPL
This repository contains the source code for our paper: "Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation", accepted to IROS-2022. For more details, please refer to our project website at https://sites.google.com/view/san-fapl.
Language: Python - Size: 26.6 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 4

martimfasantos/CustomPOs-for-SLMs
Novel Preference Optimization Algorithms for state-of-the-art small LMs, enhancing performance in GenAI and NLP tasks
Language: Python - Size: 272 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Seninfarheen/Senior-Sage
A conversational assistant designed to support elderly individuals with reminders, health questions, and personalized preferences using advanced LLM capabilities.
Language: Python - Size: 223 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

typoverflow/WiseRL
PyTorch implementations for Offline Preference-Based RL (PbRL) algorithms
Language: Python - Size: 6.06 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 19 - Forks: 2

gao-g/prelude
Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".
Language: Python - Size: 43 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 27 - Forks: 0

DanieleF198/ILASP-as-post-hoc-method-in-a-preference-system
Experiments on using ILASP as a post-hoc method over black-box models, including a study of technical issues such as exponential execution time.
Language: Lasso - Size: 370 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

lasgroup/MaxMinLCB
Code for our paper "Bandits with Preference Feedback: A Stackelberg Game Perspective"
Language: Python - Size: 45.9 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

JanoschMenke/metis
Python-based GUI to collect chemists' feedback on molecules
Language: Python - Size: 100 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 34 - Forks: 10

LemurPwned/bradley-terry-ui
UI for straightforward Bradley-Terry feedback loop
Language: Python - Size: 5.86 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

vicgalle/configurable-safety-tuning
Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"
Language: Python - Size: 2.53 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

ma921/CoExBO
(AISTATS 2024) "Looping in the Human: Collaborative and Explainable Bayesian Optimization"
Language: Jupyter Notebook - Size: 4.97 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

aleksa-sukovic/iclr2024-reward-design-for-justifiable-rl
Code for the paper "Reward Design for Justifiable Sequential Decision-Making"; ICLR 2024
Language: Jupyter Notebook - Size: 2.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

BARUDA-AI/Awesome-Preference-Optimization
Survey of preference alignment algorithms
Size: 0 Bytes - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

julilien/PLDepth
Code for "Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model" as published at CVPR 2021.
Language: Python - Size: 503 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 2

huixin-zhan-ai/GAN-Assisted-Preference-Based-Learning
A paper under AAAI-20 review
Language: Python - Size: 206 KB - Last synced at: 5 months ago - Pushed at: about 6 years ago - Stars: 6 - Forks: 1

Intelligent-Systems-Group/jpl-framework
Java framework for Preference Learning
Language: Java - Size: 9.6 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 2

Rahgooy/MDFT
In this project, we design a recurrent neural network to simulate a cognitive model of decision-making called Multi Alternative Decision Field Theory (MDFT). We train this RNN to learn the parameters of MDFT.
Language: Python - Size: 3.74 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

FareedKhan-dev/APReL-Mountain-Car-Reinforcement-Learning
APReL: Active preference-based reward learning for human-robot interaction. Uses the "Mountain Car" environment to learn from human preferences how to reach the goal state, with applications in robotics and adaptability to other learning methods.
Size: 2.93 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

aaronpmishkin/gaussian_processes 📦
Preference Learning with Gaussian Processes and Bayesian Optimization
Language: Python - Size: 272 KB - Last synced at: over 2 years ago - Pushed at: about 8 years ago - Stars: 7 - Forks: 0

makgyver/PRL
[P]reference and [R]ule [L]earning algorithm implementation for Python 3 (https://arxiv.org/abs/1812.07895)
Language: Python - Size: 117 KB - Last synced at: 6 days ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

TristanFauvel/Bayesian_test_for_preference
An analysis of preference comparisons based on the Bayes factor
Language: Jupyter Notebook - Size: 85 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

rowlandseymour/BSBT
Bayesian Spatial Bradley-Terry
Language: R - Size: 44.3 MB - Last synced at: about 18 hours ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

Bekyilma/Master_thesis
Constructive preference elicitation for social choice with setwise max-margin learning.
Language: Python - Size: 9.24 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

jimparr19/pypbl
Python library for preference based learning
Language: Python - Size: 1.34 MB - Last synced at: 2 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 3

afiliot/Preference-Learning-And-Movie-Reviews
Project on preference learning - ENSAE ParisTech
Language: Python - Size: 5.58 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

youjin-c/Swipe
Preference-learning JS app for images
Language: Python - Size: 28.5 MB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0
