An open API service providing repository metadata for many open source software ecosystems.

Topic: "dpo"

oumi-ai/oumi

Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!

Language: Python - Size: 10 MB - Last synced at: about 20 hours ago - Pushed at: about 20 hours ago - Stars: 8,331 - Forks: 626

shibing624/MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

Language: Python - Size: 12.8 MB - Last synced at: 1 day ago - Pushed at: 15 days ago - Stars: 4,004 - Forks: 585

PKU-Alignment/align-anything

Align Anything: Training All-modality Model with Feedback

Language: Jupyter Notebook - Size: 108 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3,808 - Forks: 472

ContextualAI/HALOs

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Language: Python - Size: 5.5 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 868 - Forks: 49
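
Several of the listed repositories (HALOs, Step-DPO, beta-DPO, and others) build on the standard DPO objective. As a point of reference, here is a minimal plain-Python sketch of the loss for a single preference pair; batched tensor implementations in these libraries differ, but the formula is the same:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (Rafailov et al., 2023):
    -log sigmoid(beta * [(log pi_w - log pi_l) - (log ref_w - log ref_l)])."""
    margin = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    # Numerically this is -logsigmoid(margin).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model, the margin is zero and the loss is log 2; the loss falls as the policy widens the chosen/rejected gap relative to the reference.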

jianzhnie/LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

Language: Python - Size: 1.02 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 608 - Forks: 64

zhaorw02/DeepMesh

Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Language: Python - Size: 19.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 587 - Forks: 25

ukairia777/tensorflow-nlp-tutorial

A deep learning NLP repository using TensorFlow, covering everything from text preprocessing to downstream tasks for recent models such as topic models, BERT, GPT, and LLMs.

Language: Jupyter Notebook - Size: 126 MB - Last synced at: 5 days ago - Pushed at: 27 days ago - Stars: 548 - Forks: 286

sail-sg/oat

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Language: Python - Size: 15.7 MB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 414 - Forks: 31

dvlab-research/Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Language: Python - Size: 6.47 MB - Last synced at: 22 days ago - Pushed at: 6 months ago - Stars: 372 - Forks: 15

TUDB-Labs/mLoRA

An Efficient "Factory" to Build Multiple LoRA Adapters

Language: Python - Size: 11 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 327 - Forks: 61

armbues/SiLLM

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

Language: Python - Size: 618 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 275 - Forks: 27

RockeyCoss/SPO

[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

Language: Python - Size: 30.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 197 - Forks: 6

TideDra/VL-RLHF

A RLHF Infrastructure for Vision-Language Models

Language: Python - Size: 3.8 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 179 - Forks: 7

argilla-io/notus

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

Language: Python - Size: 4.43 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 168 - Forks: 14

YangLing0818/IterComp

[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Language: Python - Size: 32.8 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 161 - Forks: 10

anilca/NetTrader.Indicator

Technical analysis library for .NET

Language: C# - Size: 562 KB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 142 - Forks: 53

AIDC-AI/CHATS

CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation (ICML2025)

Language: Python - Size: 862 KB - Last synced at: about 5 hours ago - Pushed at: about 9 hours ago - Stars: 137 - Forks: 2

Goekdeniz-Guelmez/mlx-lm-lora

Train Large Language Models on MLX.

Language: Python - Size: 1.42 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 133 - Forks: 17

codelion/pts

Pivotal Token Search

Language: Python - Size: 692 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 111 - Forks: 7

NiuTrans/Vision-LLM-Alignment

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

Language: Python - Size: 153 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 109 - Forks: 8

wendell0218/Awesome-RL-for-Video-Generation

A curated list of papers on reinforcement learning for video generation

Size: 17.6 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 70 - Forks: 0

YangLing0818/SuperCorrect-llm

[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Language: Python - Size: 3.69 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 62 - Forks: 4

martin-wey/CodeUltraFeedback

CodeUltraFeedback: aligning large language models to coding preferences

Language: Python - Size: 12.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 54 - Forks: 2

liushunyu/awesome-direct-preference-optimization

A Survey of Direct Preference Optimization (DPO)

Size: 3.12 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 43 - Forks: 0

junkangwu/beta-DPO

[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

Language: Python - Size: 43 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 43 - Forks: 2

li-plus/flash-preference

Accelerate LLM preference tuning via prefix sharing with a single line of code

Language: Python - Size: 269 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 41 - Forks: 0

TianduoWang/DPO-ST

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Language: Python - Size: 1.65 MB - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 41 - Forks: 5

taco-group/Re-Align

A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.

Language: Python - Size: 18.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 40 - Forks: 1

dannylee1020/openpo

Building synthetic data for preference tuning

Language: Python - Size: 10.7 MB - Last synced at: about 10 hours ago - Pushed at: 7 months ago - Stars: 27 - Forks: 0

allenai/hybrid-preferences

Learning to route instances for Human vs AI Feedback (ACL 2025 Main)

Language: Python - Size: 259 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 23 - Forks: 3

kidist-amde/ddro

We introduce direct document relevance optimization (DDRO) for training a pairwise ranker model. DDRO encourages the model to focus on document-level relevance during generation.

Language: Python - Size: 2.09 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 21 - Forks: 2

RobinSmits/Dutch-LLMs

Various training, inference and validation code and results related to Open LLM's that were pretrained (full or partially) on the Dutch language.

Language: Jupyter Notebook - Size: 8.33 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 0

armbues/SiLLM-examples

Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon

Language: Python - Size: 36.1 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 17 - Forks: 4

NJUxlj/Chinese-MedQA-Qwen2

A medical question-answering system based on Qwen2 + SFT + DPO. The project uses LLaMA-Factory for training, and fastllm and vLLM for inference.

Language: Python - Size: 618 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 15 - Forks: 2

adithya-s-k/Indic-llm

An open-source framework designed to adapt pre-trained language models (LLMs), such as Llama, Mistral, and Mixtral, to a wide array of domains and languages.

Language: Python - Size: 171 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 2

sugarandgugu/Simple-Trl-Training

Fine-tune large language models with the DPO algorithm; simple and easy to get started with.

Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

YJiangcm/BMC

Code for "Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization (ICLR 2025)"

Language: Python - Size: 180 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 11 - Forks: 1

CyberAgentAILab/filtered-dpo

Introducing Filtered Direct Preference Optimization (fDPO), which enhances language-model alignment with human preferences by discarding samples of lower quality than those generated by the learning model.

Language: Jupyter Notebook - Size: 105 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 1
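
The filtering idea described above can be sketched in a few lines. The reward function and generator below are toy stand-ins; the actual fDPO work uses a trained reward model, and its exact filtering rule may differ:

```python
def fdpo_filter(dataset, generate, reward):
    """Illustrative fDPO-style filtering: drop preference pairs whose chosen
    response scores worse than what the current model itself generates."""
    kept = []
    for ex in dataset:
        model_out = generate(ex["prompt"])
        if reward(ex["prompt"], ex["chosen"]) >= reward(ex["prompt"], model_out):
            kept.append(ex)
    return kept

# Toy demo: reward = response length, "model" always emits a 12-character answer,
# so only the first pair survives the filter.
pairs = [
    {"prompt": "q1", "chosen": "a detailed correct answer"},
    {"prompt": "q2", "chosen": "bad"},
]
kept = fdpo_filter(pairs, lambda p: "model answer", lambda p, r: len(r))
```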

NJUxlj/Travel-Agent-based-on-Qwen2-RLHF

A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answer dataset; a mind map can be generated from the response. A RAG system is built on top of the tuned Qwen2 using prompt templates, tool use, a Chroma embedding database, and LangChain.

Language: Python - Size: 155 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 9 - Forks: 1

DaehanKim/EasyRLHF

EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets

Language: Python - Size: 73.9 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 0

Zepson-Tech/dpo-laravel

A Laravel package to simplify using DPO Payment API in your application. https://dpogroup.com

Language: PHP - Size: 25.4 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 5

karhel/glpi-dporegister

Processing Register for DPOs (GDPR) - GLPI Plugin

Language: PHP - Size: 254 KB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 4

vicgalle/configurable-safety-tuning

Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"

Language: Python - Size: 2.53 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 8 - Forks: 1

DPO-Group/DPO_WooCommerce

This is the DPO Pay plugin for WooCommerce.

Language: PHP - Size: 258 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 7 - Forks: 14

pds-dpo/pds-dpo

Official GitHub repository of PDS-DPO: Multimodal Preference Data Synthetic Alignment with Reward Model

Language: Python - Size: 6.71 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 5 - Forks: 0

The-Swarm-Corporation/DPO-MCTS-ToT-Training

This module implements a post-training mechanism that lets a language model explore multiple reasoning branches (chains of thought) using a Monte Carlo Tree Search (MCTS) framework. It selects the branch with the best answer using a cosine-similarity evaluator that compares each candidate answer to a known correct answer.

Language: Python - Size: 31.3 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0
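
The branch-selection step described above reduces to scoring each candidate answer against a reference. The bag-of-words cosine similarity below is used purely for illustration; the actual module presumably embeds the answers before comparing them:

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two answers (illustrative)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_best_branch(candidate_answers, reference_answer):
    """Pick the reasoning branch whose final answer is closest to the reference."""
    return max(candidate_answers,
               key=lambda ans: cosine_similarity(ans, reference_answer))
```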

OctopusMind/DPO

An implementation of the DPO algorithm.

Language: Python - Size: 24.4 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

golang-malawi/go-dpo

Unofficial Go library for DPO Group

Language: Go - Size: 44.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

ssbuild/llm_dpo

DPO fine-tuning.

Language: Python - Size: 61.5 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 3

DPO-Group/DPO_WHMCS

This is the DPO Pay plugin for WHMCS.

Language: PHP - Size: 56.6 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 4 - Forks: 6

Wang-Xiaodong1899/LeanPO

The official repo for "LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs"

Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

DPO-Group/DPO_Gravity_Forms

This is the DPO Group plugin for Gravity Forms.

Language: PHP - Size: 415 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 2

dbf/django-dpotools

An open source collection of tools meant to simplify the life of data protection officers (DPOs) of large entities

Language: Python - Size: 870 KB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 1

kyryl-opens-ml/rlfh-dagster-modal

Re-usable & scalable RLHF training pipeline with Dagster and Modal.

Language: Python - Size: 1.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

razor-informatics/dpo-php

DPO Group Payment gateway PHP SDK

Language: PHP - Size: 14.6 KB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

RATHOD-SHUBHAM/Finetuning-LLMs

This repository contains experiments on fine-tuning LLMs (Llama, Llama3.1, Gemma). It includes notebooks for model tuning, data preprocessing, and hyperparameter optimization to enhance model performance.

Language: Jupyter Notebook - Size: 6.27 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

saiprabhakar/Summarization_DPO_SALT

Improving Summarization with Human Edits

Language: Python - Size: 6.19 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

ducnh279/Align-LLMs-with-DPO

Align a Large Language Model (LLM) with DPO loss

Language: Jupyter Notebook - Size: 28.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

M4-ai/M4-ai_01_Yi_9B

We're improving Yi-9B-200K with a ton of new abilities for high performance in generalist and specialist tasks.

Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

eyess-glitch/phi-2-fine-tuning

This repository contains the source code used for fine-tuning the Phi-2 LLM with several techniques, such as DPO.

Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

NYCyberLawyer/PRIVACYMAP

Privacy Mapping Open Source Software

Language: TypeScript - Size: 1.68 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2

dipnot/direct-pay-online-php

Unofficial PHP wrapper for Direct Pay Online API

Language: PHP - Size: 48.8 KB - Last synced at: 10 days ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 2

yuanxuns/DPO-in-Pytorch

This repository implements Direct Preference Optimization (DPO) in pytorch.

Language: Python - Size: 22.5 KB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

VocabVictor/verl-plus

Adds verl adaptation for Ascend; makes some minor improvements.

Language: Python - Size: 7.59 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

longern/ReDuMix

Self-Reflective Dual-Context Mixture Decoding

Size: 10.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

BlankCode0/YLMSR_implementation

Implementation of the research paper "DPO: Your Language Model Is Secretly a Reward Model".

Language: Python - Size: 45.9 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Truman-min-show/CVPR2024-DPO-Summary

A compilation of titles, abstracts, and related information for all CVPR 2024 papers, plus a DPO training task based on the mT5 model.

Language: Python - Size: 6.28 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

zxuu/RLHF

Implementations and study of RLHF-related algorithms for LLMs.

Language: Python - Size: 1.67 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tripolskypetr/agent-tune

A React-based tool for constructing fine-tuning datasets with list and grid forms, featuring the ability to download and upload data as JSONL files. The project leverages the react-declarative library to create dynamic, interactive forms for defining user inputs, preferred outputs, and non-preferred outputs, along with associated tools.

Language: TypeScript - Size: 5.4 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
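
The JSONL preference format such tools emit is conventionally one JSON object per line, pairing a prompt with preferred and non-preferred completions. The field names below follow the common prompt/chosen/rejected convention and are illustrative; agent-tune's exact schema may differ:

```python
import json

# One preference record (field names are illustrative, not agent-tune's schema).
record = {
    "prompt": "Summarize the benefits of unit testing.",
    "chosen": "Unit tests catch regressions early and document intended behavior.",
    "rejected": "Testing is optional and rarely worth the effort.",
}

line = json.dumps(record)      # one JSON object per JSONL line
parsed = json.loads(line)      # round-trips losslessly
```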

DigitalDesignDen/open-scope-vhdl

Official repo of the open scope (a digital oscilloscope) developed by Digital Design Den

Language: VHDL - Size: 623 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

alessiodesogus/t5utor

From Attention to Education: T5utor Is Really All You Need

Language: Jupyter Notebook - Size: 4.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

thibaud-perrin/llm-research-toolbox

A curated list of repositories exploring various aspects of Large Language Model (LLM) development, including fine-tuning, dataset generation, multimodal models, and preference alignment.

Size: 0 Bytes - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

thibaud-perrin/preference-alignment

Exploring innovative methods like DPO and ORPO for aligning language models with human preferences efficiently and effectively.

Language: Jupyter Notebook - Size: 113 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

RobinSmits/Schaapje

Schaapje - A Dutch Small Language Model

Language: Jupyter Notebook - Size: 794 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

DPO-Group/DPO_Magento_2

This is the DPO Pay plugin for Magento 2.

Language: PHP - Size: 191 KB - Last synced at: 2 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 2

omarmnfy/Finetune-Llama3-using-Direct-Preference-Optimization

This repository contains Jupyter Notebooks, scripts, and datasets used in our finetuning experiments. The project focuses on Direct Preference Optimization (DPO), a method that simplifies the traditional finetuning process by using the model itself as a feedback mechanism.

Language: Jupyter Notebook - Size: 889 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

levje/resnet-dpo

Proof of concept leveraging the DPO loss to fine-tune a ResNet to classify images from the CIFAR-10 dataset.

Language: Python - Size: 71.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

junkangwu/Dr_DPO

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Language: Python - Size: 24.4 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

SharathHebbar/Coding-Templates

Coding Templates

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

pforret/awesome_dpo

Awesome tools and information for Data Protection Officers - GDPR professionals

Language: Shell - Size: 3.15 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 2

somvy/slic-hf

Experiments with divergence functions for DPO and RLHF.

Language: Jupyter Notebook - Size: 3.36 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

zakcali/pandas-ta2numba

Replaced pandas-ta calls with NumPy/Numba functions to speed up calculation of EMA, TEMA, RSI, MFI, ADX, and DPO.

Language: Python - Size: 313 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Wahido589/WaRa

This is my Use Case for the Udacity Nanodegree "Data Product Management"

Language: Jupyter Notebook - Size: 3.45 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

DirectPay-Online/DPO_WooCommerce Fork of DPO-Group/DPO_WooCommerce 📦

This is the DPO Group plugin for WooCommerce.

Language: PHP - Size: 103 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 2

developermiranda/dpoquiz

Project created during the Imersão React v2 (React Immersion v2).

Language: JavaScript - Size: 121 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0