GitHub topics: direct-preference-optimization
liushunyu/awesome-direct-preference-optimization
A Survey of Direct Preference Optimization (DPO)
Size: 3.11 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 42 - Forks: 0
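
The DPO objective surveyed above can be sketched in a few lines. The following is a minimal scalar illustration (sequence log-probabilities under the policy and a frozen reference model are assumed precomputed), not any listed repository's actual code:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are total log-probabilities of the chosen/rejected responses
    under the policy being trained and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response (relative to the reference) than the rejected one.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(logits); the guard avoids exp overflow for very
    # negative logits, where the loss is approximately -logits.
    return math.log1p(math.exp(-logits)) if logits > -700 else -logits

# At a zero margin the loss is log(2); it falls below that as soon as
# the policy prefers the chosen response more than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.5))
```

Minimizing this loss pushes the policy's log-probability ratio up for chosen responses and down for rejected ones, with `beta` controlling how far the policy may drift from the reference.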

codelion/pts
Pivotal Token Search
Language: Python - Size: 80.1 KB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 101 - Forks: 6

cluebbers/dpo-rlhf-paraphrase-types
Enhances paraphrase-type generation using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), with large-scale HPC support, aligning model outputs to human-ranked data for robust, safety-focused NLP.
Language: Jupyter Notebook - Size: 32.8 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

cluebbers/adverserial-paraphrasing
Evaluate how LLaMA 3.1 8B handles paraphrased adversarial prompts targeting refusal behavior.
Language: Jupyter Notebook - Size: 430 KB - Last synced at: 22 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

artaasd95/rap-music-generator
An LLM-based tool for generating rap lyrics. Offers multiple fine-tuning approaches to support diverse rap-generation techniques and stylistically varied output.
Language: Jupyter Notebook - Size: 94.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

AI-14/r2gpoallm
[Cog. Comp. 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
Language: Python - Size: 5.77 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

akhilpandey95/LLMSciSci
Experiments and how-to guide for the lecture "Large language models for Scientometrics"
Language: Jupyter Notebook - Size: 1.64 MB - Last synced at: 15 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

mlvlab/VidChain
Official implementation (PyTorch) of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning" (AAAI 2025)
Language: Python - Size: 9.02 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 0

rasyosef/phi-2-sft-and-dpo
Notebooks to create an instruction-following version of Microsoft's Phi-2 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0
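
SFT-then-DPO pipelines like this one train the DPO stage on preference pairs. A minimal sketch of the record shape (the `prompt`/`chosen`/`rejected` field names follow the convention common in libraries such as TRL; the example content is invented):

```python
# One DPO training example: a prompt plus a preferred ("chosen") and a
# dispreferred ("rejected") completion, typically from human rankings.
example = {
    "prompt": "Explain what a hash table is in one sentence.",
    "chosen": "A hash table maps keys to values via a hash function, "
              "giving near-constant-time lookup on average.",
    "rejected": "It is a table. It has hashes in it.",
}

# A dataset is a list of such records; a DPO trainer derives its loss
# from the log-probabilities of "chosen" vs. "rejected" under the model.
dataset = [example]
assert set(example) == {"prompt", "chosen", "rejected"}
```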

eliashornberg/EPFLLaMA
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
Language: Jupyter Notebook - Size: 28.1 MB - Last synced at: 23 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

rasyosef/phi-1_5-instruct
Notebooks to create an instruction-following version of Microsoft's Phi-1.5 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
Size: 3.91 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

AliBakly/EPFLLaMA
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
Language: Jupyter Notebook - Size: 28.1 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
