GitHub topics: direct-preference-optimization
liushunyu/awesome-direct-preference-optimization
A Survey of Direct Preference Optimization (DPO)
Size: 3.11 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 42 - Forks: 0
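
The DPO objective surveyed above can be sketched in a few lines. The following is a minimal scalar illustration (sequence log-probabilities under the policy and a frozen reference model are assumed precomputed), not any listed repository's actual code:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are total log-probabilities of the chosen/rejected responses
    under the policy being trained and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response (relative to the reference) than the rejected one.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(logits); the guard avoids exp overflow for very
    # negative logits, where the loss is approximately -logits.
    return math.log1p(math.exp(-logits)) if logits > -700 else -logits

# At a zero margin the loss is log(2); it falls below that as soon as
# the policy prefers the chosen response more than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.5))
```

Minimizing this loss pushes the policy's log-probability ratio up for chosen responses and down for rejected ones, with `beta` controlling how far the policy may drift from the reference.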

codelion/pts
Pivotal Token Search
Language: Python - Size: 80.1 KB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 101 - Forks: 6

cluebbers/dpo-rlhf-paraphrase-types
Enhances paraphrase-type generation using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), with large-scale HPC support, aligning model outputs to human-ranked data for robust, safety-focused NLP.
Language: Jupyter Notebook - Size: 32.8 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

cluebbers/adverserial-paraphrasing
Evaluate how LLaMA 3.1 8B handles paraphrased adversarial prompts targeting refusal behavior.
Language: Jupyter Notebook - Size: 430 KB - Last synced at: 22 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

artaasd95/rap-music-generator
An LLM-based tool for generating rap lyrics. Offers multiple fine-tuning approaches to support diverse rap-generation techniques and stylistically varied output.
Language: Jupyter Notebook - Size: 94.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

AI-14/r2gpoallm
[Cog. Comp. 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
Language: Python - Size: 5.77 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

akhilpandey95/LLMSciSci
Experiments and how-to guide for the lecture "Large language models for Scientometrics"
Language: Jupyter Notebook - Size: 1.64 MB - Last synced at: 15 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

mlvlab/VidChain
Official implementation (PyTorch) of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning" (AAAI 2025)
Language: Python - Size: 9.02 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 0

rasyosef/phi-2-sft-and-dpo
Notebooks to create an instruction-following version of Microsoft's Phi-2 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0
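
SFT-then-DPO pipelines like this one train the DPO stage on preference pairs. A minimal sketch of the record shape (the `prompt`/`chosen`/`rejected` field names follow the convention common in libraries such as TRL; the example content is invented):

```python
# One DPO training example: a prompt plus a preferred ("chosen") and a
# dispreferred ("rejected") completion, typically from human rankings.
example = {
    "prompt": "Explain what a hash table is in one sentence.",
    "chosen": "A hash table maps keys to values via a hash function, "
              "giving near-constant-time lookup on average.",
    "rejected": "It is a table. It has hashes in it.",
}

# A dataset is a list of such records; a DPO trainer derives its loss
# from the log-probabilities of "chosen" vs. "rejected" under the model.
dataset = [example]
assert set(example) == {"prompt", "chosen", "rejected"}
```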

eliashornberg/EPFLLaMA
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
Language: Jupyter Notebook - Size: 28.1 MB - Last synced at: 23 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

rasyosef/phi-1_5-instruct
Notebooks to create an instruction-following version of Microsoft's Phi-1.5 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
Size: 3.91 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

AliBakly/EPFLLaMA
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
Language: Jupyter Notebook - Size: 28.1 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
