Topic: "trl"
jasonvanf/llama-trl
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
Language: Python - Size: 37.1 MB - Last synced at: about 3 hours ago - Pushed at: about 2 years ago - Stars: 219 - Forks: 23

argilla-io/notus
Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach
Language: Python - Size: 4.43 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 168 - Forks: 14

GAD-cell/vlm-grpo
An implementation of GRPO for VLM training with Unsloth
Language: Python - Size: 829 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 58 - Forks: 6

ssbuild/llm_rlhf
Implements reinforcement learning training for GPT-2, LLaMA, BLOOM, and other LLMs
Language: Python - Size: 388 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 26 - Forks: 2

yflyzhang/simpleR1
simpleR1: A Simple Framework for Training R1-like Models
Language: Python - Size: 1020 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 23 - Forks: 2

RobinSmits/Dutch-LLMs
Various training, inference, and validation code and results for open LLMs that were pretrained (fully or partially) on the Dutch language.
Language: Jupyter Notebook - Size: 8.33 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 0

sugarandgugu/Simple-Trl-Training
Fine-tuning large language models with the DPO algorithm; simple and easy to get started.
Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

Akshint0407/Nano-R1
This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
Language: Jupyter Notebook - Size: 769 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

Mikesterner87/Nano-R1
This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
Language: Jupyter Notebook - Size: 109 KB - Last synced at: 1 day ago - Pushed at: 8 days ago - Stars: 2 - Forks: 0
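
GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. A minimal sketch of that group-relative advantage step in plain Python follows. The function name `group_relative_advantages`, the `eps` value, the use of population standard deviation, and the 0/1 correctness rewards are illustrative assumptions, not code taken from either Nano-R1 repository or from TRL.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each completion's reward against its own group.

    `rewards` holds one scalar reward per completion sampled for the SAME prompt.
    Assumption: population std (pstdev); some implementations use sample std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions better than the group average get a positive advantage,
    # worse ones get a negative advantage; eps avoids division by zero
    # when every completion in the group received the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 completions of one GSM8K-style prompt:
# 1.0 = correct final answer, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

If every completion in a group earns the same reward, all advantages are zero, so that prompt contributes no policy-gradient signal for that step.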

rasyosef/phi-2-sft-and-dpo
Notebooks to create an instruction following version of Microsoft's Phi 2 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

hongbo-wei/Alibaba-Cross-Document-Deep-Research-Agent-Prototype
An AI agent system for e-commerce search using Chain-of-Thought reasoning and RAG technology. Implemented BGE embeddings with FAISS for semantic retrieval, achieving 85%+ accuracy and sub-2-second response times. Integrated PPO reinforcement learning for agent optimization and multi-step tool calling.
Language: Python - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

LegendLeoChen/llm-finetune
Fine-tunes Hugging Face models using libraries such as trl, peft, and transformers.
Language: Python - Size: 6.84 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

rasyosef/phi-1_5-instruct
Notebooks to create an instruction following version of Microsoft's Phi 1.5 LLM with Supervised Fine Tuning and Direct Preference Optimization (DPO)
Size: 3.91 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

WCoetser/Trl.TermDataRepresentation
The overall aim of this project is to create a term rewriting system that could be useful in everyday programming, and to represent data in a way that roughly corresponds to the definition of a term in formal logic. Terms should be familiar to any programmer because they are basically constants, variables, and function symbols.
Language: C# - Size: 371 KB - Last synced at: 13 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

pberlandier/irl-to-bal
ODM: automated translation of TRL rules to BAL rules
Language: Java - Size: 1.26 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

Daddy-Myth/Flan-T5-rlhf-align
Aligning FLAN-T5 with Reinforcement Learning from Human Feedback (RLHF) for Neutral, Grammatically Correct News Summaries
Language: Jupyter Notebook - Size: 161 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

SofiaKhutsieva/LLM_experiments
Experiments with LLMs (inference, RAG, fine-tuning)
Language: Jupyter Notebook - Size: 307 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
