An open API service providing repository metadata for many open source software ecosystems.

Topic: "trl"

jasonvanf/llama-trl

LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA

Language: Python - Size: 37.1 MB - Last synced at: about 3 hours ago - Pushed at: about 2 years ago - Stars: 219 - Forks: 23

argilla-io/notus

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

Language: Python - Size: 4.43 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 168 - Forks: 14

GAD-cell/vlm-grpo

An implementation of GRPO for training Unsloth's VLMs

Language: Python - Size: 829 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 58 - Forks: 6

ssbuild/llm_rlhf

Reinforcement learning training for GPT-2, LLaMA, BLOOM, and other LLMs

Language: Python - Size: 388 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 26 - Forks: 2

yflyzhang/simpleR1

simpleR1: A Simple Framework for Training R1-like Models

Language: Python - Size: 1020 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 23 - Forks: 2

RobinSmits/Dutch-LLMs

Various training, inference, and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.

Language: Jupyter Notebook - Size: 8.33 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 0

sugarandgugu/Simple-Trl-Training

Fine-tuning large language models with the DPO algorithm; simple and easy to get started.

Language: Python - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

Akshint0407/Nano-R1

This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Group Relative Policy Optimization) on the GSM8K dataset.

Language: Jupyter Notebook - Size: 769 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

Mikesterner87/Nano-R1

This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Group Relative Policy Optimization) on the GSM8K dataset.

Language: Jupyter Notebook - Size: 109 KB - Last synced at: 1 day ago - Pushed at: 8 days ago - Stars: 2 - Forks: 0

rasyosef/phi-2-sft-and-dpo

Notebooks to create an instruction-following version of Microsoft's Phi 2 LLM with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)

Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

hongbo-wei/Alibaba-Cross-Document-Deep-Research-Agent-Prototype

An AI agent system for e-commerce search using Chain-of-Thought reasoning and RAG technology. Implemented BGE embeddings with FAISS for semantic retrieval, achieving 85%+ accuracy and sub-2-second response times. Integrated PPO reinforcement learning for agent optimization and multi-step tool calling.

Language: Python - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

LegendLeoChen/llm-finetune

Fine-tuning models from Hugging Face using libraries such as trl, peft, and transformers.

Language: Python - Size: 6.84 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

rasyosef/phi-1_5-instruct

Notebooks to create an instruction-following version of Microsoft's Phi 1.5 LLM with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)

Size: 3.91 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

WCoetser/Trl.TermDataRepresentation

The overall aim of this project is to create a term rewriting system that could be useful in everyday programming, and to represent data in a way that roughly corresponds to the definition of a term in formal logic. Terms should be familiar to any programmer because they are basically constants, variables, and function symbols.

Language: C# - Size: 371 KB - Last synced at: 13 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

pberlandier/irl-to-bal

ODM: TRL to BAL rules automated translation

Language: Java - Size: 1.26 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

Daddy-Myth/Flan-T5-rlhf-align

Aligning FLAN-T5 with Reinforcement Learning from Human Feedback (RLHF) for Neutral, Grammatically Correct News Summaries

Language: Jupyter Notebook - Size: 161 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

SofiaKhutsieva/LLM_experiments

Experiments with LLMs (inference, RAG, fine-tuning)

Language: Jupyter Notebook - Size: 307 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0