Nano-R1

This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
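
GRPO scores each sampled completion relative to the other completions drawn for the same prompt. The sketch below illustrates that idea on GSM8K-style answers and is not the repository's own code. The function names are hypothetical. It assumes completions end with the `#### <number>` marker used in GSM8K reference solutions, and it normalizes with the population standard deviation, which may differ from what a given training library does.

```python
import re
from statistics import mean, pstdev


def extract_gsm8k_answer(text):
    """Return the number following '####' as a string, or None if absent."""
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", text)
    if match is None:
        return None
    # GSM8K answers may contain thousands separators, e.g. "1,000".
    return match.group(1).replace(",", "")


def correctness_reward(completion, reference):
    """Hypothetical reward: 1.0 for an exact final-answer match, else 0.0."""
    predicted = extract_gsm8k_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == extract_gsm8k_answer(reference) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids divide-by-zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions (made-up strings for illustration).
reference = "She sells 48 + 24 = 72 clips. #### 72"
completions = [
    "48 + 24 = 72 #### 72",
    "48 + 42 = 90 #### 90",
    "The answer is 72",       # no marker, so it earns no reward
    "Half of 48 is 24, total 72. #### 72",
]
rewards = [correctness_reward(c, reference) for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.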

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Akshint0407%2FNano-R1
PURL: pkg:github/Akshint0407/Nano-R1

Stars: 3
Forks: 0
Open issues: 0

License: apache-2.0
Language: Jupyter Notebook
Size: 769 KB
Dependencies parsed at: Pending

Created at: 4 months ago
Updated at: 3 months ago
Pushed at: 4 months ago
Last synced at: 2 months ago

Topics: adapters, grpo, huggingface, python, qwen2-5, safetensors, text-generation-inference, transformer, trl, unsloth
