GitHub topics: reasoning-language-models

Repositories

Sandia7171717171/CharmBench

CharmBench offers a challenging benchmark for large vision-language models, providing datasets and evaluation tools to enhance multimodal reasoning. Check out our latest updates and contribute to the project by starring the repo! 🌟👩‍💻

Language: Jupyter Notebook - Size: 7.23 MB - Last synced at: about 20 hours ago - Pushed at: about 21 hours ago - Stars: 0 - Forks: 0

dvlab-research/Seg-Zero

Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"

Language: Python - Size: 5.01 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 404 - Forks: 17

mims-harvard/ToolUniverse

ToolUniverse is a collection of biomedical tools designed for AI agents

Language: Python - Size: 2.96 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 127 - Forks: 19

SalesforceAIResearch/MAS-Zero

Designing Multi-Agent Systems with Zero Supervision

Language: Python - Size: 6.04 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 49 - Forks: 4

tubiccelavi/Poker-COACH

AI, VR, machine learning, and natural language poker coach

Language: JavaScript - Size: 42 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

mims-harvard/TxAgent

TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools

Language: Python - Size: 55.9 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 459 - Forks: 68

dialexity/dialectical-framework

Turn stories, strategies, or systems into insight. Auto-generate Dialectical Wheels (DWs) from any text to reveal blind spots, surface polarities, and trace dynamic paths toward synthesis. DWs are semantic maps that expose tension, transformation, and coherence within a system—whether narrative, ethical, organizational, or technological.

Language: Python - Size: 5.98 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 5 - Forks: 2

WisdomShell/RewardAnything

RewardAnything: Generalizable Principle-Following Reward Models

Language: Python - Size: 3.66 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

zhuohaoyu/RewardAnything

RewardAnything: Generalizable Principle-Following Reward Models

Language: HTML - Size: 3.81 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

linhaowei1/kumo

☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models

Language: Jupyter Notebook - Size: 630 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 18 - Forks: 0

tomascupr/thinkthread

thinkthread SDK - Supercharge Your AI Applications with Human-Like Reasoning

Language: Python - Size: 510 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 11 - Forks: 0

thinkwee/NOVER

R1-Zero on any Data

Language: Python - Size: 1020 KB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 4 - Forks: 0

tum-ai/number-token-loss

A regression-alike loss for numerical reasoning in language models

Language: Jupyter Notebook - Size: 129 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 9 - Forks: 3

a-m-team/a-m-models

a-m-team's exploration in large language modeling

Size: 13.5 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 128 - Forks: 3

LightChen233/Awesome-Long-Chain-of-Thought-Reasoning

Latest Advances on Long Chain-of-Thought Reasoning

Size: 17.2 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 339 - Forks: 20

dvlab-research/VisionReasoner

The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning"

Language: Python - Size: 12.1 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 130 - Forks: 8

andrewliao11/LongPerceptualThoughts

The official implementation of "LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception"

Language: Python - Size: 3.27 MB - Last synced at: 6 days ago - Pushed at: 14 days ago - Stars: 3 - Forks: 1

NLPForUA/ZNO

Structured test tasks and model tuning scripts for multiple subjects from ZNO - the Ukrainian External Independent Evaluation (ЗНО)

Language: Python - Size: 2.27 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 5 - Forks: 0

zihao-ai/unthinking_vulnerability

To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models

Language: Python - Size: 14.2 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 29 - Forks: 0

MozerWang/AMPO

[arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents

Language: Python - Size: 9.54 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 17 - Forks: 2

The-FinAI/Fino1

This repo develops reasoning models for the financial domain, aiming to enhance models' capabilities in handling financial reasoning tasks.

Language: Jupyter Notebook - Size: 150 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 51 - Forks: 9

codelion/pts

Pivotal Token Search

Language: Python - Size: 200 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 6 - Forks: 1

Hyun-Ryu/clover

Official code for "Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning", ICLR 2025.

Language: Python - Size: 404 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 12 - Forks: 2

yihedeng9/OpenVLThinker

OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement

Language: Python - Size: 3.41 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 83 - Forks: 5

XIXUM/XIXUM-modeler

AI Model Generator

Language: Java - Size: 24 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 2 - Forks: 0

Trustworthy-ML-Lab/ThinkEdit

An effective weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.

Language: Python - Size: 545 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 1

reasoning-survey/Awesome-Reasoning-Foundation-Models

✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models

Size: 7.37 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 571 - Forks: 56

Ruiyang-061X/Awesome-MLLM-Reasoning

📖 Curated list about the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.

Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 7 - Forks: 0

krystalan/DRT

Deep Reasoning Translation via Reinforcement Learning (arXiv preprint 2025); DRT: Deep Reasoning Translation via Long Chain-of-Thought (arXiv preprint 2024)

Size: 2.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 218 - Forks: 9

DolbyUUU/Sudoku4LLM

Sudoku4LLM is a Sudoku dataset generator for training and evaluating reasoning in Large Language Models (LLMs). It offers customizable puzzles, difficulty levels, and 11 serialization formats to support structured data reasoning and Chain of Thought (CoT) experiments.

Language: Python - Size: 29.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

xhinini/LLM-Reasoning-Review

A curated collection of research papers on reasoning capabilities of Large Language Models (LLMs). This repository organizes and categorizes works that evaluate, benchmark, and analyze reasoning in LLMs, including methods, techniques, datasets, and survey papers.

Size: 26.4 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

mdda/getting-to-aha-with-tpus

Reasoning-from-Zero using gemma.JAX.nnx on TPUs

Language: Python - Size: 292 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 0

DolbyUUU/DeepEnlighten

Pure RL without SFT to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with the Social IQa dataset.

Language: Python - Size: 21.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

DolbyUUU/Logic-RL-Lite

Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".

Language: Python - Size: 14.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

Wild-Cooperation-Hub/Awesome-MLLM-Reasoning-Benchmarks

A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.

Size: 89.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 26 - Forks: 2

aryan-jadon/Synthetic-Data-Generation-and-Evaluation-using-Reasoning-Models

This repository contains the implementation of our research on optimizing Retrieval-Augmented Generation (RAG) systems for technical domains. Our work addresses the unique challenges of precise information extraction from complex, domain-specific documents by introducing token-aware evaluation metrics and a synthetic data generation pipeline.

Language: Jupyter Notebook - Size: 13.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

spcl/x1

Official Implementation of "Reasoning Language Models: A Blueprint"

Language: Python - Size: 563 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 37 - Forks: 6

Related Keywords

reasoning-language-models: 37 - large-language-models: 12 - llm: 12 - reasoning: 10 - reinforcement-learning: 8 - deepseek-r1: 8 - reasoning-agent: 8 - multimodal: 5 - reasoning-models: 5 - deepseek: 4 - llms: 4 - benchmark: 3 - post-training: 3 - fine-tuning: 3 - ai: 3 - chain-of-thought: 3 - evaluation: 3 - agents: 3 - multimodal-large-language-models: 3 - rlhf: 2 - reward-models: 2 - grpo: 2 - alignment: 2 - natural-language-processing: 2 - agent: 2 - o1: 2 - openai: 2 - rl: 2 - vision-language-model: 2 - generative-ai: 2 - gemma: 2 - llm-reasoning: 2 - llama: 2 - gpt-o1: 2 - segmentation: 2 - deep-learning: 2 - precision-medicine: 2 - tool-use: 2 - therapeutics: 2 - llm-evaluation: 2 - mechanistic-interpretability: 1 - interpretable-machine-learning: 1 - foundation-models: 1 - awesome: 1 - chain-of-thought-reasoning: 1 - models: 1 - modeling: 1 - knowledge-graph: 1 - generator: 1 - cognitive-neuroscience: 1 - ai-code-generator: 1 - logical-reasoning: 1 - tokens: 1 - steering-vector: 1 - sparse-autoencoder: 1 - rlm: 1 - sae: 1 - pivotal-tokens: 1 - pivotal-token-search: 1 - phi4-mini: 1 - phi4: 1 - phi-4-mini: 1 - phi-4: 1 - mech-interp: 1 - llm-steering: 1 - llm-agents: 1 - gemini: 1 - llms-reasoning: 1 - code: 1 - rl-for-llm: 1 - papers: 1 - research-paper: 1 - jax: 1 - nnx: 1 - tpu: 1 - rl-for-finance: 1 - mllm-reasoning: 1 - multimodal-reasoning: 1 - multimodal-reasoning-benchmarks: 1 - llm-framework: 1 - synthetic-dataset-generation: 1 - dataset-generator: 1 - large-reasoning-models: 1 - lrm: 1 - machine-translation: 1 - literature-translation: 1 - mcts-for-llms: 1 - slow-thinking: 1 - reasoning-llms: 1 - o3-mini: 1 - multi-modal-large-language-model: 1 - multi-modal: 1 - mllm: 1 - lvlm: 1 - openai-o1: 1 - o3: 1 - long-chain-of-thought: 1 - long: 1 - number-token-loss: 1 - llm-training: 1