Topic: "inference-time-compute"
haizelabs/verdict
Scale your LLM-as-a-judge.
Language: Jupyter Notebook - Size: 10 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 236 - Forks: 16

SalesforceAIResearch/MAS-Zero
Designing Multi-Agent Systems with Zero Supervision
Language: Python - Size: 6.04 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 49 - Forks: 4

divelab/Sys2Bench
Sys2Bench is a benchmarking suite designed to evaluate reasoning and planning capabilities of large language models across algorithmic, logical, arithmetic, and common-sense reasoning tasks.
Language: Python - Size: 57.7 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 22 - Forks: 3

AI4Science-WestlakeU/t_scend
This repo contains the code for T-SCEND, a novel framework that significantly improves diffusion models' reasoning capabilities through better energy-based training and scaled-up test-time computation.
Language: Python - Size: 1.47 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 10 - Forks: 0

Amirhosein-gh98/Guided-by-Gut
The official PyTorch implementation of Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Language: Python - Size: 2.55 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 5 - Forks: 0
