An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: llm-reasoning

inclusionAI/AReaL

Distributed RL System for LLM Reasoning

Language: Python - Size: 5.27 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 1,145 - Forks: 52

yinizhilian/ICLR2025-Papers-with-Code

A collection of ICLR papers and open source projects over the years, covering ICLR2021, ICLR2022, ICLR2023, ICLR2024, and ICLR2025.

Size: 1.47 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 240 - Forks: 11

AmanPriyanshu/My-Personal-Model-Picks

A curated showcase of small, on-premise models with great utility, enabling privacy-centered deployment and easy adaptation to bio-inspired use cases.

Language: HTML - Size: 65.4 KB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

IAAR-Shanghai/Awesome-Attention-Heads

An awesome repository and a comprehensive survey on the interpretability of LLM attention heads.

Language: TeX - Size: 6.07 MB - Last synced at: 8 days ago - Pushed at: about 2 months ago - Stars: 338 - Forks: 10

inclusionAI/Ling

Ling is a MoE LLM provided and open-sourced by InclusionAI.

Language: Python - Size: 3.26 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 134 - Forks: 13

bruno686/Awesome-RL-based-LLM-Reasoning

Awesome RL-based LLM Reasoning

Size: 48.8 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 415 - Forks: 21

CodeEval-Pro/CodeEval-Pro

Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"

Language: Python - Size: 4.1 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 26 - Forks: 2

AStroCvijo/react_reproduction

Reproduction of ICLR 2023 paper "ReAct: Synergizing Reasoning and Acting in Language Models".

Language: Python - Size: 5.83 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

ethicalabs-ai/ouroboros

Self-Improving LLMs Through Iterative Refinement

Language: Python - Size: 429 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 3 - Forks: 0

Cristian-Curaba/CryptoFormalEval

We introduce a benchmark for testing how well LLMs can find vulnerabilities in cryptographic protocols. By combining LLMs with symbolic reasoning tools like Tamarin, we aim to improve the efficiency and thoroughness of protocol analysis, paving the way for future AI-powered cybersecurity defenses.

Language: Haskell - Size: 7.43 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 5 - Forks: 1

YangLing0818/buffer-of-thought-llm

[NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Language: Python - Size: 1.07 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 612 - Forks: 57

YangLing0818/SuperCorrect-llm

[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Language: Python - Size: 3.69 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 62 - Forks: 4

tsinghua-fib-lab/SmartAgent

The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".

Size: 4.68 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 24 - Forks: 1

nl4opt/ORQA

[AAAI 2025] ORQA is a new QA benchmark designed to assess the reasoning capabilities of LLMs in a specialized technical domain of Operations Research. The benchmark evaluates whether LLMs can emulate the knowledge and reasoning skills of OR experts when presented with complex optimization modeling tasks.

Size: 2.48 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 37 - Forks: 0

pittisl/PhyT2V

Official code repo for the CVPR 2025 paper "PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation"

Language: Python - Size: 116 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 1

tegridydev/mixture-of-persona-research

A “Mixture of Perspectives” Framework for Ethical AI

Size: 12.7 KB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

hriaz17/SayLessRAG

Code for the paper: "Say Less, Mean More: Leveraging Pragmatics in Retrieval-Augmented Generation"

Size: 4.88 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

trangdf/Creating-sample-means-for-measurement-standards-of-intelligence

Creating sample means for measurement standards of intelligence

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ogrnv/Creating-sample-means-for-measurement-standards-of-intelligence

Creating sample means for measurement standards of intelligence

Language: C - Size: 153 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

bowen-upenn/llm_token_bias

[EMNLP 2024] This is the official implementation of the paper "A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners" in PyTorch.

Language: Python - Size: 57.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

UKPLab/emnlp2024-code-prompting

Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs. EMNLP 2024

Language: Python - Size: 46.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 15 - Forks: 5

Related Keywords
llm-reasoning 21, llm 14, large-language-models 4, llms 4, gpt 3, rl 2, psychometics 2, measurement-standard 2, transformer 2, iq 2, intelligence 2, llm-inference 2, confidence-intervals 2, artificial-general-intelligence 2, ai 2, agi 2, retrieval-augmented-generation 2, chain-of-thought 2, llm-benchmarking 2, reasoning 2, llm-evaluation 2, tests 2, testing-tools 2, testing 2, statistics 2, llm-agent 2, statistical-analysis 2, sample-mean 2, machine-learning 2, randomly-generated 2, r 2, program-aided-language-model 1, cvpr2025 1, optimization 1, machine-psychology 1, operations-research 1, multi-choice 1, large-language-model 1, mixed-integer-programming 1, mathematical-modelling 1, llm4or 1, llm4opt 1, llm4math 1, lvlm 1, linear-programming 1, ai4or 1, multi-modal 1, aaai2025 1, openai-o1 1, personalization 1, openai 1, conditional-reasoning 1, code-prompting 1, token-bias 1, logical-reasoning 1, rag 1, information-retrieval 1, information-extraction 1, gricean-pragmatics 1, dense-retrieval 1, open-source 1, open-research 1, moral-machines 1, mop-ai 1, mechanistic-interpretability 1, alignment 1, ai-research 1, ai-framework 1, video-generation 1, prompt-tuning 1, diffusion-models 1, interpretability 1, cognitive-neuroscience 1, circuit-analysis 1, awesome 1, attention-mechanism 1, attention-head-mining 1, small-llm 1, models 1, llmops 1, awesome-lists 1, awesome-list 1, python 1, paperwithcode 1, nlp-machine-learning 1, nlp-keywords-extraction 1, llm-training 1, llm-framework 1, llama3 1, iclr2024 1, iclr2023 1, iclr2022 1, iclr2021 1, gemmini 1, deep-learning-paper 1, reinforcement-learning 1, mlsys 1, machine-learning-systems 1, human-computer-interaction 1, human-centric-ai 1