An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: llm-reasoning

reasoning-survey/Awesome-Reasoning-Foundation-Models

✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models

Size: 7.42 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 588 - Forks: 57

inclusionAI/AReaL

Distributed RL System for LLM Reasoning

Language: Python - Size: 9.98 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,743 - Forks: 85

IAAR-Shanghai/Awesome-Attention-Heads

An awesome repository and a comprehensive survey on the interpretability of LLM attention heads.

Language: TeX - Size: 6.07 MB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 351 - Forks: 12

Gen-Verse/MMaDA

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Language: Python - Size: 129 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,055 - Forks: 47

yinizhilian/ICLR2025-Papers-with-Code

A collection of ICLR papers and open-source projects from past years, covering ICLR2021, ICLR2022, ICLR2023, ICLR2024, and ICLR2025.

Size: 1.47 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 327 - Forks: 17

BennyTMT/GAMETime

Language: Python - Size: 5.39 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8 - Forks: 0

matdev83/llm-aot-process

This project is designed for Answer-then-Think (AoT) processing with Large Language Models (LLMs). It provides a flexible framework to orchestrate complex reasoning tasks by breaking them down into iterative steps, managing LLM interactions, and dynamically adapting based on problem complexity and resource constraints.

Language: Python - Size: 1.67 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

nl4opt/ORQA

[AAAI 2025] ORQA is a new QA benchmark designed to assess the reasoning capabilities of LLMs in a specialized technical domain of Operations Research. The benchmark evaluates whether LLMs can emulate the knowledge and reasoning skills of OR experts when presented with complex optimization modeling tasks.

Language: Python - Size: 2.49 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 36 - Forks: 0

sbhambr1/Trace_Check_QA

Code for Investigating Trace-based Knowledge Distillation on Question-Answering

Language: Python - Size: 78.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

inclusionAI/Ling

Ling is a MoE LLM provided and open-sourced by InclusionAI.

Language: Python - Size: 3.36 MB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 157 - Forks: 15

mangopy/SearchLM

Official code for "Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers"

Language: Python - Size: 2.25 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 49 - Forks: 0

twittymatteoscott/CryptoFormalEval

We introduce a benchmark for testing how well LLMs can find vulnerabilities in cryptographic protocols. By combining LLMs with symbolic reasoning tools like Tamarin, we aim to improve the efficiency and thoroughness of protocol analysis, paving the way for future AI-powered cybersecurity defenses.

Size: 2.93 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

Trae1ounG/Neural_Incompatibility

Official code for ACL'25 Main: "Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models"

Language: Python - Size: 1.45 MB - Last synced at: 16 days ago - Pushed at: 26 days ago - Stars: 6 - Forks: 0

soulkeeperc5/CryptoFormalEval

We introduce a benchmark for testing how well LLMs can find vulnerabilities in cryptographic protocols. By combining LLMs with symbolic reasoning tools like Tamarin, we aim to improve the efficiency and thoroughness of protocol analysis, paving the way for future AI-powered cybersecurity defenses.

Size: 0 Bytes - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

lordlord0whitefox/CryptoFormalEval

We introduce a benchmark for testing how well LLMs can find vulnerabilities in cryptographic protocols. By combining LLMs with symbolic reasoning tools like Tamarin, we aim to improve the efficiency and thoroughness of protocol analysis, paving the way for future AI-powered cybersecurity defenses.

Size: 2.93 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

leekerstopme/CryptoFormalEval-n6

We introduce a benchmark for testing how well LLMs can find vulnerabilities in cryptographic protocols. By combining LLMs with symbolic reasoning tools like Tamarin, we aim to improve the efficiency and thoroughness of protocol analysis, paving the way for future AI-powered cybersecurity defenses.

Size: 2.93 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

PengKuang/wasp-ai-ml-25vt

Assignment for the WASP course AI & ML - module 1

Language: Jupyter Notebook - Size: 590 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 2 - Forks: 0

MozerWang/AMPO

[arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents

Language: Python - Size: 9.54 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 17 - Forks: 2

Cristian-Curaba/CryptoFormalEval

We introduce a benchmark for testing how well LLMs can find vulnerabilities in cryptographic protocols. By combining LLMs with symbolic reasoning tools like Tamarin, we aim to improve the efficiency and thoroughness of protocol analysis, paving the way for future AI-powered cybersecurity defenses.

Language: Haskell - Size: 7.43 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 5 - Forks: 1

YangLing0818/buffer-of-thought-llm

[NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Language: Python - Size: 1.07 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 633 - Forks: 61

bruno686/Awesome-RL-based-LLM-Reasoning

Awesome RL-based LLM Reasoning

Size: 57.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 472 - Forks: 24

AStroCvijo/react_reproduction

Reproduction of ICLR 2023 paper "ReAct: Synergizing Reasoning and Acting in Language Models".

Language: Python - Size: 5.94 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

AmanPriyanshu/My-Personal-Model-Picks

A curated showcase of small, on-premise models with great utility, enabling privacy-centered deployment and easy adaptation to bio-inspired use cases.

Language: HTML - Size: 65.4 KB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

CodeEval-Pro/CodeEval-Pro

Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"

Language: Python - Size: 4.1 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 26 - Forks: 2

ethicalabs-ai/ouroboros

Self-Improving LLMs Through Iterative Refinement

Language: Python - Size: 429 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

YangLing0818/SuperCorrect-llm

[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Language: Python - Size: 3.69 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 62 - Forks: 4

tsinghua-fib-lab/SmartAgent

The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".

Size: 4.68 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 24 - Forks: 1

pittisl/PhyT2V

Official code repo for the CVPR 2025 paper "PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation"

Language: Python - Size: 116 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 1

tegridydev/mixture-of-persona-research

A “Mixture of Perspectives” Framework for Ethical AI

Size: 12.7 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

hriaz17/SayLessRAG

Code for the paper: "Say Less, Mean More: Leveraging Pragmatics in Retrieval-Augmented Generation"

Size: 4.88 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

trangdf/Creating-sample-means-for-measurement-standards-of-intelligence

Creating sample means for measurement standards of intelligence

Size: 1000 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ogrnv/Creating-sample-means-for-measurement-standards-of-intelligence

Creating sample means for measurement standards of intelligence

Language: C - Size: 153 KB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

bowen-upenn/llm_token_bias

[EMNLP 2024] This is the official implementation of the paper "A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners" in PyTorch.

Language: Python - Size: 57.4 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 9 - Forks: 1

UKPLab/emnlp2024-code-prompting

Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs. EMNLP 2024

Language: Python - Size: 46.6 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 15 - Forks: 5

Related Keywords
llm-reasoning 34 llm 17 llms 7 large-language-models 6 llm-benchmarking 6 vulnerability-detection 5 llm-based-agents 5 communication-protocol 5 cryptography 5 evaluation 5 reasoning 3 reasoning-language-models 3 retrieval-augmented-generation 3 gpt 3 ai 2 artificial-general-intelligence 2 confidence-intervals 2 intelligence 2 iq 2 measurement-standard 2 agi 2 llm-agent 2 llm-evaluation 2 machine-learning 2 open-source 2 python 2 rag 2 reasoning-agent 2 llm-inference 2 rl 2 tests 2 testing-tools 2 chain-of-thought 2 testing 2 statistics 2 statistical-analysis 2 sample-mean 2 randomly-generated 2 r 2 transformer 2 psychometics 2 diffusion-models 2 dataset-generation 1 deepseek-r1 1 llms-reasoning 1 metacognition 1 synthetic-data 1 synthetic-dataset-generation 1 dpo 1 reflection 1 self-correction 1 llm4code 1 llm-evaluation-toolkit 1 code-generation 1 small-llm 1 models 1 llmops 1 awesome-lists 1 awesome-list 1 react-reasoning 1 program-aided-language-model 1 openai 1 conditional-reasoning 1 code-prompting 1 token-bias 1 logical-reasoning 1 information-retrieval 1 information-extraction 1 gricean-pragmatics 1 dense-retrieval 1 open-research 1 moral-machines 1 mop-ai 1 mechanistic-interpretability 1 alignment 1 ai-research 1 ai-framework 1 video-generation 1 prompt-tuning 1 cvpr2025 1 personalization 1 openai-o1 1 multi-modal 1 lvlm 1 large-language-model 1 human-computer-interaction 1 human-centric-ai 1 embodied-ai 1 post-training 1 nlp 1 language-time-series 1 language-model 1 paperwithcode 1 nlp-machine-learning 1 nlp-keywords-extraction 1 llm-training 1 llm-framework 1 llama3 1 iclr2024 1 iclr2023 1