Topic: "gpt-4-5"
dongri/openai-api-rs
OpenAI API client library for Rust (unofficial)
Language: Rust - Size: 382 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 400 - Forks: 74

lechmazur/elimination_game
A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other.
Size: 41.5 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 253 - Forks: 8

lechmazur/writing
This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) in a short creative story.
Language: Batchfile - Size: 170 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 194 - Forks: 5

lechmazur/nyt-connections
Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words.
Language: Python - Size: 4.04 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 80 - Forks: 5

lechmazur/generalization
Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.
Size: 25.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 44 - Forks: 1

lechmazur/step_game
Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.
Size: 36.4 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 44 - Forks: 3
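The collision rule in the description above can be sketched in a few lines. This is only an illustration of the stated rule (moves of 1, 3, or 5 steps; players who pick the same number do not advance), not the benchmark's actual code. The function name `resolve_round`, the player labels, and the dictionary-based state are assumptions made for the example, and the conversation phase and win condition are left out because the description does not specify them.

```python
from collections import Counter

# Moves allowed by the description: 1, 3, or 5 steps.
ALLOWED_MOVES = (1, 3, 5)


def resolve_round(positions, moves):
    """Apply one round of secret moves (hypothetical helper).

    positions: dict mapping player name -> current step count
    moves:     dict mapping player name -> chosen move (1, 3, or 5)
    Returns a new dict of positions; the inputs are not modified.
    """
    for player, move in moves.items():
        if move not in ALLOWED_MOVES:
            raise ValueError(f"{player} chose an invalid move: {move}")

    # Count how many players picked each number.
    picks = Counter(moves.values())

    new_positions = dict(positions)
    for player, move in moves.items():
        # A player advances only if nobody else chose the same number.
        if picks[move] == 1:
            new_positions[player] += move
    return new_positions


# Hypothetical three-player round: A and B collide on 5, C advances by 3.
start = {"A": 0, "B": 0, "C": 0}
after = resolve_round(start, {"A": 5, "B": 5, "C": 3})
print(after)
```

Under this rule the largest move is only worth picking if no one else picks it, which is where the public conversation before each secret choice comes in.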

lechmazur/pgg_bench
Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies among Large Language Models (LLMs) in a resource-sharing economic scenario. Our experiment extends the classic PGG with a punishment phase, allowing players to penalize free-riders or retaliate against others.
Size: 11.4 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 33 - Forks: 2

DarkCaster/Perpetual
LLM-driven software development helper.
Language: Go - Size: 6.92 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 9 - Forks: 1
