gpt-4-5 | Topic | Ecosyste.ms: Repos

Topic: "gpt-4-5"

dongri/openai-api-rs

OpenAI API client library for Rust (unofficial)

Language: Rust - Size: 400 KB - Last synced at: 13 days ago - Pushed at: 21 days ago - Stars: 409 - Forks: 76

lechmazur/elimination_game

A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other.

Size: 64.8 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 263 - Forks: 9

lechmazur/writing

This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) in a short creative story.

Language: Batchfile - Size: 314 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 228 - Forks: 6

lechmazur/nyt-connections

Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words

Language: Python - Size: 7.99 MB - Last synced at: about 17 hours ago - Pushed at: about 18 hours ago - Stars: 93 - Forks: 5

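Latest guide to shared ChatGPT Plus subscriptions: a recommendation for the most reliable ChatGPT Plus account-sharing platform in China (only 27 yuan per month). It offers GPT-4o image generation and the GPT-4.1 model series, and also supports the full version of DeepSeek-R1, Musk's Grok-3, and Google's Gemini-2.5 Pro. If you cannot get VPN access working, or find the 20 USD monthly membership fee too high, consider a shared ChatGPT Plus account. This approach lowers the cost and avoids the complexity of setting up VPN access.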

Size: 5.89 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 59 - Forks: 1

lechmazur/generalization

Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.

Size: 31.7 MB - Last synced at: 1 day ago - Pushed at: 9 days ago - Stars: 57 - Forks: 2

lechmazur/step_game

Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.

Size: 40.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 49 - Forks: 2
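
The collision rule in the step_game description can be illustrated with a minimal sketch. This is not code from the repository: the function name `resolve_round`, the dictionary layout, and the assumption that every player submits a move each round are hypothetical, chosen only to show the stated rule (moves of 1, 3, or 5 steps, and any players who pick the same number do not advance).

```python
from collections import Counter

# Moves allowed by the step_game description.
ALLOWED_MOVES = (1, 3, 5)


def resolve_round(positions, moves):
    """Return new positions after one round (hypothetical helper).

    positions: dict mapping player name -> current step count
    moves:     dict mapping player name -> chosen move (1, 3, or 5)
    Assumes every player in `positions` has an entry in `moves`.
    """
    for player, move in moves.items():
        if move not in ALLOWED_MOVES:
            raise ValueError(f"{player} chose an invalid move: {move}")

    # Count how many players picked each number.
    counts = Counter(moves.values())

    # A player advances only if nobody else picked the same number.
    return {
        player: positions[player] + (moves[player] if counts[moves[player]] == 1 else 0)
        for player in positions
    }


positions = {"A": 0, "B": 0, "C": 0}
moves = {"A": 5, "B": 5, "C": 3}
print(resolve_round(positions, moves))  # {'A': 0, 'B': 0, 'C': 3}
```

In this example A and B both choose 5 and collide, so only C advances.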

lechmazur/pgg_bench

Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies among Large Language Models (LLMs) in a resource-sharing economic scenario. Our experiment extends the classic PGG with a punishment phase, allowing players to penalize free-riders or retaliate against others.

Size: 11.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 33 - Forks: 2
