Topic: "lm-evaluation"
IAAR-Shanghai/xFinder
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
Language: Python - Size: 1.36 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 169 - Forks: 7
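xFinder fine-tunes small extractor models so that the final answer can be pulled reliably from an LLM's free-form response, rather than relying on brittle regex matching. Below is a minimal sketch of that extraction step via transformers; the checkpoint ID and the prompt template are assumptions (the repo defines its own model IDs and input format), so consult its README before use.

```python
# Sketch of xFinder-style key-answer extraction with transformers.
# The checkpoint ID and prompt layout below are assumptions, not the
# repo's confirmed interface; check the README for the real ones.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "IAAR-Shanghai/xFinder-qwen1505"  # hypothetical ID, verify in the repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

question = "What is 7 * 8? A) 54 B) 56 C) 58 D) 64"
llm_output = "Multiplying 7 by 8 gives 56, so the answer should be B."

# Ask the extractor to isolate the key answer from the free-form response.
prompt = (
    f"Question: {question}\n"
    f"Model response: {llm_output}\n"
    f"Answer range: [A, B, C, D]\n"
    f"Key extracted answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```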

bethgelab/CiteME
CiteME is a benchmark designed to test the ability of language models to find the papers cited in scientific texts.
Language: Python - Size: 283 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 35 - Forks: 3
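Each benchmark item is a text excerpt with its citation masked; the model must identify the cited paper. A minimal loading sketch follows, but both the Hugging Face dataset ID and the field names are assumptions, so check the repo for the actual distribution format.

```python
# Sketch of iterating over CiteME-style examples. The dataset ID and the
# field names are assumptions, not the repo's confirmed schema.
from datasets import load_dataset

dataset = load_dataset("bethgelab/CiteME", split="test")  # hypothetical ID

for example in dataset.select(range(3)):
    # Each example pairs an excerpt (citation masked) with the cited paper.
    print(example["excerpt"])             # assumed field name
    print("->", example["target_paper"])  # assumed field name
```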

hitz-zentroa/latxa
Latxa: An Open Language Model and Evaluation Suite for Basque
Language: Shell - Size: 27.4 MB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 28 - Forks: 0
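The Latxa checkpoints are released on the Hugging Face Hub under the HiTZ organization. A minimal generation sketch, assuming a checkpoint ID like HiTZ/latxa-7b-v1 (verify the exact ID on the Hub):

```python
# Minimal Basque text-generation sketch with a Latxa checkpoint.
# The model ID is an assumption; check the HiTZ organization on the Hub.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HiTZ/latxa-7b-v1",  # hypothetical ID, verify on the Hub
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(generator("Euskal Herria", max_new_tokens=40)[0]["generated_text"])
```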

RulinShao/RAG-evaluation-harnesses
An evaluation suite for Retrieval-Augmented Generation (RAG).
Language: Python - Size: 1.78 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 2
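A RAG harness scores a generator on how well it answers questions given retrieved passages. As a generic illustration only (not this repo's API), the self-contained sketch below shows the evaluation pattern such a suite automates: prepend retrieved context to the question, generate, and score against the reference.

```python
# Generic RAG evaluation loop, shown only to illustrate the pattern this
# kind of harness automates; it does not reproduce this repo's API.
from typing import Callable

def evaluate_rag(
    examples: list[dict],              # each: {"question", "passages", "answer"}
    generate: Callable[[str], str],    # any text-generation callable
) -> float:
    """Return exact-match accuracy of answers grounded in retrieved passages."""
    correct = 0
    for ex in examples:
        context = "\n".join(ex["passages"])
        prompt = f"Context:\n{context}\n\nQuestion: {ex['question']}\nAnswer:"
        prediction = generate(prompt).strip().lower()
        correct += prediction == ex["answer"].strip().lower()
    return correct / len(examples)

# Usage with a stub generator standing in for a real model:
examples = [{
    "question": "Capital of France?",
    "passages": ["Paris is the capital of France."],
    "answer": "paris",
}]
print(evaluate_rag(examples, lambda p: "Paris"))
```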

SprykAI/lm-evaluation-harness Fork of huggingface/lm-evaluation-harness
Fork of lm-evaluation-harness that includes a fix for the MATH benchmark.
Language: Python - Size: 22.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0
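Since the fork's stated purpose is a MATH benchmark fix, here is a sketch of scoring a model on a MATH task through the upstream harness's Python entry point. It assumes the v0.4-style simple_evaluate API; the task name "minerva_math" is an assumption and varies across harness versions, so list the tasks available in your installed version first.

```python
# Sketch of running a MATH task via the lm-evaluation-harness Python API.
# Assumes the upstream v0.4-style interface; the task name is an assumption.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",  # small model for illustration
    tasks=["minerva_math"],  # assumed task name, verify against your version
    limit=10,                # evaluate only a few examples for a quick check
)
print(results["results"])
```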
