An open API service providing repository metadata for many open-source software ecosystems.

Topic: "lm-evaluation"

IAAR-Shanghai/xFinder

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

Language: Python - Size: 1.36 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 169 - Forks: 7

bethgelab/CiteME

CiteME is a benchmark designed to test the ability of language models to find the papers cited in scientific texts.

Language: Python - Size: 283 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 35 - Forks: 3

hitz-zentroa/latxa

Latxa: An Open Language Model and Evaluation Suite for Basque

Language: Shell - Size: 27.4 MB - Last synced at: 3 days ago - Pushed at: 12 months ago - Stars: 28 - Forks: 0

RulinShao/RAG-evaluation-harnesses

An evaluation suite for Retrieval-Augmented Generation (RAG).

Language: Python - Size: 1.78 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 2

SprykAI/lm-evaluation-harness Fork of huggingface/lm-evaluation-harness

Fork of lm-evaluation-harness. Includes a fix for the MATH benchmark.

Language: Python - Size: 22.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0