Topic: "llm-as-evaluator"
prometheus-eval/prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
Language: Python - Size: 15.1 MB - Last synced at: 23 days ago - Pushed at: about 2 months ago - Stars: 946 - Forks: 54

JohnSnowLabs/langtest
Deliver safe & effective language models
Language: Python - Size: 157 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 523 - Forks: 46

IAAR-Shanghai/xFinder
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
Language: Python - Size: 1.36 MB - Last synced at: 26 days ago - Pushed at: 4 months ago - Stars: 169 - Forks: 7

KID-22/LLM-IR-Bias-Fairness-Survey
This is the repo for the survey of Bias and Fairness in IR with LLMs.
Size: 919 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 52 - Forks: 3

minnesotanlp/cobbler
Code and data for ACL ARR 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
Language: Jupyter Notebook - Size: 3.92 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 1

HillPhelmuth/LlmAsJudgeEvalPlugins
LLM-as-judge evals as Semantic Kernel Plugins
Language: C# - Size: 2.04 MB - Last synced at: about 6 hours ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 1

djokester/groqeval
Use Groq for evaluations
Language: Python - Size: 98.6 KB - Last synced at: 6 days ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

trustyai-explainability/vllm_judge
A tiny, lightweight library for LLM-as-a-Judge evaluations on vLLM-hosted models.
Language: Python - Size: 765 KB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 1 - Forks: 1

Kakz/prometheus-llm
PrometheusLLM is a unique transformer architecture inspired by dignity and recursion. This project aims to explore new frontiers in AI research and welcomes contributions from the community. 🐙🌟
Language: Python - Size: 257 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Non-NeutralZero/LLM-EvalSys
Automated evaluation of LLM-generated responses on AWS
Language: Python - Size: 36.1 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

rafaelsandroni/antibodies
Antibodies for LLM hallucinations (grouping LLM-as-a-judge, NLI, and reward models)
Language: Python - Size: 3.91 KB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
