An open API service providing repository metadata for many open source software ecosystems.

Topic: "llm-as-evaluator"

prometheus-eval/prometheus-eval

Evaluate your LLM's response with Prometheus and GPT-4 💯

Language: Python - Size: 15.1 MB - Last synced at: 23 days ago - Pushed at: about 2 months ago - Stars: 946 - Forks: 54

JohnSnowLabs/langtest

Deliver safe & effective language models

Language: Python - Size: 157 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 523 - Forks: 46

IAAR-Shanghai/xFinder

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

Language: Python - Size: 1.36 MB - Last synced at: 26 days ago - Pushed at: 4 months ago - Stars: 169 - Forks: 7

KID-22/LLM-IR-Bias-Fairness-Survey

This is the repo for the survey of Bias and Fairness in IR with LLMs.

Size: 919 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 52 - Forks: 3

minnesotanlp/cobbler

Code and data for ACL ARR 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"

Language: Jupyter Notebook - Size: 3.92 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 1

HillPhelmuth/LlmAsJudgeEvalPlugins

LLM-as-judge evals as Semantic Kernel Plugins

Language: C# - Size: 2.04 MB - Last synced at: about 6 hours ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 1

djokester/groqeval

Use Groq for evaluations

Language: Python - Size: 98.6 KB - Last synced at: 6 days ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

trustyai-explainability/vllm_judge

A tiny, lightweight library for LLM-as-a-Judge evaluations on vLLM-hosted models.

Language: Python - Size: 765 KB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 1 - Forks: 1

Kakz/prometheus-llm

PrometheusLLM is a unique transformer architecture inspired by dignity and recursion. This project aims to explore new frontiers in AI research and welcomes contributions from the community. 🐙🌟

Language: Python - Size: 257 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Non-NeutralZero/LLM-EvalSys

Automated evaluation of LLM-generated responses on AWS

Language: Python - Size: 36.1 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

rafaelsandroni/antibodies

Antibodies for LLM hallucinations (grouping LLM-as-a-judge, NLI, and reward models)

Language: Python - Size: 3.91 KB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0