GitHub topics: llm-evaluation-metrics
confident-ai/deepeval
The LLM Evaluation Framework
Language: Python - Size: 84.6 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 8,321 - Forks: 719
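A minimal usage sketch based on DeepEval's documented quickstart; the metric choice and threshold are illustrative, and `evaluate` needs a judge model (e.g. an OpenAI API key) configured:

```python
# Minimal DeepEval sketch (assumes `pip install deepeval` and a configured
# judge model); the metric and threshold here are illustrative.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France.",
)
metric = AnswerRelevancyMetric(threshold=0.7)

# Scores the test case with an LLM-as-judge metric and reports pass/fail.
evaluate(test_cases=[test_case], metrics=[metric])
```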

attogram/ollama-multirun
A Bash script that runs a single prompt against any or all of your locally installed Ollama models, saving the outputs and performance statistics as easily navigable web pages.
Language: Shell - Size: 4.02 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 6 - Forks: 1
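The repository itself is a shell script, but the same idea can be sketched in Python against Ollama's local REST API (the `/api/tags` and `/api/generate` endpoints); the prompt below is just an example:

```python
# Sketch of the same idea in Python (not the repository's script):
# run one prompt on every locally installed Ollama model.
import requests

OLLAMA = "http://localhost:11434"  # default local Ollama endpoint
prompt = "Summarize the benefits of unit testing in one sentence."

# /api/tags lists locally installed models.
models = [m["name"] for m in requests.get(f"{OLLAMA}/api/tags").json()["models"]]

for model in models:
    r = requests.post(
        f"{OLLAMA}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
    ).json()
    # eval_count / eval_duration are Ollama's per-response performance stats.
    print(f"--- {model} ---")
    print(r["response"])
    print(f"tokens={r.get('eval_count')} duration_ns={r.get('eval_duration')}")
```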

cvs-health/langfair
LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.
Language: Python - Size: 30.7 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 215 - Forks: 33
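As an illustration of the kind of metric such a library computes (not LangFair's actual API), here is a toy counterfactual-sentiment-gap sketch; `score_sentiment` is a hypothetical stand-in for a real classifier:

```python
# Toy counterfactual fairness sketch (not LangFair's API): compare mean
# sentiment of responses to prompt pairs that differ only in a demographic
# term. `score_sentiment` is a hypothetical stand-in classifier.
def score_sentiment(text: str) -> float:
    positive = {"good", "great", "helpful", "excellent"}
    words = text.lower().split()
    return sum(w in positive for w in words) / max(len(words), 1)

def counterfactual_sentiment_gap(responses_a, responses_b) -> float:
    mean_a = sum(map(score_sentiment, responses_a)) / len(responses_a)
    mean_b = sum(map(score_sentiment, responses_b)) / len(responses_b)
    return mean_a - mean_b  # values far from 0 suggest disparate treatment

print(counterfactual_sentiment_gap(
    ["She gave a helpful answer."],
    ["He gave a short answer."],
))
```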

ronniross/llm-confidence-scorer
A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by Large Language Models.
Language: Python - Size: 143 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 2 - Forks: 0
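One common way to derive such a confidence estimate (a generic technique, not necessarily this repository's approach) is the length-normalized probability of the generated sequence, computed from per-token log-probabilities:

```python
import math

def sequence_confidence(token_logprobs: list[float]) -> float:
    """Length-normalized sequence probability as a rough confidence score.

    token_logprobs: per-token log-probabilities reported by the model
    (e.g. via an API's logprobs option). Returns a value in (0, 1];
    higher means the model was more confident in its own output.
    """
    if not token_logprobs:
        return 0.0
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

print(sequence_confidence([-0.05, -0.20, -0.01, -0.30]))  # ~0.87
```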

locuslab/open-unlearning
The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features (benchmarks, methods, evaluations, models, etc.) are easily extensible.
Language: Python - Size: 15.9 MB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 262 - Forks: 62
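For orientation, the simplest unlearning baseline such frameworks include is gradient ascent on the forget set; the PyTorch snippet below is a toy sketch of that idea, not this repository's API:

```python
# Toy gradient-ascent unlearning step (the simplest baseline; not
# open-unlearning's API). A linear layer stands in for the LLM and a
# random batch stands in for the forget set.
import torch
from torch import nn

model = nn.Linear(10, 2)
forget_inputs = torch.randn(8, 10)
forget_targets = torch.randint(0, 2, (8,))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

loss = loss_fn(model(forget_inputs), forget_targets)
(-loss).backward()   # ascend the loss on the forget set to "unlearn" it
optimizer.step()
optimizer.zero_grad()
```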

nhsengland/evalsense
Tools for systematic large language model evaluations
Language: Python - Size: 877 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

pyladiesams/eval-llm-based-apps-jan2025
Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 5
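A minimal example of folding an LLM check into an ordinary test suite (a generic pytest sketch; `generate_answer` is a hypothetical wrapper around your model call):

```python
# Generic pytest sketch for putting an LLM check in a test suite;
# `generate_answer` is a hypothetical wrapper around your model call.
import pytest

def generate_answer(prompt: str) -> str:
    return "Paris is the capital of France."  # stub for illustration

@pytest.mark.parametrize("prompt,required", [
    ("What is the capital of France?", "paris"),
])
def test_answer_contains_expected_fact(prompt, required):
    answer = generate_answer(prompt)
    assert required in answer.lower()
```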

Fbxfax/llm-confidence-scorer
A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by Large Language Models.
Language: Python - Size: 96.7 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

zhuohaoyu/KIEval
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Language: Python - Size: 10.6 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 36 - Forks: 2

ritwickbhargav80/quick-llm-model-evaluations
This repo contains a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.
Language: Python - Size: 47.9 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
