GitHub topics: llm-evaluation-metrics
Fbxfax/llm-confidence-scorer
A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by Large Language Models.
Language: Python - Size: 96.7 KB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 0 - Forks: 0

confident-ai/deepeval
The LLM Evaluation Framework
Language: Python - Size: 78 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6,116 - Forks: 533

ronniross/llm-confidence-scorer
A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by Large Language Models.
Language: Python - Size: 0 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

cvs-health/langfair
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
Language: Python - Size: 30.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 201 - Forks: 32

locuslab/open-unlearning
A one-stop repository for large language model (LLM) unlearning. Supports TOFU and MUSE, and is an easily extensible framework for new datasets, evaluations, methods, and other benchmarks.
Language: Python - Size: 15.9 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 217 - Forks: 49

zhuohaoyu/KIEval
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Language: Python - Size: 10.6 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 36 - Forks: 2

pyladiesams/eval-llm-based-apps-jan2025
Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 7 - Forks: 5

ritwickbhargav80/quick-llm-model-evaluations
This repo is for a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.
Language: Python - Size: 47.9 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
