An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: llm-evaluation-metrics

confident-ai/deepeval

The LLM Evaluation Framework

Language: Python - Size: 84.6 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 8,321 - Forks: 719
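As a rough illustration of the kind of check deepeval supports, the sketch below follows the quickstart pattern from its public documentation; class and function names such as LLMTestCase, AnswerRelevancyMetric, and assert_test reflect that documentation and may differ across versions.

```python
# Minimal deepeval-style check, following the documented quickstart pattern.
# Requires `pip install deepeval` and a configured LLM judge (e.g. OPENAI_API_KEY).
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="The capital of France is Paris.",
    )
    metric = AnswerRelevancyMetric(threshold=0.7)  # passes if the relevancy score is >= 0.7
    assert_test(test_case, [metric])
```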

attogram/ollama-multirun

A Bash shell script that runs a single prompt against any or all of your locally installed Ollama models, saving the output and performance statistics as easily navigable web pages (a Python sketch of the same loop follows below).

Language: Shell - Size: 4.02 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 6 - Forks: 1
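The repository itself is a Bash script; purely to illustrate the same idea (not the script's actual interface), a hedged Python sketch that loops one prompt over every locally installed Ollama model might look like this:

```python
# Illustrative only: run one prompt against every locally installed Ollama model
# and save each model's output to a text file. Uses the `ollama list` and
# `ollama run` CLI commands; parsing assumes the default `ollama list` table format.
import subprocess
from pathlib import Path

PROMPT = "Explain what an LLM evaluation metric is in one sentence."
out_dir = Path("multirun_outputs")
out_dir.mkdir(exist_ok=True)

# `ollama list` prints a header row, then one model per line (name in column 1).
listing = subprocess.run(["ollama", "list"], capture_output=True, text=True, check=True)
models = [line.split()[0] for line in listing.stdout.splitlines()[1:] if line.strip()]

for model in models:
    result = subprocess.run(
        ["ollama", "run", model, PROMPT],
        capture_output=True, text=True, check=False,
    )
    safe_name = model.replace("/", "_").replace(":", "_")
    (out_dir / f"{safe_name}.txt").write_text(result.stdout)
    print(f"{model}: {len(result.stdout)} chars of output")
```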

cvs-health/langfair

LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments (an illustrative disparity check follows below).

Language: Python - Size: 30.7 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 215 - Forks: 33
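LangFair's own API is not shown here; purely as a hypothetical illustration of what a use-case-level fairness assessment measures, the sketch below compares refusal rates for the same request when only a demographic attribute in the prompt changes. All names and data are invented for the example.

```python
# Hypothetical illustration (not LangFair's API): compare how often a model
# refuses the same request across demographic groups.

# Toy responses keyed by group; in practice these would come from an LLM,
# prompted with the same template and only the group attribute varied.
responses = {
    "group_a": ["Sure, here is the information...", "I can't help with that."],
    "group_b": ["I can't help with that.", "I can't help with that."],
}

def refusal_rate(outputs):
    """Fraction of outputs that look like refusals (naive keyword check)."""
    return sum("can't help" in o.lower() for o in outputs) / len(outputs)

rates = {group: refusal_rate(outs) for group, outs in responses.items()}
print(rates)  # e.g. {'group_a': 0.5, 'group_b': 1.0}

# A large gap between groups is a signal worth investigating with a proper
# fairness toolkit such as LangFair.
```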

ronniross/llm-confidence-scorer

A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by large language models (an illustrative log-probability-based score follows below).

Language: Python - Size: 143 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 2 - Forks: 0
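As a generic, hypothetical illustration of one way such a confidence score can be derived (not this repository's implementation), the sketch below turns per-token log-probabilities into a single confidence value via the geometric mean of token probabilities.

```python
# Hypothetical illustration (not this repository's code): derive a confidence
# score for a generated answer from its per-token log-probabilities.
import math

def confidence_from_logprobs(token_logprobs):
    """Geometric mean of token probabilities, in [0, 1]; higher means more confident."""
    if not token_logprobs:
        return 0.0
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)

# Example: log-probabilities as reported by many LLM APIs for each output token.
logprobs = [-0.05, -0.20, -0.01, -0.80, -0.10]
print(f"confidence ~ {confidence_from_logprobs(logprobs):.3f}")  # ~ 0.793
```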

locuslab/open-unlearning

The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features (benchmarks, methods, evaluations, models, etc.) are easily extensible.

Language: Python - Size: 15.9 MB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 262 - Forks: 62

nhsengland/evalsense

Tools for systematic large language model evaluations

Language: Python - Size: 877 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

pyladiesams/eval-llm-based-apps-jan2025

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation. (A pytest-style sketch of the test-suite step follows below.)

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 5
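In the spirit of the workshop's goal of wiring evaluations into a test suite, a hedged pytest-style sketch might look like the following; generate_answer and EXPECTED_KEYWORDS are hypothetical stand-ins, not part of the workshop materials.

```python
# Hypothetical pytest-style sketch: treat an LLM evaluation as an ordinary test.
# `generate_answer` stands in for whatever function your app uses to call the LLM.
import pytest

def generate_answer(question: str) -> str:
    # Placeholder for your app's LLM call.
    return "Paris is the capital of France."

EXPECTED_KEYWORDS = {
    "What is the capital of France?": ["paris"],
}

@pytest.mark.parametrize("question,keywords", EXPECTED_KEYWORDS.items())
def test_answer_contains_expected_keywords(question, keywords):
    answer = generate_answer(question).lower()
    missing = [kw for kw in keywords if kw not in answer]
    assert not missing, f"answer missing expected keywords: {missing}"
```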

Fbxfax/llm-confidence-scorer

A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by Large Language Models.

Language: Python - Size: 96.7 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

zhuohaoyu/KIEval

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

Language: Python - Size: 10.6 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 36 - Forks: 2

ritwickbhargav80/quick-llm-model-evaluations

A Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.

Language: Python - Size: 47.9 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0