An open API service providing repository metadata for many open source software ecosystems.

Topic: "llm-evaluation-metrics"

confident-ai/deepeval

The LLM Evaluation Framework

Language: Python - Size: 78 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6,116 - Forks: 533

locuslab/open-unlearning

A one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE and is an easily extensible framework for new datasets, evaluations, methods, and other benchmarks.

Language: Python - Size: 15.9 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 217 - Forks: 49

cvs-health/langfair

LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.

Language: Python - Size: 30.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 201 - Forks: 32

zhuohaoyu/KIEval

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

Language: Python - Size: 10.6 MB - Last synced at: 30 days ago - Pushed at: 10 months ago - Stars: 36 - Forks: 2

pyladiesams/eval-llm-based-apps-jan2025

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 7 - Forks: 5

Fbxfax/llm-confidence-scorer

A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by Large Language Models.

Language: Python - Size: 96.7 KB - Last synced at: about 6 hours ago - Pushed at: about 7 hours ago - Stars: 0 - Forks: 0

ronniross/llm-confidence-scorer

A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by Large Language Models.

Language: Python - Size: 0 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

ritwickbhargav80/quick-llm-model-evaluations

This repo hosts a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.

Language: Python - Size: 47.9 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0