GitHub topics: llm-evaluation-metrics
confident-ai/deepeval
The LLM Evaluation Framework
Language: Python - Size: 84.6 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 8,321 - Forks: 719
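A minimal usage sketch based on DeepEval's documented quickstart; the metric choice and threshold are illustrative, and `evaluate` needs a judge model (e.g. an OpenAI API key) configured:

```python
# Minimal DeepEval sketch (assumes `pip install deepeval` and a configured
# judge model); the metric and threshold here are illustrative.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France.",
)
metric = AnswerRelevancyMetric(threshold=0.7)

# Scores the test case with an LLM-as-judge metric and reports pass/fail.
evaluate(test_cases=[test_case], metrics=[metric])
```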

attogram/ollama-multirun
A Bash script that runs a single prompt against any or all of your locally installed Ollama models, saving the outputs and performance statistics as easily navigable web pages.
Language: Shell - Size: 4.02 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 6 - Forks: 1
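The repository itself is a shell script, but the same idea can be sketched in Python against Ollama's local REST API (the `/api/tags` and `/api/generate` endpoints); the prompt below is just an example:

```python
# Sketch of the same idea in Python (not the repository's script):
# run one prompt on every locally installed Ollama model.
import requests

OLLAMA = "http://localhost:11434"  # default local Ollama endpoint
prompt = "Summarize the benefits of unit testing in one sentence."

# /api/tags lists locally installed models.
models = [m["name"] for m in requests.get(f"{OLLAMA}/api/tags").json()["models"]]

for model in models:
    r = requests.post(
        f"{OLLAMA}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
    ).json()
    # eval_count / eval_duration are Ollama's per-response performance stats.
    print(f"--- {model} ---")
    print(r["response"])
    print(f"tokens={r.get('eval_count')} duration_ns={r.get('eval_duration')}")
```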

cvs-health/langfair
LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.
Language: Python - Size: 30.7 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 215 - Forks: 33
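As an illustration of the kind of metric such a library computes (not LangFair's actual API), here is a toy counterfactual-sentiment-gap sketch; `score_sentiment` is a hypothetical stand-in for a real classifier:

```python
# Toy counterfactual fairness sketch (not LangFair's API): compare mean
# sentiment of responses to prompt pairs that differ only in a demographic
# term. `score_sentiment` is a hypothetical stand-in classifier.
def score_sentiment(text: str) -> float:
    positive = {"good", "great", "helpful", "excellent"}
    words = text.lower().split()
    return sum(w in positive for w in words) / max(len(words), 1)

def counterfactual_sentiment_gap(responses_a, responses_b) -> float:
    mean_a = sum(map(score_sentiment, responses_a)) / len(responses_a)
    mean_b = sum(map(score_sentiment, responses_b)) / len(responses_b)
    return mean_a - mean_b  # values far from 0 suggest disparate treatment

print(counterfactual_sentiment_gap(
    ["She gave a helpful answer."],
    ["He gave a short answer."],
))
```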

ronniross/llm-confidence-scorer
A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by Large Language Models.
Language: Python - Size: 143 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 2 - Forks: 0
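One common way to derive such a confidence estimate (a generic technique, not necessarily this repository's approach) is the length-normalized probability of the generated sequence, computed from per-token log-probabilities:

```python
import math

def sequence_confidence(token_logprobs: list[float]) -> float:
    """Length-normalized sequence probability as a rough confidence score.

    token_logprobs: per-token log-probabilities reported by the model
    (e.g. via an API's logprobs option). Returns a value in (0, 1];
    higher means the model was more confident in its own output.
    """
    if not token_logprobs:
        return 0.0
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

print(sequence_confidence([-0.05, -0.20, -0.01, -0.30]))  # ~0.87
```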

locuslab/open-unlearning
The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features (benchmarks, methods, evaluations, models, etc.) are easily extensible.
Language: Python - Size: 15.9 MB - Last synced at: 29 days ago - Pushed at: 30 days ago - Stars: 262 - Forks: 62
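For orientation, the simplest unlearning baseline such frameworks include is gradient ascent on the forget set; the PyTorch snippet below is a toy sketch of that idea, not this repository's API:

```python
# Toy gradient-ascent unlearning step (the simplest baseline; not
# open-unlearning's API). A linear layer stands in for the LLM and a
# random batch stands in for the forget set.
import torch
from torch import nn

model = nn.Linear(10, 2)
forget_inputs = torch.randn(8, 10)
forget_targets = torch.randint(0, 2, (8,))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

loss = loss_fn(model(forget_inputs), forget_targets)
(-loss).backward()   # ascend the loss on the forget set to "unlearn" it
optimizer.step()
optimizer.zero_grad()
```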

nhsengland/evalsense
Tools for systematic large language model evaluations
Language: Python - Size: 877 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

pyladiesams/eval-llm-based-apps-jan2025
Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 5
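A minimal example of folding an LLM check into an ordinary test suite (a generic pytest sketch; `generate_answer` is a hypothetical wrapper around your model call):

```python
# Generic pytest sketch for putting an LLM check in a test suite;
# `generate_answer` is a hypothetical wrapper around your model call.
import pytest

def generate_answer(prompt: str) -> str:
    return "Paris is the capital of France."  # stub for illustration

@pytest.mark.parametrize("prompt,required", [
    ("What is the capital of France?", "paris"),
])
def test_answer_contains_expected_fact(prompt, required):
    answer = generate_answer(prompt)
    assert required in answer.lower()
```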

Fbxfax/llm-confidence-scorer
A set of auxiliary systems designed to provide a measure of estimated confidence for the outputs generated by Large Language Models.
Language: Python - Size: 96.7 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

zhuohaoyu/KIEval
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Language: Python - Size: 10.6 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 36 - Forks: 2

ritwickbhargav80/quick-llm-model-evaluations
This repo contains a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.
Language: Python - Size: 47.9 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0
