GitHub topics: llm-test
georgian-io/LLM-Finetuning-Toolkit
Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.
Language: Python - Size: 32.7 MB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 837 - Forks: 100

JohnSnowLabs/langtest
Deliver safe & effective language models
Language: Python - Size: 157 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 523 - Forks: 46

uptrain-ai/uptrain
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
Language: Python - Size: 36.9 MB - Last synced at: 20 days ago - Pushed at: 10 months ago - Stars: 2,265 - Forks: 198

athina-ai/athina-sdk
LLM Testing SDK that helps you write and run tests to monitor your LLM app in production
Language: Python - Size: 119 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 130 - Forks: 1

pyladiesams/eval-llm-based-apps-jan2025
Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 5

prompt-foundry/typescript-sdk
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.
Language: TypeScript - Size: 20.9 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 6 - Forks: 1

levitation-opensource/Manipulative-Expression-Recognition
MER is software that identifies and highlights manipulative communication in text from human conversations and AI-generated responses. MER benchmarks language models for manipulative expressions, fostering the development of transparency and safety in AI. It also supports victims of manipulation by detecting manipulative patterns in human communication.
Language: HTML - Size: 8.54 MB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 13 - Forks: 3

prompt-foundry/go-sdk
The prompt engineering, prompt management, and prompt evaluation tool for Go.
Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

awesome-software/nlptest (fork of JohnSnowLabs/langtest)
Deliver safe & effective language models
Size: 106 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Coldwave96/LLM-Sec-Evaluation
Scripts for evaluating LLM security abilities.
Language: Python - Size: 393 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

awesome-software/promptfoo (fork of promptfoo/promptfoo)
Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.
Size: 961 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0
