ecosyste.ms

Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: llm-test

Repositories

georgian-io/LLM-Finetuning-Toolkit

Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.

Language: Python - Size: 32.7 MB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 837 - Forks: 100

JohnSnowLabs/langtest

Deliver safe & effective language models

Language: Python - Size: 157 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 523 - Forks: 46

uptrain-ai/uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.

Language: Python - Size: 36.9 MB - Last synced at: 20 days ago - Pushed at: 10 months ago - Stars: 2,265 - Forks: 198

athina-ai/athina-sdk

LLM Testing SDK that helps you write and run tests to monitor your LLM app in production

Language: Python - Size: 119 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 130 - Forks: 1

pyladiesams/eval-llm-based-apps-jan2025

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 5

prompt-foundry/typescript-sdk

The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.

Language: TypeScript - Size: 20.9 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 6 - Forks: 1

levitation-opensource/Manipulative-Expression-Recognition

MER is software that identifies and highlights manipulative communication in text from human conversations and AI-generated responses. MER benchmarks language models for manipulative expressions, fostering the development of transparency and safety in AI. It also supports manipulation victims by detecting manipulative patterns in human communication.

Language: HTML - Size: 8.54 MB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 13 - Forks: 3

prompt-foundry/go-sdk

The prompt engineering, prompt management, and prompt evaluation tool for Go.

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

awesome-software/nlptest (fork of JohnSnowLabs/langtest)

Deliver safe & effective language models

Size: 106 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Coldwave96/LLM-Sec-Evaluation

Scripts for evaluating LLM security abilities.

Language: Python - Size: 393 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

awesome-software/promptfoo (fork of promptfoo/promptfoo)

Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.

Size: 961 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Related Keywords

llm-test (11), llm (5), llm-eval (4), llmops (4), prompt-engineering (4), llm-testing (3), large-language-models (2), monitoring (2), gpt (2), nlp (2), gpt-4 (2), llm-evaluation (2), mlops (2), prompt-manager (2), prompt-management (2), llm-evaluation-framework (1), benchmarking (1), llm-evaluation-metrics (1), typescript (1), prompt-testing (1), prompt-evaluation (1), open-ai (1), llm-ops (1), llm-monitoring (1), gpt-3 (1), llms (1), workshop (1), chatglm2-6b (1), baichuan-13b (1), prompt-test (1), prompt-eva (1), open-api (1), golang (1), go (1), transparency (1), sentiment-classification (1), sentiment-analysis (1), psychometrics (1), prompt-injection (1), misinformation (1), manipulation (1), llm-training (1), llm-security (1), human-robot-interaction (1), human-computer-interaction (1), fraud-prevention (1), fraud-detection (1), expression-recognition (1), conversation-analytics (1), conversation-analysis (1), benchmarks (1), benchmark-framework (1), artificial-intelligence (1), ai-testing (1), ai-safety (1), zephyr (1), unit-testing (1), summarization (1), redpajama (1), qlora (1), nlp-machine-learning (1), mistral-7b (1), lora (1), llama2 (1), flan-t5 (1), finetuning (1), fine-tuning (1), falcon (1), classification (1), ablation-study (1), llm-evals (1), testing-tools (1), aiops (1), root-cause-analysis (1), openai-evals (1), machine-learning (1), llm-prompting (1), jailbreak-detection (1), hallucination-detection (1), experimentation (1), evaluation (1), autoevaluation (1), trustworthy-ai (1), responsible-ai (1), model-assessment (1), ml-testing (1), ml-safety (1), llm-evaluation-toolkit (1), llm-as-evaluator (1), ethics-in-ai (1)