GitHub topics: llm-testing

Repositories

raga-ai-hub/RagaAI-Catalyst

Python SDK for an agent AI observability, monitoring, and evaluation framework. Features include agent, LLM, and tool tracing, debugging of multi-agent systems, a self-hosted dashboard, and advanced analytics with timeline and execution graph views.

Language: Python - Size: 55.8 MB - Last synced at: about 5 hours ago - Pushed at: 6 days ago - Stars: 16,191 - Forks: 3,780

rimironenko/rostcamp

Language: Python - Size: 9.66 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

vincentkoc/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing & regression-catching.

Language: Python - Size: 306 KB - Last synced at: 3 days ago - Pushed at: 19 days ago - Stars: 34 - Forks: 0

LLAMATOR-Core/llamator

Framework for testing vulnerabilities of large language models (LLMs).

Language: Python - Size: 4.31 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 114 - Forks: 9

JohnSnowLabs/langtest

Deliver safe & effective language models

Language: Python - Size: 157 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 523 - Forks: 46

sandy-sp/ai-reply-index

A community-driven archive of AI prompts and responses. Log, compare, and contribute structured examples to build a searchable public prompt-response database.

Language: Python - Size: 8.03 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

dr-gareth-roberts/LLM-Dev

Python Tools for Developing with LLMs (cloud & offline)

Language: Python - Size: 286 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

JohnRitchie/qa-llm-guard

Language: Python - Size: 23.4 KB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 2 - Forks: 0

Leftinant/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing & regression-catching.

Size: 1.95 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

rhesis-ai/rhesis-sdk

Open-source test generation SDK for LLM applications. Access curated test sets. Build context-specific test sets and collaborate with subject matter experts.

Language: Python - Size: 420 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 18 - Forks: 0

ssilwal29/api-ninja

API Ninja simplifies API testing by allowing users to define test flows in plain English.

Language: Python - Size: 1.45 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

Addepto/contextcheck

MIT-licensed framework for testing LLMs, RAG systems, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.

Language: Python - Size: 464 KB - Last synced at: 29 days ago - Pushed at: 6 months ago - Stars: 67 - Forks: 9

pyladiesams/eval-llm-based-apps-jan2025

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 5

wizenheimer/periscope

LLM Performance Testing | K6 + Grafana + InfluxDB | A tiny toolkit for load testing and benchmarking OpenAI-like inference endpoints.

Language: JavaScript - Size: 563 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0