GitHub topics: llm-testing
raga-ai-hub/RagaAI-Catalyst
Python SDK for an agent AI observability, monitoring, and evaluation framework. Features include agent, LLM, and tool tracing; debugging of multi-agent systems; a self-hosted dashboard; and advanced analytics with timeline and execution graph views.
Language: Python - Size: 55.8 MB - Last synced at: about 5 hours ago - Pushed at: 6 days ago - Stars: 16,191 - Forks: 3,780

rimironenko/rostcamp
Language: Python - Size: 9.66 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

vincentkoc/tiny_qa_benchmark_pp
Tiny QA Benchmark++: a micro-benchmark suite (52-item gold set plus on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke testing and regression catching.
Language: Python - Size: 306 KB - Last synced at: 3 days ago - Pushed at: 19 days ago - Stars: 34 - Forks: 0

LLAMATOR-Core/llamator
Framework for testing vulnerabilities of large language models (LLMs).
Language: Python - Size: 4.31 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 114 - Forks: 9

JohnSnowLabs/langtest
Deliver safe & effective language models
Language: Python - Size: 157 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 523 - Forks: 46

sandy-sp/ai-reply-index
A community-driven archive of AI prompts and responses. Log, compare, and contribute structured examples to build a searchable public prompt-response database.
Language: Python - Size: 8.03 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

dr-gareth-roberts/LLM-Dev
Python Tools for Developing with LLMs (cloud & offline)
Language: Python - Size: 286 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

JohnRitchie/qa-llm-guard
Language: Python - Size: 23.4 KB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 2 - Forks: 0

Leftinant/tiny_qa_benchmark_pp
Tiny QA Benchmark++: a micro-benchmark suite (52-item gold set plus on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke testing and regression catching.
Size: 1.95 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

rhesis-ai/rhesis-sdk
Open-source test generation SDK for LLM applications. Access curated test sets. Build context-specific test sets and collaborate with subject matter experts.
Language: Python - Size: 420 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 18 - Forks: 0

ssilwal29/api-ninja
API Ninja simplifies API testing by allowing users to define test flows in plain English.
Language: Python - Size: 1.45 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

Addepto/contextcheck
MIT-licensed framework for testing LLMs, RAG systems, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
Language: Python - Size: 464 KB - Last synced at: 29 days ago - Pushed at: 6 months ago - Stars: 67 - Forks: 9

pyladiesams/eval-llm-based-apps-jan2025
Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 5

wizenheimer/periscope
LLM Performance Testing | K6 + Grafana + InfluxDB | A tiny toolkit for load testing and benchmarking OpenAI-like inference endpoints using K6 + Grafana + InfluxDB
Language: JavaScript - Size: 563 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

neonxploit/Dragon-Glitch---NeonXploit-Audit-v1.0-
Red-team audit of DeepSeek AI by lala, aka NeonXploit (Operation Dragon Glitch)
Size: 2.28 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

MirrorLoop/mirrorloop-core
Official public release of MirrorLoop Core (v1.3 – April 2025)
Size: 1.39 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

borisveis/LLMTesting
LLM testing with GPT4All
Language: Python - Size: 24.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ModelPulse/BreakYourLLM
Break Your LLM before your users do!
Language: Python - Size: 234 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

prompt-foundry/go-sdk
The prompt engineering, prompt management, and prompt evaluation tool for Go.
Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0
