An open API service providing repository metadata for many open source software ecosystems.

Topic: "llm-testing"

raga-ai-hub/RagaAI-Catalyst

Python SDK for Agent AI Observability, Monitoring and Evaluation Framework. Features include agent, LLM, and tool tracing; debugging of multi-agent systems; a self-hosted dashboard; and advanced analytics with timeline and execution graph views.

Language: Python - Size: 55.8 MB - Last synced at: 10 days ago - Pushed at: 13 days ago - Stars: 16,179 - Forks: 3,763

JohnSnowLabs/langtest

Deliver safe & effective language models

Language: Python - Size: 158 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 526 - Forks: 47

LLAMATOR-Core/llamator

Framework for testing vulnerabilities of large language models (LLM).

Language: Python - Size: 4.32 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 125 - Forks: 11

Addepto/contextcheck

MIT-licensed framework for testing LLMs, RAGs, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.

Language: Python - Size: 464 KB - Last synced at: 25 days ago - Pushed at: 7 months ago - Stars: 72 - Forks: 9

rhesis-ai/rhesis-sdk

Open-source test generation SDK for LLM applications. Access curated test sets. Build context-specific test sets and collaborate with subject matter experts.

Language: Python - Size: 420 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 18 - Forks: 0

vincentkoc/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing & regression-catching.

Language: Python - Size: 306 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 0

pyladiesams/eval-llm-based-apps-jan2025

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 7 - Forks: 5

JohnRitchie/qa-llm-guard

Language: Python - Size: 23.4 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

ssilwal29/api-ninja

API Ninja simplifies API testing by allowing users to define test flows in plain English.

Language: Python - Size: 1.45 MB - Last synced at: 21 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

ModelPulse/BreakYourLLM

Break Your LLM before your users do!

Language: Python - Size: 234 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

prompt-foundry/go-sdk

The prompt engineering, prompt management, and prompt evaluation tool for Go.

Size: 1000 Bytes - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

thatoldfarm/logos-infinitum-artifact

A comprehensive corpus of interconnected texts and protocols designed as a conceptual stress-test for advanced AI.

Size: 5.47 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

rimironenko/rostcamp

Language: Python - Size: 9.66 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

sandy-sp/ai-reply-index

A community-driven archive of AI prompts and responses. Log, compare, and contribute structured examples to build a searchable public prompt-response database.

Language: Python - Size: 8.03 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

dr-gareth-roberts/LLM-Dev

Python Tools for Developing with LLMs (cloud & offline)

Language: Python - Size: 286 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Leftinant/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing & regression-catching.

Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

wizenheimer/periscope

LLM Performance Testing | K6 + Grafana + InfluxDB | A tiny toolkit for load testing and benchmarking OpenAI-like inference endpoints using K6 + Grafana + InfluxDB

Language: JavaScript - Size: 563 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

neonxploit/Dragon-Glitch---NeonXploit-Audit-v1.0-

Red-team audit of DeepSeek AI by lala aka NeonXploit (Operation Dragon Glitch)

Size: 2.28 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

MirrorLoop/mirrorloop-core

Official public release of MirrorLoop Core (v1.3 – April 2025)

Size: 1.39 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

borisveis/LLMTesting

LLM Testing with gpt4all

Language: Python - Size: 24.4 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0