An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: ml-testing

Doleus/doleus

Build confidence in your AI with systematic slice-based testing

Language: Python - Size: 15.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 11 - Forks: 0

Giskard-AI/awesome-ai-safety

📚 A curated list of papers & technical articles on AI Quality & Safety

Size: 64.5 KB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 193 - Forks: 21

Giskard-AI/giskard-oss

🐢 Open-Source Evaluation & Testing library for LLM Agents

Language: Python - Size: 175 MB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 4,933 - Forks: 375

OlivierBinette/er-evaluation

An End-to-End Evaluation Framework for Entity Resolution Systems

Language: Python - Size: 62.4 MB - Last synced at: 8 days ago - Pushed at: almost 2 years ago - Stars: 32 - Forks: 10

oliverweissl/SMOO

A testing framework for ML systems

Language: Python - Size: 13.2 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 1

Pacific-AI-Corp/langtest

Deliver safe & effective language models

Language: Python - Size: 200 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 543 - Forks: 50

whitehackr/simtom

ML-focused synthetic data platform with realistic traffic patterns, seasonal effects, and temporal drift. BNPL transaction generator with risk scoring, configurable arrival patterns (Poisson, NHPP, Burst). Live API: simtom-production.up.railway.app | Day-per-second historical replay.

Language: Python - Size: 163 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

fixouttech/fixout

Algorithmic inspection for trustworthy ML models

Language: Python - Size: 10.7 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 4 - Forks: 0

moonwatcher-ai/moonwatcher

Evaluation & testing framework for computer vision models

Language: Python - Size: 14.4 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 2

Somitheiconic09/AI-Safety

AAAI 2025 Tutorial on Machine Learning Safety

Size: 1000 Bytes - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

ETIQ-AI/ml-testing

ML Testing for Everyone. Find issues before they become problems.

Language: Jupyter Notebook - Size: 2.27 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 3

maryamsoftdev/Train_Test_data_in_ML

learning python day 4

Language: Python - Size: 1.95 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Giskard-AI/community-content 📦

✍️ Collaborate on writing technical content for the Giskard Community

Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 2

clarahoffmann/pycon-2023-honey-i-broke-the-pytorch-model

Streamlit app for "Honey, I broke the PyTorch model" - Talk @ PyCon & PyData 2023

Language: Python - Size: 16.8 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

Related Keywords
ml-testing (14), machine-learning (7), mlops (7), trustworthy-ai (5), ai-safety (4), ai (4), ml-safety (4), artificial-intelligence (3), llm (3), ml (3), deep-learning (3), python (3), ml-validation (3), responsible-ai (3), computer-vision (3), ai-security (3), ai-testing (2), ml-evaluation (2), genai (2), fairness-ai (2), data-science (2), llmops (2), awesome (2), ethical-artificial-intelligence (2), fairness (2), pytorch (2), aisafety (1), unfairness-mitigation (1), data-drift (1), harms (1), group-fairness (1), fairness-indicators (1), fairness-assessment (1), fairness-algorithms (1), data-generation (1), explainable-ml (1), explainable-ai (1), event-modeling (1), fastapi (1), bias-measurement (1), streaming-api (1), synthetic-data (1), time-compression (1), algorithmic-fairness (1), ai-systems (1), streamlit-app (1), tutorials (1), tutorial-code (1), testing (1), giskard (1), content (1), training-data (1), testdata (1), sklearn (1), polyfit (1), numpy (1), ml-training (1), monitoring-tool (1), fairness-ml (1), self-driving-cars (1), open-source (1), mechanistic-interpretability (1), eye-closure (1), drowsiness-detection (1), driver-monitoring (1), dataset (1), alignment (1), nlp (1), red-team-tools (1), rag-evaluation (1), llm-security (1), llm-evaluation (1), llm-eval (1), ai-red-team (1), agent-evaluation (1), robustness (1), natural-language-processing (1), model-validation (1), model-testing (1), ethical-ai (1), awesome-list (1), ai-quality (1), ai-alignment (1), torchmetrics (1), slice (1), quality-control (1), quality-assurance (1), eu-ai-act (1), model-assessment (1), llm-testing (1), llm-test (1), llm-evaluation-toolkit (1), llm-as-evaluator (1), large-language-models (1), ethics-in-ai (1), benchmarks (1), benchmark-framework (1), statistics (1), record-linkage (1), matching (1)