GitHub topics: ai-benchmarks
DanielButler1/AI-Stats
The Most Comprehensive Set of AI Model Benchmark Scores, Prices & Information
Language: TypeScript - Size: 1.69 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems
Language: Python - Size: 9.83 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 114 - Forks: 16
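
The SciCode description points at the common pattern behind code-generation benchmarks: run the model's proposed solution against a reference test and score pass/fail. A minimal, hypothetical sketch of that pattern follows (not the SciCode harness itself; the problem and test strings here are toy stand-ins):

```python
# Hypothetical code-generation benchmark loop: execute a candidate solution
# string together with a reference test and record whether the test passes.

def evaluate_solution(solution_code: str, test_code: str) -> bool:
    """Run candidate code plus its test in a shared, isolated namespace."""
    namespace: dict = {}
    try:
        exec(solution_code, namespace)   # define the candidate function
        exec(test_code, namespace)       # raises AssertionError on failure
        return True
    except Exception:
        return False

# Toy "scientific" problem (harmonic mean) with a reference assertion.
candidate = "def harmonic_mean(a, b):\n    return 2 * a * b / (a + b)"
test = "assert abs(harmonic_mean(1.0, 3.0) - 1.5) < 1e-9"
print(evaluate_solution(candidate, test))  # True
```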

lechmazur/deception
Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.
Size: 36.1 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 26 - Forks: 2

mrowan137/ml-performance-benchmark
Performance benchmarking for ML/AI workloads: ResNet, CosmoFlow, and DeepCam
Language: CWeb - Size: 198 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1
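
For workloads like the ResNet training steps listed above, the usual reported metric is throughput (samples per second). A generic timing sketch of that idea is shown below; it is an assumption for illustration, not this repository's harness:

```python
# Generic throughput measurement: warm up, then time repeated calls to a
# batch-processing step and report samples/second.
import time

def measure_throughput(step_fn, batch_size: int, iters: int = 20, warmup: int = 3) -> float:
    for _ in range(warmup):              # warm up caches before timing
        step_fn()
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    elapsed = time.perf_counter() - start
    return (iters * batch_size) / elapsed

# Toy stand-in for a training step.
fake_step = lambda: sum(i * i for i in range(100_000))
print(f"{measure_throughput(fake_step, batch_size=64):.1f} samples/sec")
```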

Paraskevi-KIvroglou/Hackathon-LlamaEval
LlamaEval is a rapid prototype developed during a hackathon to provide a user-friendly dashboard for evaluating and comparing Llama models using the TogetherAI API.
Language: Python - Size: 66.8 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1
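
A dashboard like LlamaEval ultimately reduces to sending the same prompt to different Llama models and comparing the replies. A hedged sketch of such a call, assuming the `together` Python SDK's chat-completions interface and a TOGETHER_API_KEY environment variable (the model ID is illustrative only):

```python
import os
from together import Together

# Client for the Together API; the key is read from the environment.
client = Together(api_key=os.environ["TOGETHER_API_KEY"])

def ask(model: str, prompt: str) -> str:
    """Send one user prompt to the given model and return its reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("meta-llama/Llama-3-8b-chat-hf",
          "Summarize the Doppler effect in one sentence."))
```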

CAS-CLab/CNN-Inference-Engine-Quick-View
A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.
Size: 54.7 KB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 150 - Forks: 18
