Topic: "evaluations"
Scale3-Labs/langtrace
Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vector DBs and more. Integrate using TypeScript or Python. 🚀💻📊
Language: TypeScript - Size: 3.69 MB - Last synced at: 3 days ago - Pushed at: 20 days ago - Stars: 929 - Forks: 90

log10-io/log10 📦
Python client library for improving your LLM app accuracy
Language: Python - Size: 16.6 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 98 - Forks: 11

microsoft/promptpex
Test Generation for Prompts
Language: TeX - Size: 63.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 86 - Forks: 10

boxbeam/Crunch
The fastest Java expression compiler/evaluator
Language: Java - Size: 270 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 73 - Forks: 10

evalkit/evalkit
The TypeScript LLM Evaluation Library
Language: TypeScript - Size: 544 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 71 - Forks: 1

LLM-Evaluation-s-Always-Fatiguing/leaf-playground
A framework for building scenario simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent action level.
Language: Python - Size: 868 KB - Last synced at: 19 days ago - Pushed at: 11 months ago - Stars: 24 - Forks: 0

asteroidai/asteroid-python-sdk
The Python SDK for Asteroid, the platform for making your AI agents safe and reliable
Language: Python - Size: 465 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 15 - Forks: 0

yisaienkov/evaluations
This library implements various metrics (including Kaggle Competition, Medicine) for evaluating ML, DL, AI models, and algorithms. 📐📊📈📉📏
Language: Python - Size: 69.3 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 1

Maitreyapatel/reliability-checklist
NLP tool for wide-range model reliability evaluations
Language: Python - Size: 4.14 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 11 - Forks: 0

ComputerScienceHouse/conditional
CSH Evals, the modern way.
Language: Python - Size: 1.48 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 10 - Forks: 32

apartresearch/3cb
3cb: Catastrophic Cyber Capabilities Benchmarking of Large Language Models
Language: Python - Size: 189 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 4 - Forks: 0

argrecsys/argael
ARGAEL is an open-source Java desktop application designed to maximize the experience and efficiency of the process of annotating and evaluating arguments in large text corpora.
Language: Java - Size: 21.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

HarryBleckert/moodle-mod_evaluation
Moodle plugin for evaluations with Moodle. This is the evaluation activity plugin.
Language: PHP - Size: 7.66 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

rJefferyXie/Chess-Program-with-Minimax-Visualizer
A functional chess game implemented in Python, with Pygame as a supporting graphics module.
Language: Python - Size: 161 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

METR/metr-task-boilerplate
A Cookiecutter template for developing tasks according to the METR Task Standard
Language: TypeScript - Size: 173 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

mandoline-ai/mandoline-node
Official Node.js client for the Mandoline API
Language: TypeScript - Size: 31.3 KB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

mandoline-ai/mandoline-python
Official Python client for the Mandoline API
Language: Python - Size: 33.2 KB - Last synced at: 24 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

jonas-becker/pd-human-vs-machine-content
The official repository for the paper "Paraphrase Detection: Human vs. Machine Content".
Language: HTML - Size: 58.3 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

bhadresh-laiya/program-evaluation.com
Do a program evaluation that really counts! It will help other students and will really make universities and colleges take students' experiences to heart!
Language: PHP - Size: 4.41 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

TayyabHanif11/My-PF-Labs
Source code from the first-semester Programming Fundamentals course. It runs from basic to intermediate C++ concepts covered so far: loops, functions, arrays, pointers and structures.
Language: C++ - Size: 526 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

temulenbd/text-representation-comparison-job-recommender
PROJECT NAME: A comparative evaluation of text representation techniques for a content-based job recommendation system
Language: Jupyter Notebook - Size: 17.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

jtmuller5/vibe-checker
The TypeScript LLM Evaluation File
Language: TypeScript - Size: 7.47 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

ParthaPRay/llm_evaluation_metrics_localized
This repo contains code for localized LLM evaluation metrics via a framework using Ollama and edge resources, along with novel derived metrics
Language: Python - Size: 103 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

AccentureMacr0s/Opinion-Mining-System
You can build a robust opinion mining and website evaluation system on AWS. The combination of data collection, preprocessing, sentiment analysis, and rating calculation ensures that you can efficiently analyze user feedback and generate meaningful insights to evaluate websites.
Language: Python - Size: 291 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

brettdidonato/BSD_Evals
LLM evaluation framework
Language: Jupyter Notebook - Size: 380 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

esleipness/fluiddataPySpark
Utilizing Apache Spark in Google Colab, Jupyter Notebook, and Databricks
Language: Jupyter Notebook - Size: 150 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ZainabZaman/IELTS_PracticeAndEvaluation
IELTS listening, speaking, reading and writing modules practice and evaluation with IELTS band calculation based on speech and text analysis and evaluation.
Language: Python - Size: 6.61 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

henrique-souza/evaluation_2_OOP
Program made for the second evaluation of object-oriented programming
Language: Java - Size: 52.7 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

henrique-souza/evaluation_1_POO
Program made for the first evaluation of object-oriented programming
Language: Java - Size: 10.7 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
