An open API service providing repository metadata for many open source software ecosystems.

Topic: "evaluations"

Scale3-Labs/langtrace

Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vector DBs and more. Integrate using TypeScript or Python. 🚀💻📊

Language: TypeScript - Size: 3.69 MB - Last synced at: 3 days ago - Pushed at: 20 days ago - Stars: 929 - Forks: 90

log10-io/log10 📦

Python client library for improving your LLM app accuracy

Language: Python - Size: 16.6 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 98 - Forks: 11

microsoft/promptpex

Test Generation for Prompts

Language: TeX - Size: 63.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 86 - Forks: 10

boxbeam/Crunch

The fastest Java expression compiler/evaluator

Language: Java - Size: 270 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 73 - Forks: 10

evalkit/evalkit

The TypeScript LLM Evaluation Library

Language: TypeScript - Size: 544 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 71 - Forks: 1

LLM-Evaluation-s-Always-Fatiguing/leaf-playground

A framework for building scenario simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent-action level.

Language: Python - Size: 868 KB - Last synced at: 19 days ago - Pushed at: 11 months ago - Stars: 24 - Forks: 0

asteroidai/asteroid-python-sdk

The Python SDK for Asteroid, the platform for making your AI agent safe and reliable

Language: Python - Size: 465 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 15 - Forks: 0

yisaienkov/evaluations

This library implements various metrics (including Kaggle Competition, Medicine) for evaluating ML, DL, AI models, and algorithms. 📐📊📈📉📏

Language: Python - Size: 69.3 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 1

Maitreyapatel/reliability-checklist

NLP tool for wide-range model reliability evaluations

Language: Python - Size: 4.14 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 11 - Forks: 0

ComputerScienceHouse/conditional

CSH Evals, the modern way.

Language: Python - Size: 1.48 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 10 - Forks: 32

apartresearch/3cb

3cb: Catastrophic Cyber Capabilities Benchmarking of Large Language Models

Language: Python - Size: 189 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 4 - Forks: 0

argrecsys/argael

ARGAEL is an open-source Java desktop application designed to maximize the experience and efficiency of the process of annotating and evaluating arguments in large text corpora.

Language: Java - Size: 21.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

HarryBleckert/moodle-mod_evaluation

Moodle plugin for evaluations with Moodle. This is the evaluation activity plugin.

Language: PHP - Size: 7.66 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

rJefferyXie/Chess-Program-with-Minimax-Visualizer

A functional chess game implemented in Python, with pygame as a supporting graphics module.

Language: Python - Size: 161 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

METR/metr-task-boilerplate

A Cookiecutter template for developing tasks according to the METR Task Standard

Language: TypeScript - Size: 173 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

mandoline-ai/mandoline-node

Official Node.js client for the Mandoline API

Language: TypeScript - Size: 31.3 KB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

mandoline-ai/mandoline-python

Official Python client for the Mandoline API

Language: Python - Size: 33.2 KB - Last synced at: 24 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

jonas-becker/pd-human-vs-machine-content

The official repository for the paper "Paraphrase Detection: Human vs. Machine Content".

Language: HTML - Size: 58.3 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

bhadresh-laiya/program-evaluation.com

Do a program evaluation that really counts! It will help other students and will really make universities and colleges take students' experiences to heart!

Language: PHP - Size: 4.41 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

TayyabHanif11/My-PF-Labs

Source code from the first-semester Programming Fundamentals course. It runs from basic to intermediate C++ concepts: loops, functions, arrays, pointers and structures.

Language: C++ - Size: 526 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

temulenbd/text-representation-comparison-job-recommender

PROJECT NAME: A comparative evaluation of text representation techniques for a content-based job recommendation system

Language: Jupyter Notebook - Size: 17.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 1

jtmuller5/vibe-checker

The TypeScript LLM Evaluation File

Language: TypeScript - Size: 7.47 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

ParthaPRay/llm_evaluation_metrics_localized

This repo contains code for localized LLM evaluation metrics via a framework using Ollama and edge resources, along with novel derived metrics

Language: Python - Size: 103 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

AccentureMacr0s/Opinion-Mining-System

You can build a robust opinion mining and website evaluation system on AWS. The combination of data collection, preprocessing, sentiment analysis, and rating calculation ensures that you can efficiently analyze user feedback and generate meaningful insights to evaluate websites.

Language: Python - Size: 291 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

brettdidonato/BSD_Evals

LLM evaluation framework

Language: Jupyter Notebook - Size: 380 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

esleipness/fluiddataPySpark

Utilizing Apache Spark in Google Colab, Jupyter Notebook, and Databricks

Language: Jupyter Notebook - Size: 150 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ZainabZaman/IELTS_PracticeAndEvaluation

IELTS listening, speaking, reading and writing modules practice and evaluation with IELTS band calculation based on speech and text analysis and evaluation.

Language: Python - Size: 6.61 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

henrique-souza/evaluation_2_OOP

Program made for the second evaluation of object-oriented programming

Language: Java - Size: 52.7 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

henrique-souza/evaluation_1_POO

Program made for the first evaluation of object-oriented programming

Language: Java - Size: 10.7 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0