Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: evaluation

CarlosNZ/fig-tree-evaluator

A highly configurable custom expression tree evaluator

Language: TypeScript - Size: 17.9 MB - Last synced: about 1 hour ago - Pushed: 1 day ago - Stars: 10 - Forks: 1

HowieHwong/TrustLLM

(ICML 2024) TrustLLM: Trustworthiness in Large Language Models

Language: Python - Size: 9.52 MB - Last synced: about 5 hours ago - Pushed: about 5 hours ago - Stars: 309 - Forks: 27

langchain-ai/langsmith-sdk

LangSmith Client SDK Implementations

Language: Python - Size: 2.39 MB - Last synced: about 10 hours ago - Pushed: about 19 hours ago - Stars: 313 - Forks: 46

umk/fngraph

Primitives for composing and evaluating functions based on their inputs and outputs

Language: TypeScript - Size: 640 KB - Last synced: about 7 hours ago - Pushed: about 16 hours ago - Stars: 0 - Forks: 0

onejune2018/Awesome-LLM-Eval

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs such as ChatGPT, LLaMA, GLM, and Baichuan.

Size: 13.3 MB - Last synced: about 10 hours ago - Pushed: 1 day ago - Stars: 282 - Forks: 32

CLUEbenchmark/SuperCLUE

SuperCLUE: A Comprehensive General Benchmark for Chinese Foundation Models

Size: 24.3 MB - Last synced: about 17 hours ago - Pushed: about 18 hours ago - Stars: 2,640 - Forks: 88

cdaringe/programming-language-selector

Programming Language Selector based on language metadata and user-specified values.

Language: TypeScript - Size: 1.9 MB - Last synced: about 17 hours ago - Pushed: about 19 hours ago - Stars: 2 - Forks: 0

CCAFS/MARLO

Managing Agricultural Research for Learning and Outcomes

Language: Java - Size: 195 MB - Last synced: about 19 hours ago - Pushed: 1 day ago - Stars: 8 - Forks: 8

langwatch/langevals

LangEvals aggregates various language model evaluators into a single platform, providing a standard interface for a multitude of scores and LLM guardrails, so you can protect and benchmark your LLMs and pipelines.

Language: Python - Size: 1.15 MB - Last synced: about 18 hours ago - Pushed: 1 day ago - Stars: 11 - Forks: 3

athina-ai/athina-evals

Python SDK for running evaluations on LLM generated responses

Language: Python - Size: 985 KB - Last synced: about 21 hours ago - Pushed: about 21 hours ago - Stars: 135 - Forks: 11

r-lib/evaluate

A version of eval for R that returns more information about what happened

Language: R - Size: 387 KB - Last synced: about 21 hours ago - Pushed: about 22 hours ago - Stars: 107 - Forks: 33

sepandhaghighi/pycm

Multi-class confusion matrix library in Python

Language: Python - Size: 11.4 MB - Last synced: 34 minutes ago - Pushed: 12 days ago - Stars: 1,431 - Forks: 121
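The core idea behind a multi-class confusion matrix library like pycm can be sketched with the standard library alone. This is a minimal illustration of the concept, not pycm's actual API:

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs into a nested dict."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}

def overall_accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

actual    = ["cat", "dog", "cat", "bird", "dog", "cat"]
predicted = ["cat", "dog", "dog", "bird", "dog", "cat"]

cm = confusion_matrix(actual, predicted)
print(cm["cat"]["dog"])                     # cats misclassified as dogs -> 1
print(overall_accuracy(actual, predicted))  # 5 of 6 correct -> 0.8333...
```

pycm layers many more per-class and overall statistics on top of exactly this kind of count table.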

promptfoo/promptfoo

Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.

Language: TypeScript - Size: 14 MB - Last synced: about 19 hours ago - Pushed: 1 day ago - Stars: 2,878 - Forks: 182

symflower/eval-dev-quality

DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.

Language: Go - Size: 1.94 MB - Last synced: about 20 hours ago - Pushed: 1 day ago - Stars: 27 - Forks: 1

paul-schuhm/git-branching-model-tp

A project assignment for practicing Git branching models

Language: HTML - Size: 63.5 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 0 - Forks: 3

paul-schuhm/git-projet-minitrice

A collaborative project assignment to put under version control

Size: 6.84 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 0 - Forks: 0

duohongrui/simpipe

A standard pipeline for defining a simulation method, estimating parameters from datasets, and simulating and evaluating datasets.

Language: R - Size: 404 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 1 - Forks: 0

alipay/ant-application-security-testing-benchmark

The xAST evaluation benchmark: making security tools no longer a "black box".

Language: Java - Size: 8.68 MB - Last synced: about 8 hours ago - Pushed: 1 day ago - Stars: 235 - Forks: 27

open-compass/opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Language: Python - Size: 2.87 MB - Last synced: 1 day ago - Pushed: 3 days ago - Stars: 2,659 - Forks: 278

open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ Hugging Face models, and 20+ benchmarks

Language: Python - Size: 1.48 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 424 - Forks: 46

langchain-ai/langsmith-docs

Documentation for langsmith

Language: MDX - Size: 138 MB - Last synced: about 18 hours ago - Pushed: 1 day ago - Stars: 58 - Forks: 17

ziqihuangg/Awesome-Evaluation-of-Visual-Generation

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

Size: 1.89 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 67 - Forks: 5

uptrain-ai/uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root-cause analysis on failure cases, and gives insights on how to resolve them.

Language: Python - Size: 35.7 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 2,014 - Forks: 169

Sensirion/python-uart-svm4x

Python driver to work with the SVM4x evaluation kit over UART using SHDLC protocol

Language: Python - Size: 6.66 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 0 - Forks: 0

gagolews/deepr

Deep R Programming (Open-Access Textbook)

Size: 105 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 85 - Forks: 1

kanekescom/laravel-simonevbang

SIMONEVBANG for Laravel

Size: 1.95 KB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 0 - Forks: 1

taka8t/lieval

lieval is a lightweight Rust crate for parsing and evaluating mathematical expressions from strings.

Language: Rust - Size: 27.3 KB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 1 - Forks: 0

CS-EVAL/CS-Eval

CS-Eval is a comprehensive evaluation suite for fundamental cybersecurity models and for the cybersecurity abilities of large language models.

Size: 1.01 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 2 - Forks: 0

Striveworks/valor

Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.

Language: Python - Size: 27.6 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 34 - Forks: 0

Auto-Playground/ragrank

🎯 A free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can see how good your LLM applications really are.

Language: Python - Size: 569 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 17 - Forks: 4

microsoft/llmops-promptflow-template

LLMOps with Prompt Flow is an "LLMOps template and guidance" to help you build LLM-infused apps using Prompt Flow. It offers features including centralized code hosting, lifecycle management, variant and hyperparameter experimentation, A/B deployment, and reporting for all runs and experiments.

Language: Python - Size: 5.54 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 165 - Forks: 152

cbuschka/mockito-python-eval

Evaluation of mockito-python

Language: Python - Size: 1000 Bytes - Last synced: 3 days ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0

cbuschka/k3d-eval

Language: Shell - Size: 19.5 KB - Last synced: 3 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

MichaelGrupp/evo

Python package for the evaluation of odometry and SLAM

Language: Python - Size: 6.57 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 3,249 - Forks: 733

BloopAI/COBOLEval

Evaluate LLM-generated COBOL

Language: Python - Size: 140 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 15 - Forks: 2

gereon-t/trajectopy-core

Trajectopy - Trajectory Evaluation in Python

Language: Python - Size: 34.3 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 1 - Forks: 1

mrgloom/awesome-semantic-segmentation

:metal: awesome-semantic-segmentation

Size: 283 KB - Last synced: 3 days ago - Pushed: about 3 years ago - Stars: 10,340 - Forks: 2,489

Cloud-CV/EvalAI

:cloud: :rocket: :bar_chart: :chart_with_upwards_trend: Evaluating the state of the art in AI

Language: Python - Size: 63.1 MB - Last synced: about 23 hours ago - Pushed: 7 days ago - Stars: 1,694 - Forks: 765

ContinualAI/avalanche

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.

Language: Python - Size: 14.2 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 1,680 - Forks: 283

symblai/speech-recognition-evaluation

Evaluate results from ASR/Speech-to-Text quickly

Language: JavaScript - Size: 36.1 KB - Last synced: 3 days ago - Pushed: over 2 years ago - Stars: 31 - Forks: 7

jianzfb/antgo

Machine Learning Experiment Manage Platform

Language: Python - Size: 17.8 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 186 - Forks: 7

microsoft/promptbench

A unified evaluation framework for large language models

Language: Python - Size: 5.42 MB - Last synced: 3 days ago - Pushed: 9 days ago - Stars: 2,091 - Forks: 162

google/imageinwords

Data release for the ImageInWords (IIW) paper.

Language: JavaScript - Size: 21.3 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 67 - Forks: 0

ianarawjo/ChainForge

An open-source visual programming environment for battle-testing prompts to LLMs.

Language: TypeScript - Size: 76.7 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 2,004 - Forks: 139

xinshuoweng/AB3DMOT

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

Language: Python - Size: 181 MB - Last synced: 3 days ago - Pushed: about 1 month ago - Stars: 1,631 - Forks: 397

gereon-t/trajectopy

Trajectopy - Trajectory Evaluation in Python

Language: Python - Size: 9.38 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 21 - Forks: 2

aimonlabs/aimon-rely

Aimon Rely is a state-of-the-art, multi-model API that provides detectors for LLM quality metrics in both offline evaluation and online monitoring scenarios.

Language: Python - Size: 1.08 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 6 - Forks: 2

kolenaIO/kolena

Python client for Kolena's machine learning testing platform

Language: Python - Size: 71.4 MB - Last synced: 28 days ago - Pushed: 30 days ago - Stars: 38 - Forks: 4

abo-abo/lispy

Short and sweet LISP editing

Language: Emacs Lisp - Size: 5.07 MB - Last synced: 3 days ago - Pushed: 2 months ago - Stars: 1,187 - Forks: 129

DIAGNijmegen/picai_eval

Evaluation of 3D detection and diagnosis performance, geared towards prostate cancer detection in MRI.

Language: Python - Size: 775 KB - Last synced: 5 days ago - Pushed: 6 days ago - Stars: 15 - Forks: 9

dustalov/llmfao

Large Language Model Feedback Analysis and Optimization (LLMFAO)

Language: Python - Size: 1.08 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 2 - Forks: 0

alopatenko/LLMEvaluation

A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for various use cases, promote best practices in LLM assessment, and critically assess the effectiveness of these methods.

Size: 2.74 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 22 - Forks: 0

modelscope/eval-scope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

Language: Python - Size: 1.39 MB - Last synced: 2 days ago - Pushed: 4 days ago - Stars: 61 - Forks: 8

jannikmi/multivar_horner

Python package implementing a multivariate Horner scheme for efficiently evaluating multivariate polynomials

Language: Python - Size: 4.71 MB - Last synced: 5 days ago - Pushed: 6 days ago - Stars: 26 - Forks: 3
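The univariate version of the Horner scheme shows the idea the package generalizes: factor the polynomial so each coefficient costs one multiply and one add. A minimal stdlib-only sketch (not the multivar_horner API, which handles the multivariate case):

```python
def horner(coeffs, x):
    """Evaluate a polynomial with coefficients in descending degree order.

    For coeffs = [a_n, ..., a_1, a_0], computes
    a_n*x**n + ... + a_1*x + a_0 with n multiplications instead of ~n**2,
    via the factorization (...((a_n*x + a_{n-1})*x + a_{n-2})...)*x + a_0.
    """
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

# 2x^3 - 6x^2 + 2x - 1 at x = 3: 54 - 54 + 6 - 1 = 5
print(horner([2, -6, 2, -1], 3))  # -> 5
```

The multivariate scheme applies the same factorization recursively, one variable at a time.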

terryyz/ice-score

[EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code

Language: Python - Size: 21.8 MB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 62 - Forks: 7

microsoft/LMChallenge

A library & tools to evaluate predictive language models.

Language: Python - Size: 122 KB - Last synced: 4 days ago - Pushed: 9 months ago - Stars: 61 - Forks: 18

naotake51/evaluation

A Composer package for creating a simple expression-evaluation module. It allows you to register your own functions to evaluate expressions (strings).

Language: PHP - Size: 62.5 KB - Last synced: 6 days ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

andreeaiana/newsreclib

PyTorch-Lightning Library for Neural News Recommendation

Language: Python - Size: 617 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 30 - Forks: 4

mjalali/renyi-kernel-entropy

[NeurIPS 2023] Code base for the Renyi Kernel Entropy (RKE) metric for generative models.

Language: Python - Size: 508 KB - Last synced: 3 days ago - Pushed: 7 days ago - Stars: 9 - Forks: 0

microsoft/rag-experiment-accelerator

The RAG Experiment Accelerator is a versatile tool designed to expedite experiments and evaluations using Azure Cognitive Search and the RAG pattern.

Language: Python - Size: 3.84 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 76 - Forks: 19

apacha/MusicObjectDetection

Accompanying source code for the journal paper "A Baseline for General Music Object Detection with Deep Learning"

Language: Python - Size: 530 KB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 10 - Forks: 8

google-deepmind/long-form-factuality

Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".

Language: Python - Size: 799 KB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 444 - Forks: 44

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language: Python - Size: 3.32 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 254 - Forks: 16

iamrk04/LLM-Solutions-Playbook

Unlock the potential of AI-driven solutions and delve into the world of Large Language Models. Explore cutting-edge concepts, real-world applications, and best practices to build powerful systems with these state-of-the-art models.

Language: Jupyter Notebook - Size: 5.36 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 2 - Forks: 1

lyy1994/awesome-data-contamination

The Paper List on Data Contamination for Large Language Models Evaluation.

Size: 356 KB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 22 - Forks: 1

alvarorichard/CortexC

CortexC is a minimalist yet powerful interpreter designed to interpret and execute a subset of the C programming language.

Language: C - Size: 23.2 MB - Last synced: 1 day ago - Pushed: 2 months ago - Stars: 8 - Forks: 0

terryyz/llm-benchmark

A list of LLM benchmark frameworks.

Size: 14.6 KB - Last synced: 5 days ago - Pushed: 3 months ago - Stars: 43 - Forks: 3

OPTML-Group/Unlearn-WorstCase

"Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning" by Chongyu Fan*, Jiancheng Liu*, Alfred Hero, Sijia Liu

Language: Python - Size: 18.4 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 5 - Forks: 0

marcusm117/IdentityChain

[ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

Language: Python - Size: 1.67 MB - Last synced: 6 days ago - Pushed: 7 days ago - Stars: 6 - Forks: 0

viebel/klipse

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.

Language: HTML - Size: 89.8 MB - Last synced: 6 days ago - Pushed: over 1 year ago - Stars: 3,093 - Forks: 153

opennms-forge/stack-play

🎢 Just the fun parts! - Some docker-compose container stacks for local labs or playgrounds

Language: Shell - Size: 18.6 MB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 5 - Forks: 6

JuezUN/INGInious Fork of UCL-INGI/INGInious

UNCode is an online platform for frequent practice and automatic evaluation of computer programming, Jupyter Notebook, and hardware description language (VHDL/Verilog) assignments. It also provides a pluggable interface with your existing LMS.

Language: Python - Size: 52.2 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 8 - Forks: 6

Awrsha/Machine-Learning-and-Deep-Learning

Some of the topics, algorithms and projects in Machine Learning & Deep Learning that I have worked on and become familiar with.

Language: Jupyter Notebook - Size: 23.9 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 3 - Forks: 1

lisiarend/PRONE

R Package for preprocessing, normalizing, and analyzing proteomics data

Language: R - Size: 40.4 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 0

singh-rajiv/EvalAutoUT-Ang

Evaluation of an automatic AI-based Unit Test generation tool for TypeScript/JavaScript

Language: TypeScript - Size: 422 KB - Last synced: 10 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

AmenRa/ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

Language: Python - Size: 34.5 MB - Last synced: 10 days ago - Pushed: 13 days ago - Stars: 348 - Forks: 21
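A representative ranking-evaluation metric of the kind such libraries compute is mean reciprocal rank (MRR). This stdlib-only sketch illustrates the metric itself, not ranx's API; the `runs`/`qrels` names follow IR convention:

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs, qrels):
    """Average reciprocal rank over queries keyed identically in both dicts."""
    return sum(reciprocal_rank(runs[q], qrels[q]) for q in qrels) / len(qrels)

runs = {"q1": ["d3", "d1", "d2"], "q2": ["d5", "d4"]}   # system rankings
qrels = {"q1": {"d1"}, "q2": {"d5"}}                     # relevance judgments
print(mean_reciprocal_rank(runs, qrels))  # (1/2 + 1/1) / 2 -> 0.75
```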

danthedeckie/simpleeval

Simple Safe Sandboxed Extensible Expression Evaluator for Python

Language: Python - Size: 201 KB - Last synced: about 7 hours ago - Pushed: about 1 month ago - Stars: 426 - Forks: 83
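The safe-evaluation technique behind a library like simpleeval can be sketched with the stdlib `ast` module: parse the expression, then walk the tree and allow only whitelisted node types. This is a minimal illustration of the idea, not simpleeval's implementation:

```python
import ast
import operator

# Whitelist of permitted binary operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}

def safe_eval(expr):
    """Evaluate an arithmetic expression without the dangers of eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"disallowed expression: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("2 + 3 * 4"))   # -> 14
print(safe_eval("-(2 ** 10)"))  # -> -1024
```

Because function calls, attribute access, and names are not whitelisted, inputs like `__import__('os')` raise `ValueError` instead of executing.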

EXP-Tools/steam-discount

Steam discounted-games leaderboard (automatically refreshed)

Language: Python - Size: 10.6 GB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 53 - Forks: 26

seungjaeryanlee/Doumi-Chess

C++ UCI Chess Engine

Language: C++ - Size: 4.33 MB - Last synced: 10 days ago - Pushed: over 4 years ago - Stars: 4 - Forks: 0

RecList/reclist

Behavioral "black-box" testing for recommender systems

Language: Python - Size: 3.7 MB - Last synced: 3 days ago - Pushed: 9 months ago - Stars: 451 - Forks: 26

ianwalter/rhino-stock-example

An example of how to use Mozilla Rhino to execute JavaScript within Java

Language: Java - Size: 1.11 MB - Last synced: 11 days ago - Pushed: about 11 years ago - Stars: 1 - Forks: 0

yilunzhu/ontogum

Repository for the OntoGUM Corpus

Language: Python - Size: 8.36 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 6 - Forks: 0

hitz-zentroa/latxa

Latxa: An Open Language Model and Evaluation Suite for Basque

Language: Shell - Size: 27.4 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 16 - Forks: 0

2KAbhishek/EvalTrivia

Expression Evaluation Trivia 🟰🔢

Language: Kotlin - Size: 189 KB - Last synced: 11 days ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

NullDev/Spendenr-AI-d

AI powered Spendenraid evaluation.

Language: JavaScript - Size: 76.3 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 14 - Forks: 4

RichardObi/frd-score

Compute the Fréchet Radiomics Distance.

Language: Python - Size: 184 KB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0

zzzprojects/Eval-SQL.NET

SQL Eval Function | Dynamically Evaluate Expression in SQL Server using C# Syntax.

Language: C# - Size: 861 KB - Last synced: 11 days ago - Pushed: 12 days ago - Stars: 95 - Forks: 40

lilakk/BooookScore

A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper, "BooookScore: A systematic exploration of book-length summarization in the era of LLMs".

Language: Python - Size: 27.1 MB - Last synced: 9 days ago - Pushed: about 1 month ago - Stars: 68 - Forks: 6

Valires/er-evaluation

An End-to-End Evaluation Framework for Entity Resolution Systems

Language: Python - Size: 62.4 MB - Last synced: 11 days ago - Pushed: 5 months ago - Stars: 22 - Forks: 3

mcthouacbb/Sirius

Chess engine

Language: C++ - Size: 36.2 MB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 15 - Forks: 0

gchudnov/bscript

BScript - AST Evaluation & Debugging

Language: Scala - Size: 2 MB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 3 - Forks: 0

research-outcome/LLM-TicTacToe-Benchmark

Benchmarking Large Language Model (LLM) Performance for Game Playing via Tic-Tac-Toe

Language: Python - Size: 772 KB - Last synced: 12 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0

MileBench/MileBench

This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"

Language: Python - Size: 3.51 MB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 7 - Forks: 0

GrumpyZhou/image-matching-toolbox

This is a toolbox repository to help evaluate various methods that perform image matching from a pair of images.

Language: Jupyter Notebook - Size: 20 MB - Last synced: 7 days ago - Pushed: 13 days ago - Stars: 494 - Forks: 72

zzzprojects/Eval-Expression.NET

C# Eval Expression | Evaluate, Compile, and Execute C# code and expression at runtime.

Language: C# - Size: 904 KB - Last synced: 11 days ago - Pushed: 12 days ago - Stars: 427 - Forks: 84

Baukebrenninkmeijer/table-evaluator

Evaluate real and synthetic datasets against each other

Language: Jupyter Notebook - Size: 6.23 MB - Last synced: 4 days ago - Pushed: 9 months ago - Stars: 76 - Forks: 27

sourceduty/Professional_Value

🧑‍💼 Measure the value of professional experience.

Size: 1.95 KB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0

char-ptr/crazed

execute js, py, rust, c++, and c via a bot on discord

Language: Rust - Size: 17.6 KB - Last synced: 13 days ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

saschaschramm/sc2-evals

Evaluation of GPT-4 on StarCraft II

Language: Python - Size: 18.6 KB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 0 - Forks: 0

obss/jury

Comprehensive NLP Evaluation System

Language: Python - Size: 284 KB - Last synced: 10 days ago - Pushed: 24 days ago - Stars: 178 - Forks: 20