An open API service providing repository metadata for many open source software ecosystems.

Topic: "evaluation"

mrgloom/awesome-semantic-segmentation

:metal: awesome-semantic-segmentation

Size: 283 KB - Last synced at: 9 days ago - Pushed at: almost 4 years ago - Stars: 10,642 - Forks: 2,486

langfuse/langfuse

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

Language: TypeScript - Size: 19.4 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 10,417 - Forks: 952

explodinggradients/ragas

Supercharge Your LLM Application Evaluations 🚀

Language: Python - Size: 40.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,816 - Forks: 884

promptfoo/promptfoo

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

Language: TypeScript - Size: 342 MB - Last synced at: about 17 hours ago - Pushed at: about 17 hours ago - Stars: 6,241 - Forks: 513

open-compass/opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Language: Python - Size: 5.97 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 5,202 - Forks: 543

Knetic/govaluate 📦

Arbitrary expression evaluation for golang

Language: Go - Size: 292 KB - Last synced at: 3 days ago - Pushed at: 27 days ago - Stars: 3,866 - Forks: 512

Marker-Inc-Korea/AutoRAG

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

Language: Python - Size: 70 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 3,833 - Forks: 305

MichaelGrupp/evo

Python package for the evaluation of odometry and SLAM

Language: Python - Size: 7.04 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 3,723 - Forks: 763
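evo's core metric for trajectory evaluation is the absolute trajectory error (ATE). As a minimal sketch of that computation (not evo's API, and assuming the two trajectories are already time-associated and aligned, which evo handles via Umeyama alignment):

```python
import math

def ate_rmse(gt_positions, est_positions):
    """RMSE of pointwise Euclidean errors between aligned trajectories.

    Assumes both sequences are the same length and already aligned
    in a common reference frame.
    """
    errors = [math.dist(g, e) for g, e in zip(gt_positions, est_positions)]
    return math.sqrt(sum(err ** 2 for err in errors) / len(errors))

# A perfect estimate with a constant 0.1 m lateral offset yields RMSE 0.1.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
est = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]
```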

Helicone/helicone

🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

Language: TypeScript - Size: 386 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,609 - Forks: 364

Kiln-AI/Kiln

The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

Language: Python - Size: 14.3 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 3,391 - Forks: 235

sdiehl/write-you-a-haskell

Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)

Language: Haskell - Size: 938 KB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 3,375 - Forks: 256

CLUEbenchmark/SuperCLUE

SuperCLUE: a comprehensive benchmark for general-purpose Chinese foundation models.

Size: 24.3 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 3,145 - Forks: 104

viebel/klipse

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.

Language: HTML - Size: 91.5 MB - Last synced at: 13 days ago - Pushed at: 7 months ago - Stars: 3,125 - Forks: 148

zzw922cn/Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow.

Language: Python - Size: 5.53 MB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 2,843 - Forks: 534

microsoft/promptbench

A unified evaluation framework for large language models

Language: Python - Size: 5.56 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 2,598 - Forks: 191

ianarawjo/ChainForge

An open-source visual programming environment for battle-testing prompts to LLMs.

Language: TypeScript - Size: 183 MB - Last synced at: about 1 hour ago - Pushed at: about 2 hours ago - Stars: 2,573 - Forks: 206

EvolvingLMMs-Lab/lmms-eval

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Language: Python - Size: 7.29 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,355 - Forks: 256

uptrain-ai/uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.

Language: Python - Size: 36.9 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 2,258 - Forks: 199

open-compass/VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.

Language: Python - Size: 4.5 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 2,239 - Forks: 336

huggingface/evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Language: Python - Size: 2.01 MB - Last synced at: about 22 hours ago - Pushed at: 3 months ago - Stars: 2,185 - Forks: 272

ContinualAI/avalanche

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.

Language: Python - Size: 15.5 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 1,875 - Forks: 306

lmnr-ai/lmnr

Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.

Language: TypeScript - Size: 30.5 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 1,866 - Forks: 113

Cloud-CV/EvalAI

:cloud: :rocket: :bar_chart: :chart_with_upwards_trend: Evaluating state of the art in AI

Language: Python - Size: 63.5 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1,839 - Forks: 860

xinshuoweng/AB3DMOT

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

Language: Python - Size: 181 MB - Last synced at: 14 days ago - Pushed at: about 1 year ago - Stars: 1,729 - Forks: 406

tatsu-lab/alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Language: Jupyter Notebook - Size: 302 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 1,716 - Forks: 266

MLGroupJLU/LLM-eval-survey

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

Size: 5.86 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1,510 - Forks: 92

sepandhaghighi/pycm

Multi-class confusion matrix library in Python

Language: Python - Size: 12.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,474 - Forks: 126
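pycm derives a large family of statistics from a multi-class confusion matrix. As an illustration of the underlying computation (a stdlib sketch, not pycm's API), here is how such a matrix and per-class precision/recall can be built:

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs into a nested dict."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}

def per_class_precision_recall(cm):
    """Derive precision and recall for each class from the matrix."""
    stats = {}
    for c in cm:
        tp = cm[c][c]
        fn = sum(cm[c][p] for p in cm[c]) - tp  # row total minus diagonal
        fp = sum(cm[a][c] for a in cm) - tp     # column total minus diagonal
        stats[c] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return stats

actual = ["cat", "cat", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird"]
cm = confusion_matrix(actual, predicted)
```

pycm itself layers dozens of derived statistics (kappa, MCC, etc.) on top of exactly this kind of count table.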

Xnhyacinth/Awesome-LLM-Long-Context-Modeling

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

Size: 3.51 MB - Last synced at: about 4 hours ago - Pushed at: 7 days ago - Stars: 1,438 - Forks: 47

huggingface/lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Language: Python - Size: 4.58 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,429 - Forks: 220

Maluuba/nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.

Language: Python - Size: 92.2 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 1,374 - Forks: 224

lunary-ai/lunary

The production toolkit for LLMs. Observability, prompt management and evaluations.

Language: TypeScript - Size: 5.64 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,284 - Forks: 150

langwatch/langwatch

The ultimate LLM Ops platform - Monitoring, Analytics, Evaluations, Datasets and Prompt Optimization ✨

Language: TypeScript - Size: 20 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,280 - Forks: 74

abo-abo/lispy

Short and sweet LISP editing

Language: Emacs Lisp - Size: 5.07 MB - Last synced at: 14 days ago - Pushed at: 10 months ago - Stars: 1,240 - Forks: 136

EthicalML/xai

XAI - An eXplainability toolbox for machine learning

Language: Python - Size: 17.8 MB - Last synced at: 14 days ago - Pushed at: over 3 years ago - Stars: 1,162 - Forks: 180

google/fuzzbench

FuzzBench - Fuzzer benchmarking as a service.

Language: Python - Size: 37.1 MB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 1,142 - Forks: 281

huggingface/evaluation-guidebook

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

Language: Jupyter Notebook - Size: 1.01 MB - Last synced at: about 22 hours ago - Pushed at: 3 months ago - Stars: 1,140 - Forks: 72

toshas/torch-fidelity

High-fidelity performance metrics for generative models in PyTorch

Language: Python - Size: 2.24 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 1,063 - Forks: 74
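torch-fidelity implements metrics such as FID, which is the Fréchet distance between Gaussians fitted to real and generated feature distributions. The full metric involves a matrix square root over covariance matrices; the univariate special case below shows the idea in a few lines (an illustrative sketch, not the library's API):

```python
import statistics

def frechet_1d(xs, ys):
    """Frechet distance between 1-D Gaussians fitted to two samples.

    Univariate case of the FID formula:
    (mu_x - mu_y)^2 + (sigma_x - sigma_y)^2
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return (mx - my) ** 2 + (sx - sy) ** 2
```

Identical samples give a distance of 0; the real FID applies the multivariate form to Inception network features.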

plurai-ai/intellagent

A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions

Language: Python - Size: 14.3 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1,006 - Forks: 129

prometheus-eval/prometheus-eval

Evaluate your LLM's responses with Prometheus and GPT-4 💯

Language: Python - Size: 15 MB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 898 - Forks: 55

PRBonn/semantic-kitti-api

SemanticKITTI API for visualizing the dataset, processing data, and evaluating results.

Language: Python - Size: 80.1 KB - Last synced at: 8 days ago - Pushed at: 18 days ago - Stars: 828 - Forks: 188

modelscope/evalscope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

Language: Python - Size: 49.1 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 815 - Forks: 89

ncalc/ncalc

NCalc is a fast and lightweight expression evaluator library for .NET, designed for flexibility and high performance. It supports a wide range of mathematical and logical operations.

Language: C# - Size: 1.15 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 796 - Forks: 97

CBLUEbenchmark/CBLUE

Chinese medical information processing benchmark CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark.

Language: Python - Size: 1.61 MB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 769 - Forks: 132

IntelLabs/RAG-FiT

Framework for enhancing LLMs for RAG tasks using fine-tuning.

Language: Python - Size: 925 KB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 737 - Forks: 56

PaesslerAG/gval

Expression evaluation in golang

Language: Go - Size: 797 KB - Last synced at: 7 months ago - Pushed at: 11 months ago - Stars: 731 - Forks: 82

dbolya/tide

A General Toolbox for Identifying Object Detection Errors

Language: Python - Size: 12 MB - Last synced at: 13 days ago - Pushed at: about 2 years ago - Stars: 715 - Forks: 115

bochinski/iou-tracker

Python implementation of the IOU Tracker

Language: Python - Size: 40 KB - Last synced at: 11 months ago - Pushed at: about 5 years ago - Stars: 688 - Forks: 176
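The IOU Tracker associates detections across frames by their bounding-box overlap. The metric at its heart is intersection-over-union; a minimal sketch (not the repository's code) for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (clamped at zero).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```

The tracker greedily extends each track with the detection whose IoU against the track's last box exceeds a threshold.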

codingseb/ExpressionEvaluator

A simple math and pseudo-C# expression evaluator in one C# file. Can also execute small C#-like scripts.

Language: C# - Size: 964 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 604 - Forks: 100

ucinlp/autoprompt

AutoPrompt: Automatic Prompt Construction for Masked Language Models.

Language: Python - Size: 76.2 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 595 - Forks: 81

google-deepmind/long-form-factuality

Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".

Language: Python - Size: 759 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 589 - Forks: 71

tecnickcom/tcexam

TCExam is a computer-based assessment (CBA/CBT) system (e-exam) for universities, schools, and companies that enables educators and trainers to author, schedule, deliver, and report on surveys, quizzes, tests, and exams.

Language: PHP - Size: 69.2 MB - Last synced at: 7 days ago - Pushed at: 22 days ago - Stars: 582 - Forks: 408

jkkummerfeld/text2sql-data

A collection of datasets that pair questions with SQL queries.

Language: Python - Size: 30.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 559 - Forks: 110

HowieHwong/TrustLLM

[ICML 2024] TrustLLM: Trustworthiness in Large Language Models

Language: Python - Size: 10.4 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 548 - Forks: 51

AmenRa/ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

Language: Python - Size: 34.6 MB - Last synced at: 2 days ago - Pushed at: 10 months ago - Stars: 542 - Forks: 28
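ranx evaluates ranked result lists with metrics such as NDCG. As a minimal illustration of what such a metric computes (a stdlib sketch using the linear-gain NDCG variant, not ranx's API):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(ranked_relevances):
    """DCG of the given ranking divided by DCG of the ideal ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal else 0.0
```

A ranking already in descending relevance order scores 1.0; burying a relevant document lowers the score.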

langchain-ai/langsmith-sdk

LangSmith Client SDK Implementations

Language: Python - Size: 10.8 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 524 - Forks: 111

onejune2018/Awesome-LLM-Eval

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of foundation LLMs, aiming to probe the technical limits of generative AI.

Size: 12.6 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 514 - Forks: 44

zenogantner/MyMediaLite

Recommender system library for the CLR (.NET).

Language: C# - Size: 29.2 MB - Last synced at: 6 days ago - Pushed at: almost 5 years ago - Stars: 506 - Forks: 190

caserec/CaseRecommender

Case Recommender: A Flexible and Extensible Python Framework for Recommender Systems

Language: Python - Size: 1.35 MB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 495 - Forks: 92

GrumpyZhou/image-matching-toolbox

This is a toolbox repository to help evaluate various methods that perform image matching from a pair of images.

Language: Jupyter Notebook - Size: 20 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 494 - Forks: 72

X-PLUG/CValues

Research on evaluating and aligning the values of Chinese large language models.

Language: Python - Size: 4.2 MB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 476 - Forks: 20

zzzprojects/Eval-Expression.NET

C# Eval Expression | Evaluate, Compile, and Execute C# code and expression at runtime.

Language: C# - Size: 903 KB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 472 - Forks: 86

danthedeckie/simpleeval

Simple Safe Sandboxed Extensible Expression Evaluator for Python

Language: Python - Size: 286 KB - Last synced at: 23 days ago - Pushed at: 4 months ago - Stars: 470 - Forks: 87
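simpleeval's safety comes from walking a parsed expression tree and allowing only whitelisted node types, instead of calling Python's `eval`. A stdlib sketch of that core idea (illustrative only; simpleeval's actual implementation supports far more):

```python
import ast
import operator

# Whitelist of permitted binary operators; anything else is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow}

def safe_eval(expr, names=None):
    """Evaluate an arithmetic expression by walking its AST."""
    names = names or {}

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return walk(ast.parse(expr, mode="eval"))
```

Function calls, attribute access, and imports never match a whitelisted node, so expressions like `__import__('os')` are rejected rather than executed.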

RecList/reclist

Behavioral "black-box" testing for recommender systems

Language: Python - Size: 3.7 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 466 - Forks: 25

ModelTC/llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Language: Python - Size: 28.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 456 - Forks: 53

THU-KEG/EvaluationPapers4ChatGPT

Resource, Evaluation and Detection Papers for ChatGPT

Size: 259 KB - Last synced at: about 5 hours ago - Pushed at: about 1 year ago - Stars: 455 - Forks: 25

chrisjbryant/errant

ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.

Language: Python - Size: 680 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 443 - Forks: 108

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language: Python - Size: 186 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 413 - Forks: 34

davidstutz/superpixel-benchmark

An extensive evaluation and comparison of 28 state-of-the-art superpixel algorithms on 5 datasets.

Language: C++ - Size: 27.5 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 406 - Forks: 109

alipay/ant-application-security-testing-benchmark

The xAST evaluation benchmark makes security tools no longer a "black box".

Language: Java - Size: 10.6 MB - Last synced at: 13 days ago - Pushed at: 21 days ago - Stars: 381 - Forks: 49

votchallenge/toolkit-legacy 📦

Visual Object Tracking (VOT) challenge evaluation toolkit

Language: MATLAB - Size: 1.27 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 380 - Forks: 170

StrangerZhang/pysot-toolkit

Python Single Object Tracking Evaluation

Language: Python - Size: 186 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 377 - Forks: 69

audiolabs/webMUSHRA

A MUSHRA-compliant listening-test application built on the Web Audio API.

Language: JavaScript - Size: 6.96 MB - Last synced at: 10 days ago - Pushed at: 24 days ago - Stars: 371 - Forks: 146

shmsw25/FActScore

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

Language: Python - Size: 102 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 337 - Forks: 50

sb-ai-lab/RePlay

A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models

Language: Python - Size: 35.2 MB - Last synced at: 7 days ago - Pushed at: 21 days ago - Stars: 331 - Forks: 33

jianzfb/antgo

Machine learning experiment management platform.

Language: Python - Size: 20.2 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 318 - Forks: 7

microsoft/genaiops-promptflow-template

GenAIOps with Prompt Flow is a GenAIOps template and guidance to help you build LLM-infused apps using Prompt Flow. It offers features including centralized code hosting, lifecycle management, variant and hyperparameter experimentation, A/B deployment, and reporting across all runs and experiments.

Language: Python - Size: 6.78 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 314 - Forks: 261

hbaniecki/adversarial-explainable-ai

💡 Adversarial attacks on explanations and how to defend them

Size: 2.62 MB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 314 - Forks: 48

cvangysel/pytrec_eval

pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.

Language: C++ - Size: 43 KB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 302 - Forks: 33

rentruewang/bocoel

Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few lines of modular code.

Language: Python - Size: 8.31 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 281 - Forks: 17

AstraZeneca/rexmex

A general purpose recommender metrics library for fair evaluation.

Language: Python - Size: 2.68 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 280 - Forks: 25

athina-ai/athina-evals

Python SDK for running evaluations on LLM generated responses

Language: Python - Size: 1.84 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 276 - Forks: 17

FuxiaoLiu/LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Language: Python - Size: 23.9 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 276 - Forks: 13

ziqihuangg/Awesome-Evaluation-of-Visual-Generation

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

Size: 2.53 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 274 - Forks: 16

belambert/asr-evaluation

Python module for evaluating ASR hypotheses (e.g. word error rate, word recognition rate).

Language: Python - Size: 124 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 272 - Forks: 78
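asr-evaluation scores ASR hypotheses with metrics like word error rate (WER): the word-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of that computation (not the package's API):

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row-at-a-time dynamic-programming edit distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1] / len(ref)
```

One substitution plus one deletion against a six-word reference gives a WER of 2/6; WER can exceed 1.0 when the hypothesis has many insertions.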

ScalingIntelligence/KernelBench

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems

Language: Python - Size: 1.73 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 260 - Forks: 23

SAILResearch/awesome-foundation-model-leaderboards

A curated list of awesome leaderboard-oriented resources for foundation models

Size: 938 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 259 - Forks: 35

Wscats/compile-hero

🔰 Visual Studio Code extension for compiling languages.

Language: TypeScript - Size: 35.7 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 257 - Forks: 59

clovaai/generative-evaluation-prdc

Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.

Language: Python - Size: 290 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 254 - Forks: 28

evfro/polara

Recommender system and evaluation framework for top-n recommendation tasks that respects the polarity of feedback. Fast, flexible, and easy to use. Written in Python, built on the scientific Python stack.

Language: Python - Size: 2 MB - Last synced at: 17 days ago - Pushed at: 20 days ago - Stars: 251 - Forks: 22

appinho/SARosPerceptionKitti

ROS package for the Perception (Sensor Processing, Detection, Tracking and Evaluation) of the KITTI Vision Benchmark Suite

Language: Python - Size: 30.2 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 246 - Forks: 81

microsoft/rag-experiment-accelerator

The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the process of conducting experiments and evaluations using Azure Cognitive Search and RAG pattern.

Language: Python - Size: 4.36 MB - Last synced at: 1 day ago - Pushed at: 13 days ago - Stars: 243 - Forks: 88

JinjieNi/MixEval

The official evaluation suite and dynamic data release for MixEval.

Language: Python - Size: 9.37 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 235 - Forks: 41

devmount/GermanWordEmbeddings

Toolkit to obtain and preprocess German text corpora, train models, and evaluate them with generated test sets. Built with Gensim and TensorFlow.

Language: Jupyter Notebook - Size: 911 KB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 234 - Forks: 50

radarlabs/api-diff

A command-line tool for diffing JSON REST APIs.

Language: TypeScript - Size: 1.27 MB - Last synced at: 13 days ago - Pushed at: almost 3 years ago - Stars: 230 - Forks: 16

lgalke/vec4ir

Word Embeddings for Information Retrieval

Language: Python - Size: 965 KB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 225 - Forks: 42

INGInious/INGInious

INGInious is a secure, automated exercise-assessment platform that runs your own tests and provides a pluggable interface to your existing LMS.

Language: Python - Size: 42.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 219 - Forks: 140

hpatches/hpatches-benchmark

Python & Matlab code for local feature descriptor evaluation with the HPatches dataset.

Language: MATLAB - Size: 234 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 218 - Forks: 64

zeno-ml/zeno 📦

AI Data Management & Evaluation Platform

Language: Svelte - Size: 51.6 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 215 - Forks: 11

kavgan/ROUGE-2.0

ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.

Language: Java - Size: 154 MB - Last synced at: 21 days ago - Pushed at: about 5 years ago - Stars: 213 - Forks: 37
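ROUGE-N scores a candidate summary by its n-gram overlap with a reference, normalized by the reference's n-gram count (a recall-oriented measure). A minimal stdlib sketch of that core computation (illustrative; the toolkit adds stemming, stopword handling, and more):

```python
from collections import Counter

def rouge_n_recall(reference, candidate, n=2):
    """Clipped n-gram overlap divided by the reference n-gram count."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Clip each candidate n-gram's count by its count in the reference.
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

For example, a candidate covering 2 of the reference's 5 bigrams scores ROUGE-2 recall of 0.4.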

google/imageinwords

Data release for the ImageInWords (IIW) paper.

Language: JavaScript - Size: 21.4 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 209 - Forks: 9