Topic: "evaluation"
mrgloom/awesome-semantic-segmentation
:metal: awesome-semantic-segmentation
Size: 283 KB - Last synced at: 9 days ago - Pushed at: almost 4 years ago - Stars: 10,642 - Forks: 2,486

langfuse/langfuse
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Language: TypeScript - Size: 19.4 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 10,417 - Forks: 952

explodinggradients/ragas
Supercharge Your LLM Application Evaluations 🚀
Language: Python - Size: 40.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,816 - Forks: 884

promptfoo/promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
Language: TypeScript - Size: 342 MB - Last synced at: about 17 hours ago - Pushed at: about 17 hours ago - Stars: 6,241 - Forks: 513
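
Promptfoo's declarative configs pair prompts, providers, and assertions. A hedged sketch of what a minimal `promptfooconfig.yaml` might look like (key names follow the project's documented config shape, but exact fields may vary by version):

```yaml
# Hypothetical minimal promptfoo config (field names may differ by version)
prompts:
  - "Summarize in one sentence: {{text}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      text: "LLM evaluation compares model outputs against expectations."
    assert:
      - type: contains
        value: "evaluation"
```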

open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) across 100+ datasets.
Language: Python - Size: 5.97 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 5,202 - Forks: 543

Knetic/govaluate 📦
Arbitrary expression evaluation for golang
Language: Go - Size: 292 KB - Last synced at: 3 days ago - Pushed at: 27 days ago - Stars: 3,866 - Forks: 512

Marker-Inc-Korea/AutoRAG
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Language: Python - Size: 70 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 3,833 - Forks: 305

MichaelGrupp/evo
Python package for the evaluation of odometry and SLAM
Language: Python - Size: 7.04 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 3,723 - Forks: 763
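
evo reports trajectory metrics such as absolute trajectory error (ATE). As a stdlib-only illustration, this sketches the RMSE computation behind ATE over paired positions (omitting evo's SE(3) alignment step; `ate_rmse` is an illustrative name, not evo's API):

```python
import math

def ate_rmse(gt, est):
    """Root-mean-square error over paired ground-truth/estimated positions.

    Simplified sketch of the ATE metric (no trajectory alignment step).
    """
    assert len(gt) == len(est)
    # Squared Euclidean distance between each pair of positions
    sq = [sum((g - e) ** 2 for g, e in zip(p, q)) for p, q in zip(gt, est)]
    return math.sqrt(sum(sq) / len(sq))

gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
est = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.1)]
print(round(ate_rmse(gt, est), 3))  # each error is 0.1, so the RMSE is 0.1
```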

Helicone/helicone
🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
Language: TypeScript - Size: 386 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,609 - Forks: 364

Kiln-AI/Kiln
The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
Language: Python - Size: 14.3 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 3,391 - Forks: 235

sdiehl/write-you-a-haskell
Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
Language: Haskell - Size: 938 KB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 3,375 - Forks: 256

CLUEbenchmark/SuperCLUE
SuperCLUE: A comprehensive general-purpose benchmark for Chinese foundation models
Size: 24.3 MB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 3,145 - Forks: 104

viebel/klipse
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
Language: HTML - Size: 91.5 MB - Last synced at: 13 days ago - Pushed at: 7 months ago - Stars: 3,125 - Forks: 148

zzw922cn/Automatic_Speech_Recognition
End-to-end automatic speech recognition for Mandarin and English in TensorFlow
Language: Python - Size: 5.53 MB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 2,843 - Forks: 534

microsoft/promptbench
A unified evaluation framework for large language models
Language: Python - Size: 5.56 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 2,598 - Forks: 191

ianarawjo/ChainForge
An open-source visual programming environment for battle-testing prompts to LLMs.
Language: TypeScript - Size: 183 MB - Last synced at: about 1 hour ago - Pushed at: about 2 hours ago - Stars: 2,573 - Forks: 206

EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
Language: Python - Size: 7.29 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,355 - Forks: 256

uptrain-ai/uptrain
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
Language: Python - Size: 36.9 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 2,258 - Forks: 199

open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Language: Python - Size: 4.5 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 2,239 - Forks: 336

huggingface/evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Language: Python - Size: 2.01 MB - Last synced at: about 22 hours ago - Pushed at: 3 months ago - Stars: 2,185 - Forks: 272
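
The library exposes metrics behind a uniform load/compute interface. As a stdlib-only illustration, this is the kind of computation an accuracy metric performs under the hood (a sketch, not the library's code):

```python
def accuracy(predictions, references):
    """Fraction of predictions that match the references."""
    assert len(predictions) == len(references)
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 3 of 4 correct -> 0.75
```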

ContinualAI/avalanche
Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
Language: Python - Size: 15.5 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 1,875 - Forks: 306

lmnr-ai/lmnr
Laminar - an open-source, all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Language: TypeScript - Size: 30.5 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 1,866 - Forks: 113

Cloud-CV/EvalAI
:cloud: :rocket: :bar_chart: :chart_with_upwards_trend: Evaluating state of the art in AI
Language: Python - Size: 63.5 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1,839 - Forks: 860

xinshuoweng/AB3DMOT
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Language: Python - Size: 181 MB - Last synced at: 14 days ago - Pushed at: about 1 year ago - Stars: 1,729 - Forks: 406

tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Language: Jupyter Notebook - Size: 302 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 1,716 - Forks: 266

MLGroupJLU/LLM-eval-survey
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Size: 5.86 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1,510 - Forks: 92

sepandhaghighi/pycm
Multi-class confusion matrix library in Python
Language: Python - Size: 12.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,474 - Forks: 126
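
A multi-class confusion matrix is a tally of (actual, predicted) label pairs, from which per-class statistics are derived. A stdlib sketch of that underlying tally (illustrative, not pycm's API):

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs across all examples."""
    assert len(actual) == len(predicted)
    return Counter(zip(actual, predicted))

actual    = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
cm = confusion_matrix(actual, predicted)
# True positives for "dog" sit on the diagonal; off-diagonal cells are errors
print(cm[("cat", "cat")], cm[("cat", "dog")], cm[("dog", "dog")], cm[("dog", "cat")])
# 1 1 2 1
```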

Xnhyacinth/Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
Size: 3.51 MB - Last synced at: about 4 hours ago - Pushed at: 7 days ago - Stars: 1,438 - Forks: 47

huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Language: Python - Size: 4.58 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,429 - Forks: 220

Maluuba/nlg-eval
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Language: Python - Size: 92.2 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 1,374 - Forks: 224

lunary-ai/lunary
The production toolkit for LLMs. Observability, prompt management and evaluations.
Language: TypeScript - Size: 5.64 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,284 - Forks: 150

langwatch/langwatch
The ultimate LLM Ops platform - Monitoring, Analytics, Evaluations, Datasets and Prompt Optimization ✨
Language: TypeScript - Size: 20 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,280 - Forks: 74

abo-abo/lispy
Short and sweet LISP editing
Language: Emacs Lisp - Size: 5.07 MB - Last synced at: 14 days ago - Pushed at: 10 months ago - Stars: 1,240 - Forks: 136

EthicalML/xai
XAI - An eXplainability toolbox for machine learning
Language: Python - Size: 17.8 MB - Last synced at: 14 days ago - Pushed at: over 3 years ago - Stars: 1,162 - Forks: 180

google/fuzzbench
FuzzBench - Fuzzer benchmarking as a service.
Language: Python - Size: 37.1 MB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 1,142 - Forks: 281

huggingface/evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Language: Jupyter Notebook - Size: 1.01 MB - Last synced at: about 22 hours ago - Pushed at: 3 months ago - Stars: 1,140 - Forks: 72

toshas/torch-fidelity
High-fidelity performance metrics for generative models in PyTorch
Language: Python - Size: 2.24 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 1,063 - Forks: 74

plurai-ai/intellagent
A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions
Language: Python - Size: 14.3 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1,006 - Forks: 129

prometheus-eval/prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
Language: Python - Size: 15 MB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 898 - Forks: 55

PRBonn/semantic-kitti-api
SemanticKITTI API for visualizing dataset, processing data, and evaluating results.
Language: Python - Size: 80.1 KB - Last synced at: 8 days ago - Pushed at: 18 days ago - Stars: 828 - Forks: 188

modelscope/evalscope
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Language: Python - Size: 49.1 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 815 - Forks: 89

ncalc/ncalc
NCalc is a fast and lightweight expression evaluator library for .NET, designed for flexibility and high performance. It supports a wide range of mathematical and logical operations.
Language: C# - Size: 1.15 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 796 - Forks: 97

CBLUEbenchmark/CBLUE
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark (a benchmark for Chinese medical information processing)
Language: Python - Size: 1.61 MB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 769 - Forks: 132

IntelLabs/RAG-FiT
Framework for enhancing LLMs for RAG tasks using fine-tuning.
Language: Python - Size: 925 KB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 737 - Forks: 56

PaesslerAG/gval
Expression evaluation in golang
Language: Go - Size: 797 KB - Last synced at: 7 months ago - Pushed at: 11 months ago - Stars: 731 - Forks: 82

dbolya/tide
A General Toolbox for Identifying Object Detection Errors
Language: Python - Size: 12 MB - Last synced at: 13 days ago - Pushed at: about 2 years ago - Stars: 715 - Forks: 115

bochinski/iou-tracker
Python implementation of the IOU Tracker
Language: Python - Size: 40 KB - Last synced at: 11 months ago - Pushed at: about 5 years ago - Stars: 688 - Forks: 176
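
The IOU tracker associates detections across frames by their intersection-over-union. A self-contained sketch of the IoU computation for axis-aligned boxes (illustrative, not this repository's code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle corners
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union 7 -> 1/7
```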

codingseb/ExpressionEvaluator
A Simple Math and Pseudo-C# Expression Evaluator in One C# File. It can also execute small C#-like scripts.
Language: C# - Size: 964 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 604 - Forks: 100

ucinlp/autoprompt
AutoPrompt: Automatic Prompt Construction for Masked Language Models.
Language: Python - Size: 76.2 MB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 595 - Forks: 81

google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
Language: Python - Size: 759 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 589 - Forks: 71

tecnickcom/tcexam
TCExam is a CBA (Computer-Based Assessment) system (e-exam, CBT - Computer-Based Testing) for universities, schools, and companies that enables educators and trainers to author, schedule, deliver, and report on surveys, quizzes, tests, and exams.
Language: PHP - Size: 69.2 MB - Last synced at: 7 days ago - Pushed at: 22 days ago - Stars: 582 - Forks: 408

jkkummerfeld/text2sql-data
A collection of datasets that pair questions with SQL queries.
Language: Python - Size: 30.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 559 - Forks: 110

HowieHwong/TrustLLM
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
Language: Python - Size: 10.4 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 548 - Forks: 51

AmenRa/ranx
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
Language: Python - Size: 34.6 MB - Last synced at: 2 days ago - Pushed at: 10 months ago - Stars: 542 - Forks: 28
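
Ranking-evaluation libraries like ranx compute metrics such as MRR from per-query rankings and relevance judgments. A stdlib sketch of mean reciprocal rank (illustrative names and data structures, not ranx's API):

```python
def mean_reciprocal_rank(results, relevant):
    """MRR: average reciprocal rank of the first relevant document per query."""
    total = 0.0
    for qid, ranked in results.items():
        rr = 0.0
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant.get(qid, set()):
                rr = 1.0 / rank
                break
        total += rr
    return total / len(results)

results = {"q1": ["d3", "d1", "d2"], "q2": ["d5", "d4"]}
relevant = {"q1": {"d1"}, "q2": {"d5"}}
print(mean_reciprocal_rank(results, relevant))  # (1/2 + 1/1) / 2 = 0.75
```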

langchain-ai/langsmith-sdk
LangSmith Client SDK Implementations
Language: Python - Size: 10.8 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 524 - Forks: 111

onejune2018/Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of LLMs, aiming to explore the technical boundaries of generative AI.
Size: 12.6 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 514 - Forks: 44

zenogantner/MyMediaLite
recommender system library for the CLR (.NET)
Language: C# - Size: 29.2 MB - Last synced at: 6 days ago - Pushed at: almost 5 years ago - Stars: 506 - Forks: 190

caserec/CaseRecommender
Case Recommender: A Flexible and Extensible Python Framework for Recommender Systems
Language: Python - Size: 1.35 MB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 495 - Forks: 92

GrumpyZhou/image-matching-toolbox
A toolbox for evaluating methods that perform image matching on pairs of images.
Language: Jupyter Notebook - Size: 20 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 494 - Forks: 72

X-PLUG/CValues
Research on evaluating and aligning the values of Chinese large language models
Language: Python - Size: 4.2 MB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 476 - Forks: 20

zzzprojects/Eval-Expression.NET
C# Eval Expression | Evaluate, compile, and execute C# code and expressions at runtime.
Language: C# - Size: 903 KB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 472 - Forks: 86

danthedeckie/simpleeval
Simple Safe Sandboxed Extensible Expression Evaluator for Python
Language: Python - Size: 286 KB - Last synced at: 23 days ago - Pushed at: 4 months ago - Stars: 470 - Forks: 87
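
Safe expression evaluation typically means walking a parsed AST and permitting only a whitelist of node types and operators, so untrusted input can never reach arbitrary code. A minimal stdlib sketch of the idea (not simpleeval's implementation):

```python
import ast
import operator

# Whitelisted operators; anything else in the AST is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr):
    """Evaluate a numeric expression by walking its AST with a whitelist."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression node: %s" % type(node).__name__)
    return walk(ast.parse(expr, mode="eval").body)

print(safe_eval("2 + 3 * 4"))   # 14
print(safe_eval("-(2 ** 3)"))   # -8
```

Function calls, attribute access, and names are absent from the whitelist, so inputs like `__import__('os')` raise `ValueError` instead of executing.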

RecList/reclist
Behavioral "black-box" testing for recommender systems
Language: Python - Size: 3.7 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 466 - Forks: 25

ModelTC/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Language: Python - Size: 28.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 456 - Forks: 53

THU-KEG/EvaluationPapers4ChatGPT
Resource, Evaluation and Detection Papers for ChatGPT
Size: 259 KB - Last synced at: about 5 hours ago - Pushed at: about 1 year ago - Stars: 455 - Forks: 25

chrisjbryant/errant
ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.
Language: Python - Size: 680 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 443 - Forks: 108

MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language: Python - Size: 186 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 413 - Forks: 34

davidstutz/superpixel-benchmark
An extensive evaluation and comparison of 28 state-of-the-art superpixel algorithms on 5 datasets.
Language: C++ - Size: 27.5 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 406 - Forks: 109

alipay/ant-application-security-testing-benchmark
The xAST evaluation benchmark makes security tools no longer a "black box".
Language: Java - Size: 10.6 MB - Last synced at: 13 days ago - Pushed at: 21 days ago - Stars: 381 - Forks: 49

votchallenge/toolkit-legacy 📦
Visual Object Tracking (VOT) challenge evaluation toolkit
Language: MATLAB - Size: 1.27 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 380 - Forks: 170

StrangerZhang/pysot-toolkit
Python Single Object Tracking Evaluation
Language: Python - Size: 186 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 377 - Forks: 69

audiolabs/webMUSHRA
MUSHRA-compliant listening-test software based on the Web Audio API
Language: JavaScript - Size: 6.96 MB - Last synced at: 10 days ago - Pushed at: 24 days ago - Stars: 371 - Forks: 146

shmsw25/FActScore
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
Language: Python - Size: 102 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 337 - Forks: 50

sb-ai-lab/RePlay
A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models
Language: Python - Size: 35.2 MB - Last synced at: 7 days ago - Pushed at: 21 days ago - Stars: 331 - Forks: 33

jianzfb/antgo
Machine Learning Experiment Management Platform
Language: Python - Size: 20.2 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 318 - Forks: 7

microsoft/genaiops-promptflow-template
GenAIOps with Prompt Flow is a GenAIOps template and guidance for building LLM-infused apps with Prompt Flow. It offers features including centralized code hosting, lifecycle management, variant and hyperparameter experimentation, A/B deployment, and reporting for all runs and experiments.
Language: Python - Size: 6.78 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 314 - Forks: 261

hbaniecki/adversarial-explainable-ai
💡 Adversarial attacks on explanations and how to defend them
Size: 2.62 MB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 314 - Forks: 48

cvangysel/pytrec_eval
pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
Language: C++ - Size: 43 KB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 302 - Forks: 33
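
IR evaluation tools in the trec_eval family compute cutoff metrics over ranked runs and relevance judgments. A stdlib sketch of precision@k (illustrative, not pytrec_eval's API):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant (P@k)."""
    top_k = ranked[:k]
    return sum(doc in relevant for doc in top_k) / k

ranked = ["d2", "d7", "d1", "d9", "d4"]   # system's ranking for one query
relevant = {"d1", "d2", "d4"}             # judged-relevant documents
print(precision_at_k(ranked, relevant, 5))  # 3 relevant in the top 5 -> 0.6
```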

rentruewang/bocoel
Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few lines of modular code.
Language: Python - Size: 8.31 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 281 - Forks: 17

AstraZeneca/rexmex
A general purpose recommender metrics library for fair evaluation.
Language: Python - Size: 2.68 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 280 - Forks: 25

athina-ai/athina-evals
Python SDK for running evaluations on LLM generated responses
Language: Python - Size: 1.84 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 276 - Forks: 17

FuxiaoLiu/LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Language: Python - Size: 23.9 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 276 - Forks: 13

ziqihuangg/Awesome-Evaluation-of-Visual-Generation
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Size: 2.53 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 274 - Forks: 16

belambert/asr-evaluation
Python module for evaluating ASR hypotheses (e.g. word error rate, word recognition rate).
Language: Python - Size: 124 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 272 - Forks: 78
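
Word error rate is the word-level Levenshtein distance between hypothesis and reference, normalized by reference length. A stdlib sketch of the standard dynamic-programming computation (illustrative, not this package's code):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat mat"))
# 2 deletions over 6 reference words -> 0.333...
```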

ScalingIntelligence/KernelBench
KernelBench: Can LLMs Write GPU Kernels? A benchmark of Torch-to-CUDA problems.
Language: Python - Size: 1.73 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 260 - Forks: 23

SAILResearch/awesome-foundation-model-leaderboards
A curated list of awesome leaderboard-oriented resources for foundation models
Size: 938 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 259 - Forks: 35

Wscats/compile-hero
🔰 Visual Studio Code extension for compiling languages
Language: TypeScript - Size: 35.7 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 257 - Forks: 59

clovaai/generative-evaluation-prdc
Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.
Language: Python - Size: 290 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 254 - Forks: 28

evfro/polara
Recommender system and evaluation framework for top-n recommendation tasks that respects the polarity of feedback. Fast, flexible, and easy to use. Written in Python on top of the scientific Python stack.
Language: Python - Size: 2 MB - Last synced at: 17 days ago - Pushed at: 20 days ago - Stars: 251 - Forks: 22

appinho/SARosPerceptionKitti
ROS package for the Perception (Sensor Processing, Detection, Tracking and Evaluation) of the KITTI Vision Benchmark Suite
Language: Python - Size: 30.2 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 246 - Forks: 81

microsoft/rag-experiment-accelerator
The RAG Experiment Accelerator is a versatile tool designed to expedite experiments and evaluations using Azure Cognitive Search and the RAG pattern.
Language: Python - Size: 4.36 MB - Last synced at: 1 day ago - Pushed at: 13 days ago - Stars: 243 - Forks: 88

JinjieNi/MixEval
The official evaluation suite and dynamic data release for MixEval.
Language: Python - Size: 9.37 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 235 - Forks: 41

devmount/GermanWordEmbeddings
Toolkit to obtain and preprocess German text corpora, train models, and evaluate them with generated test sets. Built with Gensim and TensorFlow.
Language: Jupyter Notebook - Size: 911 KB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 234 - Forks: 50

radarlabs/api-diff
A command line tool for diffing json rest APIs
Language: TypeScript - Size: 1.27 MB - Last synced at: 13 days ago - Pushed at: almost 3 years ago - Stars: 230 - Forks: 16

lgalke/vec4ir
Word Embeddings for Information Retrieval
Language: Python - Size: 965 KB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 225 - Forks: 42

INGInious/INGInious
INGInious is a secure, automated exercise-assessment platform that runs your own tests and provides a pluggable interface to your existing LMS.
Language: Python - Size: 42.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 219 - Forks: 140

hpatches/hpatches-benchmark
Python & Matlab code for local feature descriptor evaluation with the HPatches dataset.
Language: MATLAB - Size: 234 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 218 - Forks: 64

zeno-ml/zeno 📦
AI Data Management & Evaluation Platform
Language: Svelte - Size: 51.6 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 215 - Forks: 11

kavgan/ROUGE-2.0
ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.
Language: Java - Size: 154 MB - Last synced at: 21 days ago - Pushed at: about 5 years ago - Stars: 213 - Forks: 37
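
ROUGE-N scores a candidate summary by its clipped n-gram overlap with a reference. A stdlib Python sketch of ROUGE-N recall (a simplified illustration of the metric, not this toolkit's Java code):

```python
from collections import Counter

def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: clipped overlapping n-grams / total reference n-grams."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref, cand = ngrams(reference), ngrams(candidate)
    # Clip each n-gram's credit at its count in the candidate
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

ref = "the quick brown fox jumps"
cand = "the brown fox sleeps"
print(rouge_n_recall(ref, cand, n=1))  # 3 of 5 reference unigrams matched -> 0.6
```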

google/imageinwords
Data release for the ImageInWords (IIW) paper.
Language: JavaScript - Size: 21.4 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 209 - Forks: 9
