An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: humaneval

abacaj/code-eval

Run evaluation on LLMs using human-eval benchmark

Language: Python - Size: 110 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 410 - Forks: 36
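Most of the repositories on this page build on OpenAI's HumanEval benchmark, which scores models with the pass@k metric. As a rough sketch (not taken from any repo listed here), the unbiased estimator from the paper "Evaluating Large Language Models Trained on Code" — 1 − C(n−c, k)/C(n, k) for n samples with c correct — can be computed numerically like this:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    computed as a stable product instead of raw binomials.

    n: total generated samples per problem
    c: samples that passed the unit tests
    k: budget of attempts being scored
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 2 samples, 1 correct, k=1 -> probability 0.5
print(pass_at_k(2, 1, 1))
```

The product form avoids overflow for large n, which matters when benchmarks sample hundreds of completions per problem.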

the-crypt-keeper/can-ai-code

Self-evaluating interview for AI coders

Language: Python - Size: 8.49 MB - Last synced at: 3 days ago - Pushed at: 14 days ago - Stars: 577 - Forks: 34

bin123apple/AutoCoder

A new model designed for the code generation task; its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

Language: Python - Size: 25.8 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 842 - Forks: 71

zorse-project/COBOLEval

Evaluate LLM-generated COBOL

Language: Python - Size: 140 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 2

mennahasan31/llm_benchmark

llm_benchmark is a comprehensive benchmarking tool for evaluating the performance of various Large Language Models (LLMs) on a range of natural language processing tasks. It provides a standardized framework for comparing different models based on accuracy, speed, and efficiency.

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

declare-lab/LLM-ReasoningTest

Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

Language: Python - Size: 1.78 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 1

abhigupta2909/LLMPerformanceLab

Performance analysis of LLMs across CPU and GPU, measuring execution time and energy usage

Language: Java - Size: 11.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

mousamax/Evaluation-Code-Generator-LLMs

JetBrains Task: Leveraging software evolution data with LLMs

Size: 2.93 KB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SkyWorkAIGC/SkyCode-AI-CodeX-GPT3

SkyCode is an open-source multilingual programming model built on the GPT-3 architecture. It supports Java, JavaScript, C, C++, Python, Go, shell, and other mainstream programming languages, and can understand Chinese comments. The model can complete code and has strong problem-solving ability, freeing you from routine programming to focus on more important problems.

Size: 34.2 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 388 - Forks: 21

talmago/30-seconds-of-code-eval Fork of openai/human-eval

Code evaluation with *30-seconds-of-code* examples. Inspired by "Evaluating Large Language Models Trained on Code"

Language: Python - Size: 693 KB - Last synced at: 8 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0