An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: humaneval

abacaj/code-eval

Run evaluation on LLMs using human-eval benchmark

Language: Python - Size: 110 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 410 - Forks: 36
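Most of the repositories on this page build on OpenAI's HumanEval benchmark, which scores models with the pass@k metric. As a rough sketch (not taken from any repo listed here), the unbiased estimator from the paper "Evaluating Large Language Models Trained on Code" — 1 − C(n−c, k)/C(n, k) for n samples with c correct — can be computed numerically like this:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    computed as a stable product instead of raw binomials.

    n: total generated samples per problem
    c: samples that passed the unit tests
    k: budget of attempts being scored
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 2 samples, 1 correct, k=1 -> probability 0.5
print(pass_at_k(2, 1, 1))
```

The product form avoids overflow for large n, which matters when benchmarks sample hundreds of completions per problem.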

the-crypt-keeper/can-ai-code

Self-evaluating interview for AI coders

Language: Python - Size: 8.49 MB - Last synced at: 3 days ago - Pushed at: 14 days ago - Stars: 577 - Forks: 34

bin123apple/AutoCoder

A new model designed for the code generation task; its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

Language: Python - Size: 25.8 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 842 - Forks: 71

zorse-project/COBOLEval

Evaluate LLM-generated COBOL

Language: Python - Size: 140 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 2

mennahasan31/llm_benchmark

llm_benchmark is a comprehensive benchmarking tool for evaluating the performance of various Large Language Models (LLMs) on a range of natural language processing tasks. It provides a standardized framework for comparing different models based on accuracy, speed, and efficiency.

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

declare-lab/LLM-ReasoningTest

Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

Language: Python - Size: 1.78 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 1

abhigupta2909/LLMPerformanceLab

Performance analysis of LLMs across CPU and GPU, measuring execution time and energy usage

Language: Java - Size: 11.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

mousamax/Evaluation-Code-Generator-LLMs

JetBrains Task: Leveraging software evolution data with LLMs

Size: 2.93 KB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SkyWorkAIGC/SkyCode-AI-CodeX-GPT3

SkyCode is an open-source multilingual programming model built on the GPT-3 architecture. It supports Java, JavaScript, C, C++, Python, Go, shell, and other mainstream programming languages, and can understand Chinese comments. The model can complete code and has strong problem-solving ability, freeing you from routine programming to focus on more important problems.

Size: 34.2 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 388 - Forks: 21

talmago/30-seconds-of-code-eval Fork of openai/human-eval

Code evaluation with *30-seconds-of-code* examples. Inspired by "Evaluating Large Language Models Trained on Code"

Language: Python - Size: 693 KB - Last synced at: 8 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0