An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-contamination

lyy1994/awesome-data-contamination

The Paper List on Data Contamination for Large Language Models Evaluation.

Size: 106 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 92 - Forks: 3

mravanelli/pySpeechRev

This Python code performs efficient speech reverberation, starting from a dataset of close-talking speech signals and a collection of acoustic impulse responses.

Language: Python - Size: 2.12 MB - Last synced at: 29 days ago - Pushed at: almost 5 years ago - Stars: 95 - Forks: 25

nlx-group/overlapy

A Python package for evaluating textual overlap (n-grams) between two bodies of text.

Language: Python - Size: 44.9 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 2

THU-KEG/DICE

DICE: Detecting In-distribution Data Contamination with LLM's Internal State

Language: Python - Size: 3.26 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

shahriargolchin/DCQ

The official repository for the paper entitled "Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models."

Language: Python - Size: 1.35 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

shahriargolchin/time-travel-in-llms

The official repository for the paper titled "Time Travel in LLMs: Tracing Data Contamination in Large Language Models."

Language: Python - Size: 848 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 2

yyy01/PAC

The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024).

Language: Python - Size: 210 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0