GitHub topics: data-contamination
lyy1994/awesome-data-contamination
A paper list on data contamination in large language model evaluation.
Size: 106 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 92 - Forks: 3

mravanelli/pySpeechRev
This Python code efficiently applies reverberation to a dataset of close-talking speech signals using a collection of acoustic impulse responses.
Language: Python - Size: 2.12 MB - Last synced at: 29 days ago - Pushed at: almost 5 years ago - Stars: 95 - Forks: 25
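Reverberating close-talking speech with an acoustic impulse response is typically modeled as linear convolution of the dry signal with that response. The sketch below illustrates only that idea in pure Python. The function name `reverberate` is hypothetical, and this is not pySpeechRev's actual code, which may use FFT-based convolution, level normalization, or other processing.

```python
def reverberate(speech, impulse_response):
    """Direct-form linear convolution of a dry signal with an impulse response.

    Hypothetical helper for illustration. Returns len(speech) + len(impulse_response) - 1
    samples; callers often trim the tail back to len(speech).
    """
    out = [0.0] * (len(speech) + len(impulse_response) - 1)
    for i, s in enumerate(speech):
        if s == 0.0:
            continue  # silent samples contribute nothing
        for j, h in enumerate(impulse_response):
            # Each input sample adds a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


print(reverberate([1.0, 0.0, 0.5], [1.0, 0.5]))  # [1.0, 0.5, 0.5, 0.25]
```

Direct convolution costs O(n·m) operations, so for realistic signals and impulse responses lasting hundreds of milliseconds an FFT-based convolution is the usual choice.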

nlx-group/overlapy
Python package for evaluating textual overlap (n-grams) between two bodies of text.
Language: Python - Size: 44.9 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 2
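A common form of n-gram overlap check for contamination measures the fraction of a test text's n-grams that also occur in a training corpus. The sketch below shows that general technique under simple assumptions: lowercasing, whitespace tokenization, and set-based matching. The names `ngrams` and `ngram_overlap` are hypothetical and do not reflect overlapy's actual API or scoring.

```python
def ngrams(tokens, n):
    """Set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(test_text, train_text, n=3):
    """Fraction of the test text's distinct n-grams also present in the training text.

    Assumes whitespace tokenization and case-insensitive matching (illustrative only).
    """
    test = ngrams(test_text.lower().split(), n)
    train = ngrams(train_text.lower().split(), n)
    if not test:
        return 0.0  # text shorter than n tokens has no n-grams to match
    return len(test & train) / len(test)


print(ngram_overlap("the cat sat on the mat", "a cat sat on the rug"))  # 0.5
```

Real contamination checks usually normalize punctuation, index the training corpus rather than holding it in one string, and report overlap per test instance.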

THU-KEG/DICE
DICE: Detecting In-distribution Data Contamination with LLM's Internal State
Language: Python - Size: 3.26 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

shahriargolchin/DCQ
The official repository for the paper entitled "Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models."
Language: Python - Size: 1.35 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

shahriargolchin/time-travel-in-llms
The official repository for the paper titled "Time Travel in LLMs: Tracing Data Contamination in Large Language Models."
Language: Python - Size: 848 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 2

yyy01/PAC
The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)
Language: Python - Size: 210 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0
