GitHub topics: pretraining-data-detection
zjysteven/mink-plus-plus
[ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs
Language: Python - Size: 3.81 MB - Last synced at: 5 days ago - Pushed at: 22 days ago - Stars: 39 - Forks: 7

cisnlp/multilingual-fact-tracing
Tracing Multilingual Factual Knowledge Acquisition in Pretraining
Language: Python - Size: 531 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 3 - Forks: 0

tsinghua-fib-lab/AAAI2025_MIA-Tuner
[AAAI'25 Oral] MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector
Language: Python - Size: 28.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 142 - Forks: 7

SrulyRosenblat/Detecting-Pretraining-Data-Using-Probability-Slopes
A new method for recognizing text that is included in an LLM's training data.
Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
