GitHub topics: pretraining-data-detection
zjysteven/mink-plus-plus
[ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs
Language: Python - Size: 3.8 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 37 - Forks: 5

tsinghua-fib-lab/AAAI2025_MIA-Tuner
[AAAI'25 Oral] MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector
Language: Python - Size: 28.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 142 - Forks: 7

SrulyRosenblat/Detecting-Pretraining-Data-Using-Probability-Slopes
A new method for recognizing text that is included in an LLM's training data.
Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
