Data Contamination in Multi-choice QA Benchmarks
How many test samples are included in Llama’s training data?
This shows how many test samples from popular multi-choice QA benchmarks appear in the training data of Llama models (Common Crawl 2017–2020).
Three types of data contamination are reported: input-only contamination, input-and-label contamination, and all contamination, which covers both.
Input-only contamination means that only the input part of a test sample was included in the training data. In contrast, input-and-label contamination means that both the input and the answer were included in the training data.
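To make the three categories concrete, here is a minimal sketch of how a test sample could be sorted into them. It assumes simple normalized exact matching against a list of training documents; this is an illustration only, not the matching procedure used in the report, and the names normalize, classify_contamination, and contamination_report are hypothetical.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def contains(haystack: str, needle: str) -> bool:
    """Whole-token containment check on already-normalized strings."""
    return bool(needle) and f" {needle} " in f" {haystack} "


def classify_contamination(question: str, answer: str, document: str) -> str:
    """Return 'clean', 'input-only', or 'input-and-label' for one document.

    Assumption: exact matching after normalization. Short answers such as
    "4" may match by coincidence, so a real detector needs something stricter.
    """
    doc = normalize(document)
    if not contains(doc, normalize(question)):
        return "clean"
    if contains(doc, normalize(answer)):
        return "input-and-label"
    return "input-only"


def contamination_report(samples, documents):
    """Count samples per category, keeping the strongest match per sample."""
    rank = {"clean": 0, "input-only": 1, "input-and-label": 2}
    counts = Counter()
    for question, answer in samples:
        labels = [classify_contamination(question, answer, d) for d in documents]
        counts[max(labels, key=rank.get, default="clean")] += 1
    # "All" contamination covers both contaminated categories.
    counts["all"] = counts["input-only"] + counts["input-and-label"]
    return dict(counts)
```

Calling contamination_report with a list of (question, answer) pairs and a list of document strings returns per-category counts, where "all" is the sum of the input-only and input-and-label counts.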
Impact on Model Performance
How much does data contamination affect model evaluation?
The full open-source data contamination report: https://arxiv.org/abs/2310.17589
All data and code: https://github.com/liyucheng09/Contamination_Detector