So there's Detect Pretrain Data, https://swj0419.github.io/detect-pretrain.github.io/ , where one can test whether a model has been pretrained on a given text or not. So why don't we just test all the models going on the leaderboard, and reject those flagged as having been pretrained on the benchmark data? It would end the “train on test” issue.
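For anyone wondering what such a check could look like in practice, here is a rough sketch of the Min-K% Prob idea that the linked project is built around, as far as I understand it: take the per-token log-probabilities the model assigns to a text, keep only the least likely k% of tokens, and average them. A text the model has seen tends to have fewer very unlikely tokens, so its score is higher. The function names, the `k_percent` default, and the `threshold` value are my own assumptions for illustration, not the project's actual API, and the log-probabilities would have to come from the model under test.

```python
def min_k_percent_score(token_logprobs, k_percent=20):
    """Average log-probability of the k% least likely tokens in a text.

    token_logprobs: per-token log-probabilities from the model under test
    (assumed to be computed elsewhere). k_percent is a hypothetical default.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Number of tokens to keep: at least one, otherwise k% of the text.
    n = max(1, len(token_logprobs) * k_percent // 100)
    # Sorting ascending puts the least likely tokens first.
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


def looks_pretrained_on(token_logprobs, k_percent=20, threshold=-4.0):
    """Flag a text as likely seen in pretraining.

    The threshold is a made-up placeholder; in practice it would need to be
    calibrated on texts known to be seen and unseen for the model in question.
    """
    return min_k_percent_score(token_logprobs, k_percent) > threshold


# Toy example with invented numbers: one text with no surprising tokens,
# one with a couple of very unlikely tokens.
familiar = [-0.1] * 10
unfamiliar = [-0.1] * 8 + [-9.0, -11.0]
print(looks_pretrained_on(familiar), looks_pretrained_on(unfamiliar))
```

A leaderboard would run something like this over each benchmark's test items and look at how many get flagged, rather than trusting any single text's score.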

  • ninjasaid13B
    1 year ago

    It’s all a rabbit hole of time wasting, imho. People judge x or y model on how well it works for their use cases.

    Well, people don’t want to be misled about the capabilities of a model. If it’s only good for certain use cases, then just say so.