Looking for some good prompts to get an idea of just how smart a model is.

With constant new releases, it’s not always feasible to sit there and have a long conversation, although that is the route I generally prefer.

Thanks in advance.

  • naptasticB
    10 months ago

    It’s important that we not disclose all our test questions, or models will continue to overfit and underlearn. Now, to answer your question:

    When evaluating a code model, I look for questions with easy answers, then tweak them slightly to see if the model gives the easy answer or figures out that I need something else. I’ll give one example out of tens*:

    “Write a program that removes the first 1 KiB of a file.”

    Most of the models I’ve tested will give a correct answer to the wrong question: seek(1024) and truncate(). That keeps the first 1 KiB and removes everything after it, which is the opposite of what was asked.
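
    To make the difference concrete, here is a minimal Python sketch of both behaviors. The function names and the copy-to-a-temp-file approach are my own assumptions for illustration, not something the question prescribes; other correct answers exist (shifting the data in place, for example).

```python
import os
import shutil
import tempfile


def keep_first_kib(path, n=1024):
    """The common wrong answer: keeps the first n bytes and drops the rest."""
    with open(path, "r+b") as f:
        f.seek(n)
        f.truncate()  # cuts the file at the current position (offset n)


def remove_first_kib(path, n=1024):
    """What was asked: drop the first n bytes and keep everything after them.

    Copies the tail into a temporary file in the same directory, then
    swaps it in with os.replace. The new file does not inherit the
    original's permissions or ownership.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src, tempfile.NamedTemporaryFile(
        "wb", dir=dir_name, delete=False
    ) as dst:
        src.seek(n)                       # skip the part to remove
        shutil.copyfileobj(src, dst)      # copy the remainder in chunks
        tmp_name = dst.name
    os.replace(tmp_name, path)
```

    On a file holding 1 KiB of "A" followed by 1 KiB of "B", keep_first_kib leaves only the "A" bytes, while remove_first_kib leaves only the "B" bytes.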

    (*I’m being deliberately vague about how many questions I have for the same reason I don’t share them. Also it’s a moving target.)