Looking for some good prompts to get an idea of just how smart a model is.
With constant new releases, it’s not always feasible to sit there and have a long conversation, although that is the route I generally prefer.
Thanks in advance.
It’s important that we not disclose all our test questions, or models will continue to overfit and underlearn. Now, to answer your question:
When evaluating a code model, I look for questions with easy answers, then tweak them slightly to see if the model gives the easy answer or figures out that I need something else. I’ll give one example out of tens*:
Ask for code that removes the first 1 KiB of a file, and most of the models I’ve tested will give a correct answer to the wrong question: seek(1024) followed by truncate(). That removes everything after the first 1 KiB, keeping exactly the bytes I asked to delete.
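To make the mix-up concrete, here’s a minimal sketch, assuming the question is posed against Python’s file API (seek() and truncate() read like Python file-object methods to me); the function names are my own:

```python
def truncate_after_first_kib(path: str) -> None:
    # The "easy" answer most models reach for: this KEEPS the first
    # 1 KiB and deletes everything after it, the opposite of the ask.
    with open(path, "r+b") as f:
        f.seek(1024)
        f.truncate()  # cut the file off at the current position (1024)

def remove_first_kib(path: str) -> None:
    # What was actually asked: drop the first 1 KiB, keep the rest.
    with open(path, "r+b") as f:
        f.seek(1024)     # skip past the bytes to delete
        rest = f.read()  # fine for a sketch; stream in chunks for large files
        f.seek(0)
        f.write(rest)    # shift the remainder down to offset 0
        f.truncate()     # trim the leftover tail
```

(For huge files you’d shift the tail in chunks rather than slurping it into memory, but the point stands: the two answers are different programs.)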
(*I’m being deliberately vague about how many questions I have for the same reason I don’t share them. Also, it’s a moving target.)