Looking for some good prompts to get an idea of just how smart a model is.
With constant new releases, it’s not always feasible to sit there and have a long conversation, although that is the route I generally prefer.
Thanks in advance.
It’s important that we not disclose all our test questions, or models will continue to overfit and underlearn. Now, to answer your question:
When evaluating a code model, I look for questions with easy answers, then tweak them slightly to see if the model gives the easy answer or figures out that I need something else. I’ll give one example out of tens*:
Ask for code that removes the first 1 KiB of a file, and most of the models I’ve tested will give a correct answer to the wrong question: seek(1024) followed by truncate(). That removes everything after the first 1 KiB, keeping exactly the bytes I asked to delete.
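To make the mix-up concrete, here’s a minimal sketch, assuming the question is posed against Python’s file API (seek() and truncate() read like Python file-object methods to me); the function names are my own:

```python
def truncate_after_first_kib(path: str) -> None:
    # The "easy" answer most models reach for: this KEEPS the first
    # 1 KiB and deletes everything after it, the opposite of the ask.
    with open(path, "r+b") as f:
        f.seek(1024)
        f.truncate()  # cut the file off at the current position (1024)

def remove_first_kib(path: str) -> None:
    # What was actually asked: drop the first 1 KiB, keep the rest.
    with open(path, "r+b") as f:
        f.seek(1024)     # skip past the bytes to delete
        rest = f.read()  # fine for a sketch; stream in chunks for large files
        f.seek(0)
        f.write(rest)    # shift the remainder down to offset 0
        f.truncate()     # trim the leftover tail
```

(For huge files you’d shift the tail in chunks rather than slurping it into memory, but the point stands: the two answers are different programs.)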
(*I’m being deliberately vague about how many questions I have for the same reason I don’t share them. Also, it’s a moving target.)