Extra proof (IMO) that HumanEval is leaked in base models?

kpodkanowicz · 1 year ago

Extra proof (IMO) that HumanEval is leaked in base models?

kpodkanowicz · 1 year ago

HumanEval is 164 function declarations and corresponding docstrings, and evaluation happens by set of unit tests while code is running in docker. Extra is coming from HumanEvalPlus that added several unit tests per each on the top.

Merging models might improve its capabilities, but this one was not able to find out of bounds of wrongly declared vector - there is no chance it magically is able to complete complex python code on the level that is basically on GPT4 level