It’s a new foundation model, so some teething pains are to be expected. Yi is heavily based on llama2 (directly copied, for the most part), but there are just enough differences in the training parameters that default llama2 settings don’t get good results. KCPP (KoboldCpp) has already addressed the rope scaling, and I’m sure it’s only a matter of time before the other issues are hashed out.
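For anyone curious what the rope difference actually is, here’s a minimal sketch of how the RoPE base (rope_theta) changes the position frequencies. The values are from memory of Yi’s config (roughly 5e6 vs llama2’s 10,000) and a 128-dim head, not pulled from KCPP’s source, so treat them as illustrative:

```python
import numpy as np

def rope_inv_freq(head_dim: int, base: float) -> np.ndarray:
    """Standard RoPE inverse frequencies: 1 / base^(2i/d) for each dim pair."""
    return 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))

# llama2-style base vs. the much larger base Yi appears to ship with.
llama2_freqs = rope_inv_freq(128, 10_000.0)
yi_freqs = rope_inv_freq(128, 5_000_000.0)

# The lowest frequencies differ by orders of magnitude, which is why loading
# Yi with default llama2 rope settings degrades long-range attention.
print(llama2_freqs[-1], yi_freqs[-1])
```

That mismatch in the low-frequency bands is the kind of thing the rope-scaling fix corrects; the remaining quirks (tokenizer, EOS handling, etc.) are presumably the same sort of config-level differences.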