Hello!
By popular demand, I am planning a fine-tune of https://huggingface.co/dreamgen/opus-v0-7b on top of Yi-34B, and I wonder whether to use the 200K variant as the base.
The regular Yi-34B seems slightly better than Yi-34B-200K on standard benchmarks, but I wonder how the two “feel” in practice, and whether the 200K model’s weaker short-context performance is worth it, given that the regular version can reportedly be used up to 32K tokens.
Did anyone try an analysis of these two models at various sequence lengths (<4K, <8K, <16K, etc.)?
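To be concrete, here is the kind of comparison I have in mind: perplexity on the same long documents, truncated to each length bucket. A rough sketch rather than a proper harness; the eval texts are a placeholder, and the loading options may need adjusting for your setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

texts: list[str] = []  # fill with long documents from whatever eval corpus you trust

def ppl_at_length(model, tokenizer, texts, max_len):
    """Perplexity from the mean token loss, each text truncated to max_len tokens."""
    losses = []
    for text in texts:
        ids = tokenizer(text, return_tensors="pt", truncation=True,
                        max_length=max_len).input_ids.to(model.device)
        if ids.shape[1] < 2:
            continue
        with torch.no_grad():
            losses.append(model(ids, labels=ids).loss.item())
    return float(torch.tensor(losses).mean().exp())

for name in ("01-ai/Yi-34B", "01-ai/Yi-34B-200K"):
    tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)
    for max_len in (4096, 8192, 16384):
        print(name, max_len, ppl_at_length(model, tok, texts, max_len))
```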
Random update on this: I did some more experimenting on the start of a story (with the LimaRP and Petrol LoRAs), and the 4K model seems… fine? So does the 200K.
I don’t know how to stretch out the base model, though. Their page claims it supports 32K, but the config has a 4K context and no RoPE scaling section, just a high rope theta.
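If anyone wants to try stretching it anyway, this is roughly what I would attempt first: bump the window and add RoPE scaling in the config. Untested for Yi specifically; the "dynamic" type and the factor of 8 (32K / 4K) are guesses, and whether rope_scaling is honored at all depends on whether the checkpoint runs on the Llama code path or its own remote modeling code:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

name = "01-ai/Yi-34B"
config = AutoConfig.from_pretrained(name, trust_remote_code=True)

# Shipped config: max_position_embeddings=4096, a large rope_theta, no rope_scaling.
# Guess: widen the window and add dynamic NTK-aware scaling on top.
config.max_position_embeddings = 32768
config.rope_scaling = {"type": "dynamic", "factor": 8.0}

model = AutoModelForCausalLM.from_pretrained(
    name, config=config, torch_dtype=torch.bfloat16,
    device_map="auto", trust_remote_code=True)
```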
The one difference I did notice is that the 200K model really likes to summarize and reference previous parts of the story. Maybe it was trained on retrieval or summarization examples.