Hello!
By popular demand, I am planning a fine-tune of https://huggingface.co/dreamgen/opus-v0-7b on top of Yi-34B, and I wonder whether to use the 200K variant as the base.
The regular Yi-34B seems slightly better than Yi-34B-200K on standard benchmarks, but I wonder how the 200K model “feels” in practice and whether the loss of short-context performance is worth it, given that the regular version can already be used up to 32K tokens.
Has anyone tried comparing these two models across various sequence lengths (<4K, <8K, <16K, etc.)?
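For what it's worth, here is a minimal sketch of how such a comparison could be run with transformers, in case anyone wants to try. The evaluation file is a placeholder, and truncating a single long text is cruder than a proper sliding-window perplexity, but it should show the trend:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODELS = ["01-ai/Yi-34B", "01-ai/Yi-34B-200K"]
LENGTHS = [4096, 8192, 16384, 32768]

def perplexity_at_length(model, tokenizer, text, length):
    # Truncate the evaluation text to the target context length.
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :length].to(model.device)
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Placeholder: any text longer than the largest length being tested.
long_text = open("long_story.txt").read()

for name in MODELS:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    for n in LENGTHS:
        ppl = perplexity_at_length(model, tokenizer, long_text, n)
        print(f"{name} @ {n} tokens: ppl = {ppl:.2f}")
```

A full 34B model at 32K context needs serious VRAM, so the longer lengths may require quantization or multiple GPUs.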
I am running my story on the 200K model, and it feels the same to me as the regular 4K one (which I tried in the same setting before 200K was released).
And honestly… even if it were much worse (and I don't think it is worse at all), the mega context is such a boon for storytelling.
What I did not try was the 4K model stretched out with RoPE alpha scaling or anything like that, but the 200K model does not need any stretching up to at least 42K.
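For anyone who does want to experiment with stretching: "alpha" is the NTK-aware RoPE knob exposed by exllama and text-generation-webui; the closest equivalent in plain transformers that I know of is the rope_scaling option. A minimal sketch, assuming the Yi checkpoint loads through the standard Llama code path (the factor is an illustrative value, not a tuned one):

```python
from transformers import AutoModelForCausalLM

# "dynamic" is transformers' NTK-aware RoPE scaling, roughly what
# exllama/text-generation-webui expose as the alpha parameter.
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-34B",
    rope_scaling={"type": "dynamic", "factor": 2.0},  # factor=2.0 is illustrative
    torch_dtype="auto",
    device_map="auto",
)
```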