tenmileswideB to

LocalLLaMA@poweruser.forum · English · 1 year ago

Is there a way to prevent coherency degradation when using high levels of RoPE scaling?

For the sake of argument, let’s say VRAM is no object.

If I set alpha_value to around 2.5 to 3 when loading a model with a native 4k context, I can get up to about 10k context before things start noticeably falling apart. When I extend the context beyond that, the model gets progressively less coherent, even if I raise alpha_value to match.

I’ve found that I can mitigate this a little by trying different alpha values at different context lengths, but it never really gets usable. It gets closer to where it needs to be, but it’s still nothing I’d actually want to run.

Is this just the nature of the beast when it comes to extending context?
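
For context, here is a rough sketch of NTK-aware RoPE scaling, which is what alpha_value is commonly understood to control in loaders such as ExLlama. The formula, the base of 10000, and the head dimension of 128 are assumptions typical of Llama-style models, not details given in this post, and the function names are made up for illustration.

```python
import math

# Assumed defaults for a Llama-style model; not taken from the post.
DEFAULT_BASE = 10000.0
DEFAULT_HEAD_DIM = 128


def ntk_scaled_base(alpha, base=DEFAULT_BASE, head_dim=DEFAULT_HEAD_DIM):
    """Return the RoPE base after NTK-aware scaling by `alpha` (assumed formula)."""
    return base * alpha ** (head_dim / (head_dim - 2))


def rope_inv_freqs(base, head_dim=DEFAULT_HEAD_DIM):
    """Per-pair rotation frequencies used by rotary position embeddings."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(base, head_dim=DEFAULT_HEAD_DIM):
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freqs(base, head_dim)[-1]


if __name__ == "__main__":
    for alpha in (1.0, 2.5, 3.0, 6.0):
        scaled = ntk_scaled_base(alpha)
        stretch = longest_wavelength(scaled) / longest_wavelength(DEFAULT_BASE)
        print(f"alpha={alpha}: base={scaled:.0f}, slowest wavelength x{stretch:.2f}")
```

Under these assumptions the fastest-rotating pair is left untouched and the slowest one is stretched by exactly alpha, with everything in between interpolated. This shows only the arithmetic; it does not by itself explain where coherence starts to break down.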

mcmoose1900B · English · 1 year ago

Have you considered running a Yi 200K model instead?