So RWKV v5 7B is 60% trained now. I saw that its multilingual performance is already better than Mistral's, and its English capabilities are close to Mistral's, except for HellaSwag and ARC, where it's a little behind. All the benchmarks are on the RWKV Discord, and you can google the pros/cons of RWKV, though most of those cover v4.
Thoughts?
Would the amount of RAM used at the end of a 16k or 32k context be less than Mistral's?
Is the t/s at the end of the context the same as at the beginning?
Looks like something to test in kobold.cpp later if nobody has done those tests yet.
RWKV-4 7B doesn't increase RAM usage at all with --nommap at 13k context in koboldcpp. Is that normal? Is there no KV cache and no extra RAM usage for context?
That's the point of RWKV: you could have a 10 million token context length and it would use the same memory as a 100 token context.
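For anyone wondering why, here's a minimal sketch (not the real RWKV code, the sizes and update rule are just illustrative) of the difference: an RNN-style model like RWKV carries a fixed-size state per layer and overwrites it each step, while a transformer appends keys/values to a cache every token, so its memory grows with context.

```python
import numpy as np

D = 4096          # hidden size (illustrative)
N_LAYERS = 32     # layer count (illustrative)

def rnn_style_generate(tokens):
    # One fixed-size state per layer, updated in place each step.
    state = np.zeros((N_LAYERS, D), dtype=np.float32)
    for tok in tokens:
        state = 0.9 * state + 0.1  # stand-in for the real per-token update
    return state  # same memory footprint after 100 tokens or 10 million

def transformer_style_generate(tokens):
    # Keys/values are appended for every token, so memory grows with context.
    kv_cache = []
    for tok in tokens:
        kv_cache.append(np.zeros((N_LAYERS, 2, D), dtype=np.float32))
    return kv_cache  # memory proportional to len(tokens)
```

That's also why there's no KV cache line item in koboldcpp for it: the recurrent state is all it keeps around, regardless of how far into the context you are.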