So RWKV 7B v5 is 60% trained now. I saw that the multilingual parts are better than Mistral now, and the English capabilities are close to Mistral, except for HellaSwag and ARC, where it's a little behind. All the benchmarks are on the RWKV Discord, and you can google the pros/cons of RWKV, though most write-ups are about v4.
Thoughts?
I tested the 3B model and it looks good, especially the multilingual part (demo https://huggingface.co/spaces/BlinkDL/RWKV-Gradio-2)
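If you'd rather poke at it locally than through the demo, something like this should work with BlinkDL's `rwkv` pip package. Rough sketch only: the checkpoint path is a placeholder for whatever you download, and the tokenizer name is the one the package README uses for the World models.

```python
# pip install rwkv  (BlinkDL's inference package)
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Placeholder checkpoint path (the package expects it without the .pth extension);
# swap in whatever you downloaded from the RWKV repos on Hugging Face.
model = RWKV(model="/path/to/RWKV-5-World-3B", strategy="cpu fp32")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # tokenizer for the World models

args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)
print(pipeline.generate("The capital of Slovenia is", token_count=50, args=args))
```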
Seems amazingly good. I might get real use out of a Raspberry Pi after all.
Well, it seems a lot better at Slovenian than Llama or Mistral, especially for a 3B model, although it mostly just rambles about things vaguely related to the prompt and makes lots of grammatical mistakes. The 7B one ought to be interesting once it's done.
It's trained on 100+ languages; the focus is multilingual.
Will that make it a good translator? I remember seeing a 400+ language translation model somewhere, though not an LLM. I wonder what the best fast, high-quality, open-source many-language translation solution might look like.
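The 400+ language one might have been MADLAD-400. For a feel of what a dedicated translator (rather than an LLM) looks like to use, here's a sketch with NLLB-200 through Hugging Face transformers; the checkpoint name and FLORES-200 language codes come from its model card, with Slovenian picked to match the post above.

```python
# pip install transformers sentencepiece torch
from transformers import pipeline

# NLLB-200 is a dedicated ~200-language translation model; checkpoint name
# and FLORES-200 language codes are from its Hugging Face model card.
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",  # English
    tgt_lang="slv_Latn",  # Slovenian
)

result = translator("The quick brown fox jumps over the lazy dog.")
print(result[0]["translation_text"])
```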
Would the amount of RAM used at the end of a 16k or 32k context be less than with Mistral?
And is the t/s at the end the same speed as at the beginning?
Looks like something to test in kobold.cpp later if nobody has done those tests yet.
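If nobody's measured it yet, here's a rough way to check the t/s question against a running koboldcpp instance (assuming the default Kobold-compatible API on localhost:5001): time a short-context generation against a long-context one. The long prompt is a crude stand-in, and the timing includes prompt processing, so treat it as a ballpark comparison only.

```python
import time
import requests

API = "http://localhost:5001/api/v1/generate"  # koboldcpp's default Kobold API endpoint

def tokens_per_second(prompt, gen_len=100):
    t0 = time.time()
    r = requests.post(API, json={"prompt": prompt, "max_length": gen_len})
    r.raise_for_status()
    # Rough: assumes gen_len tokens were generated, and includes prompt-processing time.
    return gen_len / (time.time() - t0)

short_ctx = "Once upon a time"
long_ctx = "word " * 8000  # crude stand-in for a long (~8k token) context

print("short context t/s:", tokens_per_second(short_ctx))
print("long  context t/s:", tokens_per_second(long_ctx))
```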
RWKV-4 7B does not increase RAM usage at all with --nommap at 13k context in koboldcpp. Is that normal? Is there no KV cache and no extra RAM usage for context?
That's the point of RWKV: you could have a 10M-token context length and it would use the same memory as a 100-token context.
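To make that concrete, here's a toy sketch (not the actual WKV math) of the difference: a transformer keeps keys/values for every past token, so its cache grows with context, while an RWKV-style RNN folds the whole history into one fixed-size state.

```python
import numpy as np

d = 8  # toy hidden size

# Transformer-style: the KV cache keeps an entry for every past token.
kv_cache = []
def transformer_step(x):
    kv_cache.append((x, x))  # stand-in for stored keys/values
    return sum(v for _, v in kv_cache) / len(kv_cache)  # stand-in for attention

# RWKV-style: one fixed-size recurrent state, no matter the context length.
state = np.zeros(d)
def rwkv_step(x, decay=0.9):
    global state
    state = decay * state + x  # stand-in for the real WKV update
    return state

for t in range(1000):
    x = np.random.randn(d)
    transformer_step(x)  # memory grows: len(kv_cache) == t + 1
    rwkv_step(x)         # memory constant: state is always shape (d,)
```

That fixed-size state is also why generation speed stays flat instead of slowing down as the context fills up.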
Fully open source?