So RWKV v5 7B is 60% trained now. I saw that the multilingual capabilities are better than Mistral now, and the English capabilities are close to Mistral, except for HellaSwag and ARC, where it's a little behind. All the benchmarks are on the RWKV Discord, and you can google the pros/cons of RWKV, though most of what's out there is about v4.

Thoughts?
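
If anyone wants to sanity-check the English numbers themselves instead of digging through the Discord, something like the sketch below with EleutherAI's lm-evaluation-harness is the usual way to run HellaSwag/ARC. The checkpoint name is just a placeholder and the exact Python API differs a bit between harness versions, so treat this as a starting point rather than the exact setup behind the posted numbers:

    # Rough sketch using EleutherAI's lm-evaluation-harness (v0.4-style Python API).
    # The checkpoint is a placeholder; point it at whichever RWKV or Mistral
    # checkpoint you actually want to compare.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=RWKV/rwkv-4-169m-pile",
        tasks=["hellaswag", "arc_easy", "arc_challenge"],
        batch_size=8,
    )

    # Per-task accuracies end up under results["results"]
    for task, metrics in results["results"].items():
        print(task, metrics)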

    • ambient_temp_xenoB · 1 year ago

      Seems amazingly good. I might get a real use out of a Raspberry Pi after all.

    • MoffKalastB · 1 year ago

      Well, it seems a lot better at Slovenian than the LLaMAs or Mistral, especially for a 3B model, although it mostly just rambles about stuff that’s vaguely related to the prompt and makes lots of grammatical mistakes. The 7B one ought to be interesting once it’s done.

      • vatsadevOPB · 1 year ago

        It's trained on 100+ languages; the focus is multilingual.

        • alchemist1e9B · 1 year ago

          Will that make it a good translator? I remember seeing a 400+ language translation model somewhere, but it wasn't an LLM. I wonder what the best open-source, fast, high-quality many-language translation solutions might look like.

  • AaaaaaaaaeeeeeB · 1 year ago

    Would the amount of RAM used at the end of a 16k or 32k context be less than Mistral's?

    Is the t/s at the end the same as at the beginning?

    Looks like something to test in kobold.cpp later if nobody has done those tests yet.
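
    In case nobody has run them yet, here's a rough sketch of the kind of measurement loop that would do it, with Hugging Face transformers standing in for kobold.cpp (the checkpoint and context sizes are placeholders, so swap in whatever RWKV or Mistral build is actually being tested):

        import time
        import psutil
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        # Placeholder checkpoint; substitute the RWKV (or Mistral) model being tested.
        MODEL_ID = "RWKV/rwkv-4-169m-pile"

        tok = AutoTokenizer.from_pretrained(MODEL_ID)
        model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
        model.eval()

        def measure(prompt_len, gen_tokens=64):
            """Generate after a random prompt of `prompt_len` tokens; report t/s and process RSS."""
            ids = torch.randint(0, tok.vocab_size, (1, prompt_len))
            start = time.time()
            with torch.no_grad():
                model.generate(ids, max_new_tokens=gen_tokens, do_sample=False)
            elapsed = time.time() - start
            rss_gb = psutil.Process().memory_info().rss / 1e9
            return gen_tokens / elapsed, rss_gb

        for ctx in (512, 4096, 16384):  # arbitrary prompt lengths to compare
            tps, rss = measure(ctx)
            print(f"prompt={ctx:6d}  {tps:6.1f} tok/s  RSS={rss:.2f} GB")

    RSS includes the model weights, so the interesting part is whether the number grows with the prompt length, not its absolute value.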

    • AaaaaaaaaeeeeeB · 1 year ago

      RWKV-4 7B doesn't increase RAM usage at all with --nommap at 13k context in koboldcpp. Is that normal? Is there no KV cache and no extra RAM usage for context?

    • vatsadevOPB · 1 year ago

      That's the point of RWKV: you could have a 10 million token context length and it would use the same amount of memory as a 100-token context.
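
      The intuition, as a toy sketch (not the actual RWKV kernels, and the shapes are made up): a transformer's KV cache grows with every token it has seen, while an RNN-style model like RWKV folds all history into one fixed-size state, so memory stays flat no matter how long the context gets.

          import numpy as np

          D = 64  # toy hidden size, for illustration only

          # Transformer-style decoding: the KV cache stores every past token,
          # so memory after t tokens is O(t * D).
          kv_cache = []
          def transformer_step(x):
              kv_cache.append((x, x))  # keep this token's key and value around forever
              return sum(v for _, v in kv_cache) / len(kv_cache)  # stand-in for attention

          # RNN-style decoding (the RWKV idea): all history is folded into a
          # fixed-size state, so memory is O(D) at 100 tokens or 10 million.
          state = np.zeros(D)
          def rnn_step(x, decay=0.9):
              global state
              state = decay * state + (1 - decay) * x  # stand-in for the WKV recurrence
              return state

          for _ in range(1000):
              tok = np.random.randn(D)
              transformer_step(tok)
              rnn_step(tok)

          print("KV cache entries after 1000 tokens:", len(kv_cache))     # 1000, keeps growing
          print("Recurrent state shape after 1000 tokens:", state.shape)  # (64,), constant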