• PookaMacPhellimen (OP) · 10 months ago

    https://github.com/QwenLM/Qwen

    Also released was a 1.8B model.

    From Binyuan Hui’s Twitter announcement:

    “We are proud to present our sincere open-source works: Qwen-72B and Qwen-1.8B! Including Base, Chat and Quantized versions!

    🌟 Qwen-72B has been trained on high-quality data consisting of 3T tokens, boasting a larger parameter scale and more training data to achieve a comprehensive performance upgrade. Additionally, we have expanded the context window length to 32K and enhanced the system prompt capability, allowing users to customize their own AI assistant with just a single prompt.

    🎁 Qwen-1.8B is our additional gift to the research community, striking a balance between maintaining essential functionalities and maximizing efficiency, generating 2K-length text content with just 3GB of GPU memory.

    We are committed to continuing our dedication to the open-source community and thank you all for your enjoyment and support! 🚀 Finally, Happy 1st birthday ChatGPT. 🎂 “
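
    For anyone wanting to try the system-prompt customization mentioned in the announcement, here is a minimal sketch using Hugging Face transformers. It assumes the model ID Qwen/Qwen-72B-Chat and that the chat() helper the repo ships via trust_remote_code accepts a system argument; check the linked GitHub README for the exact API.

    ```python
    # Minimal sketch: customizing the assistant via a single system prompt.
    # Assumes the Hugging Face model ID "Qwen/Qwen-72B-Chat" and the chat()
    # helper Qwen exposes through trust_remote_code (see the repo README).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-72B-Chat",
        device_map="auto",        # spread the 72B weights across available GPUs
        trust_remote_code=True,
    ).eval()

    # One system prompt is enough to define the assistant's behaviour.
    response, history = model.chat(
        tokenizer,
        "Summarize the Qwen-72B release in two sentences.",
        history=None,
        system="You are a terse release-notes assistant.",
    )
    print(response)
    ```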

    • candre23 · 10 months ago

      “we have expanded the context window length to 32K”

      Kinda buried the lede here. This is far and away the biggest feature of this model. Here’s hoping it’s actually decent as well!
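
      A quick way to sanity-check whether a long prompt actually fits in that 32K window is to count tokens with the model’s own tokenizer. A sketch, assuming the tokenizer ID Qwen/Qwen-72B and taking 32768 as the advertised limit:

      ```python
      # Sketch: check a long prompt against the advertised 32K context window.
      # Assumes the tokenizer ID "Qwen/Qwen-72B"; 32768 is the nominal limit.
      from transformers import AutoTokenizer

      CONTEXT_WINDOW = 32 * 1024  # 32K tokens

      tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-72B", trust_remote_code=True)

      def fits_in_context(prompt: str, reserve_for_output: int = 1024) -> bool:
          """Return True if the prompt leaves room for the reply within 32K."""
          n_tokens = len(tokenizer.encode(prompt))
          return n_tokens + reserve_for_output <= CONTEXT_WINDOW

      print(fits_in_context(open("long_document.txt").read()))
      ```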

    • a_slay_nub · 10 months ago

      Bit disappointed by the coding performance, but it is a general-purpose model. It’s insane how good GPT-3.5 is for how fast it is.

    • Secret_Joke_2262 · 10 months ago

      What do these benchmark numbers actually mean for an LLM? There are a lot of values, and I see that in most cases Qwen is better than GPT-4, while in others it is worse or much worse.

  • Secret_Joke_2262 · 10 months ago

    What everyone is most interested in now is how much better it is than Llama 70B.

  • a_beautiful_rhind · 10 months ago

    Heh, 72B with 32K and GQA seems reasonable. Will make for interesting tunes if it’s not super restricted.
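
    For anyone wondering why grouped-query attention matters at 32K, here is a back-of-the-envelope KV-cache calculation. The head counts, layer count, and head dimension below are illustrative assumptions, not the published Qwen-72B config:

    ```python
    # Sketch: KV-cache size at 32K context, with and without grouped-query
    # attention (GQA). All shape numbers are assumed for illustration only.
    def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
        # 2x for keys and values; fp16 = 2 bytes per element
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1024**3

    seq_len = 32 * 1024
    n_layers, head_dim = 80, 128

    full_mha = kv_cache_gib(n_layers, n_kv_heads=64, head_dim=head_dim, seq_len=seq_len)
    gqa_8kv  = kv_cache_gib(n_layers, n_kv_heads=8,  head_dim=head_dim, seq_len=seq_len)

    print(f"MHA (64 KV heads): {full_mha:.1f} GiB per 32K sequence")  # ~80 GiB
    print(f"GQA (8 KV heads):  {gqa_8kv:.1f} GiB per 32K sequence")   # ~10 GiB
    ```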

  • QuieselWusul · 10 months ago

    Why did so many new Chinese 70B-class foundation models release on the same day (this one, DeepSeek, XVERSE)? Is there any reason they all came out in such a short window?

  • ambient_temp_xeno · 10 months ago

    The first thing I looked for was the number of training tokens. I think Yi-34B benefited a lot from its 3 trillion tokens, so this model also being trained on 3 trillion bodes well.
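
    As a rough sanity check on that intuition, the tokens-per-parameter ratio is easy to compute. A sketch, using nominal parameter counts and the training-token figures reported by each release (Llama 2: 2T, Yi: ~3T), against the oft-cited ~20:1 Chinchilla-optimal rule of thumb:

    ```python
    # Sketch: tokens per parameter for the models discussed in this thread.
    # Parameter counts are nominal sizes; token counts are as publicly reported.
    models = {
        "Qwen-72B":    (72e9, 3.0e12),
        "Yi-34B":      (34e9, 3.0e12),
        "Llama-2-70B": (70e9, 2.0e12),
    }

    for name, (params, tokens) in models.items():
        ratio = tokens / params  # Chinchilla-optimal is roughly 20 tokens/param
        print(f"{name}: {ratio:.0f} tokens per parameter")
    ```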

  • Wonderful_Ad_5134 · 10 months ago

    If the US keeps going full woke and is too afraid to work as hard as possible on the LLM ecosystem, China won’t wait twice before winning this battle (which is basically the 21st-century battle in terms of technology).

    Feels sad to see the US decline like that…