Illustrious_Sand6784B to

LocalLLaMA@poweruser.forumEnglish · 2 years ago

Two sets of base models from China (Yuan 2.0-2B, 51B, 102B and XVERSE-7B, 13B, 65B)

Didn’t see any posts about these models so I made one myself.

The first set, Yuan 2.0, was trained on 288B high-quality tokens. It will be interesting to see whether the 51B and 102B models hold up. Commercial use is allowed without requesting authorization.

https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md

(Chinese) https://github.com/IEIT-Yuan/Yuan-2.0

Paper: https://arxiv.org/abs/2311.15786

Hugging Face download links (unofficial uploads):

https://huggingface.co/pandada8/Unofficial-Yuan-2.0-2B

https://huggingface.co/pandada8/Unofficial-Yuan-2.0-51B

https://huggingface.co/pandada8/Unofficial-Yuan-2.0-102B

Here’s the second set of models I found, XVERSE. The 7B and 65B were trained on 2.6T tokens, and the 13B on 3.2T. The 65B model supports up to 16K context, while the two smaller ones support up to 8K.

https://huggingface.co/xverse/XVERSE-65B

https://huggingface.co/xverse/XVERSE-13B

https://huggingface.co/xverse/XVERSE-7B

These models know over 40 human languages plus several programming languages. Commercial use is allowed, but you have to submit an application form.

Chat

Dead_Internet_TheoryB
2 years ago
>XVERSE 7B, 13B and 65B
Either they are ripping off Meta and not telling us about it, or there’s some reason why the ~30B parameter models are being ignored. It’s the perfect size for a 24GB card! Bummer.
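
For anyone wondering why ~30B is called the sweet spot for a 24GB card, here is a rough back-of-the-envelope sketch. It only counts the weights and assumes a given number of bits per weight (4-bit quantization is an assumption here, not something from the model cards). It ignores the KV cache, activations, and quantization overhead, so real usage will be higher. The function name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    Ignores KV cache, activations, and per-block quantization metadata,
    so treat the result as a lower bound.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Sizes mentioned in the thread, plus the "missing" ~30B class.
    for size in (7, 13, 30, 65):
        for bits in (16, 8, 4):
            gib = weight_memory_gib(size, bits)
            print(f"{size:>3}B @ {bits:>2}-bit: ~{gib:.1f} GiB of weights")
```

Under these assumptions a 30B model at 4 bits needs about 14 GiB for weights, which leaves room for context on a 24GB card, while a 65B model at 4 bits needs about 30 GiB and does not fit on a single one.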