There has been a lot of movement around and below the 13b parameter bracket in the last few months, but it’s wild to think the best 70b models are still Llama 2 based. Why is that?

We have 13b models like bartowski/Orca-2-13b-exl2 at 8-bit approaching or even surpassing the best 70b models now.

  • candre23B · 10 months ago

    It’s adorable that you think any 13b model is anywhere close to a 70b llama2 model.

  • obvithrowaway34434B · 10 months ago

    Mistral has already shown that it’s mostly about the data rather than the model. So why waste loads of money and time on training something that no average consumer can run locally?

  • a_beautiful_rhindB · 10 months ago

    What do you mean? Someone just posted 100b, 200b, and 600b models, and several 120b models have been released in the past couple of weeks.

    • SlimxshadyxB · 10 months ago

      Those models can’t be accessed; they say it’s “too dangerous to be released.”

  • WaterPeckerB · 10 months ago

    Who pays for all this training on all these models we see knocking about? I don’t mean the ones released by the big companies. Who has the resources to train a 70b model? Like one of the guys below said, 1.7 million GPU hours, for example, is pretty friggin expensive, no?
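
    A quick back-of-envelope sketch of that figure (the $2/GPU-hour rate is an assumption for illustration; real cloud prices vary a lot by provider and hardware):

    ```python
    # Rough training-cost estimate from the ~1.7M GPU-hour figure
    # mentioned above. The hourly rate is an assumed placeholder.
    gpu_hours = 1_700_000
    usd_per_gpu_hour = 2.0  # hypothetical cloud rate, not a quote

    cost = gpu_hours * usd_per_gpu_hour
    print(f"~${cost:,.0f}")  # ~$3,400,000 at these assumptions
    ```

    So yes, on the order of millions of dollars at typical cloud rates, which is why 70b-scale pretraining stays in the hands of well-funded labs.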