Finally, a diffusion-based LLM!

BalorNG · 1 year ago

EXTERMINATE!

BalorNG · 1 year ago

I say:

It has a performance hit, but it remains to be seen if going with a much larger model can compensate for that.
The model needs to be trained from scratch; apparently you cannot finetune an existing model for this…

BalorNG · 1 year ago

I mean, you can jailbreak/browbeat ChatGPT/Claude into going against guardrails relatively easily; I smash “X” for doubt that Grok is going to be any different. If it is, now THAT is going to be huge, if not in a way we’d like, I guess…

BalorNG · 1 year ago

That explains why Goliath worked and yours - not so much, I guess…

BalorNG · 1 year ago

“Prompt Template: Alpeca” Wut?

Looks like a scam to be fair. I bet if you apply, you’ll get “Just send us 100$ for access!”

BalorNG · 1 year ago

Did you do post-merge retraining? Without at least some, results are going to be poor…

BalorNG · 1 year ago

Did you do post-merge training and how much?

BalorNG · 1 year ago

10s/tok and a couple of kilowatts of power… ok, if it was as smart as Einstein and as unerring as an Oracle it might make sense, but you can use it for free at Petals at 3 tok/sec and it is most certainly not…

BalorNG · 1 year ago

Technically, you can somewhat automate the testing process by creating a script that makes the model answer a series of questions that are relevant to YOU and are unique (so they cannot be gamed by training for benchmarks), and then evaluate the answers yourself.

Make sure you experiment using different sampling methods and run several tests due to inherent randomness of output.
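Something like this rough Python sketch is what I have in mind. Everything in it is made up for illustration: `run_personal_eval`, `dummy_generate` and the parameter names are hypothetical, and `dummy_generate` is only a stand-in for whatever actually calls your model (local backend, API, Petals, etc.). It just loops over your own questions, a few sampling configs and several runs each, and collects the outputs so you can read and grade them by hand.

```python
import random
from typing import Callable, Dict, List


def run_personal_eval(
    generate: Callable[..., str],
    questions: List[str],
    sampling_configs: List[Dict[str, float]],
    runs_per_config: int = 3,
) -> List[dict]:
    """Ask every question under every sampling config, several times each.

    `generate` is any callable taking a prompt plus sampling keyword
    arguments and returning text (hypothetical interface, adapt to your setup).
    """
    records = []
    for question in questions:
        for config in sampling_configs:
            for run in range(runs_per_config):
                answer = generate(question, **config)
                records.append(
                    {
                        "question": question,
                        "config": dict(config),
                        "run": run,
                        "answer": answer,
                    }
                )
    return records


def dummy_generate(prompt: str, temperature: float = 1.0, top_p: float = 1.0) -> str:
    """Placeholder for a real model call; returns a randomized fake answer."""
    noise = random.randint(0, 999)
    return f"[T={temperature}, top_p={top_p}] answer #{noise} to: {prompt}"


if __name__ == "__main__":
    my_questions = [
        "Question only I care about #1",
        "Question only I care about #2",
    ]
    my_configs = [
        {"temperature": 0.7},
        {"temperature": 1.0, "top_p": 0.9},
    ]
    for rec in run_personal_eval(dummy_generate, my_questions, my_configs):
        # Dump everything; the grading itself is still done by a human.
        print(rec["config"], "run", rec["run"], "->", rec["answer"])
```

The grading part stays manual on purpose: the whole point is that the questions and the judgement are yours, the script only saves you the clicking.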

BalorNG · 1 year ago

Please, dear Tzeentch, have someone leak gpt4 in the general confusion, I MUST know if this is really 10 7b models in a trench coat :)

BalorNG · 1 year ago

My name is Mensch. Uber Mensch.

BalorNG · 1 year ago

A quick question: does it imply that it has 160 layers as a result? Afaik, Falcon has 80 layers (like Llama), and original GPT3 had 96. “Stack more layers” ©

Sooo… is “stacking 1000 phi 1.3b together” a recipe for AGI? :)

BalorNG · 1 year ago

He MUST become a CEO of Uber, too! :))))

BalorNG · 1 year ago

Yea, I’ve had my “honeymoon effect” with some new/large models like, say, Falcon and even Claude: they are inherently random and that affects quality, too. I’ve had great outputs from Falcon, for instance (on Petals), but also long stretches of mediocre and some outright bad… and also sometimes really great and creative output from 7b Mistral, especially with enough prompt tinkering and setting sampling “just right”. Objective evaluation of LLMs is extremely hard and time-consuming!

BalorNG · 1 year ago

Can we have some non-cherry-picked examples of writing?

Does not have to be highly nsfw/whatever, but a comparison of goliath's writing with output from its constituent models at the same settings and with the same (well-crafted) prompts would be very interesting to see, preferably with at least 3 examples per model due to inherent randomness of model output…

If you say this is a “night and day” difference, it should be apparent… I’m not sceptical per se, but “writing quality” is highly subjective and the model's style may simply mesh better with your personal preferences?

BalorNG · 1 year ago

There is no way it has “undiluted” 100k context. https://news.ycombinator.com/item?id=36374936

But yea, it IS impressive.

BalorNG · 1 year ago

Given how good 7b Mistral is in my personal experience, the idea that a model 3x its size can BE GPT3.5 Turbo no longer seems implausible.

BalorNG · 1 year ago

Finally, a diffusion-based LLM!