Starling-RM-7B-alpha: New RLAIF Finetuned 7b Model beats Openchat 3.5 and comes close to GPT-4

Legcor · 2 years ago

LocoMod · 2 years ago

Quantz are up:

https://huggingface.co/TheBloke/Starling-LM-7B-alpha-GGUF/tree/main

metalman123 · 2 years ago

Was wondering how long this would take to show up.

bot-333 · 2 years ago

“New RLAIF Finetuned 7b Model” Interesting. “beats Openchat 3.5” Nice! “and comes close to GPT-4” Bruh.

Evening_Ad6637 · 2 years ago

Heheh, I can’t read that any more… I really have become very prejudiced when it comes to that… to be honest, when it comes to any comparison with GPT-4.

People really have to understand that even GPT-4 has been aligned, lobotomized, and massively downgraded in terms of its performance for security reasons (which is understandable to me), but this thing is still an absolute beast. If we consider all the restrictions GPT-4 has to undergo, all the smartness at OpenAI, all the resources at Microsoft and so on, we have to realize that currently nothing is really comparable to GPT-4. Especially not 7B models.

noeda · 2 years ago

I’ve seen the “… beats GPT-4” enough times that now whenever I see a title that suggests a tiny model can compete with GPT-4 I see it as a negative signal; that the authors are bullshitting through some benchmarks or some other shenanigans.

It’s annoying because the models might be legitimately good models for being open and within their weight class but now you’ve put my brain in BS detecting mode and I can’t trust you’ve done good faith measurement anymore.

bot-333 · 2 years ago

There are SO many models “bullshitting through some benchmarks or some other shenanigans” that I’m cooking my own benchmark system LOL.

Evening_Ad6637 · 2 years ago

Yeah, I don’t think authors are intentionally bullshitting or intentionally doing “benchmark cosmetics”. Maybe it’s more a lack of knowledge about what’s going on with (most) benchmarks and how badly their image has been ruined in the meantime.

noeda · 2 years ago

In the first image posted, it looks like it’s not even close to GPT-4?

georgejrjrjr · 2 years ago

If there is something somehow inherently superior about having a separate reward model, that should be teased out.

It would be nice to see stronger baselines / ablations for this reason. I realize it’s nigh impossible to keep up with the unrelenting pace of advances, so I don’t fault the authors here. That said, if there isn’t a compelling reason to keep the separate preference model, community people-hours will probably be best spent sticking with DPO/IPO to avoid the hyper-parameter tuning rabbit hole.

My guess: the way things are going, we’ll soon see a rough consensus emerge around a sane default DPO or Identity-PO recipe for fine-tunes (the same way we’ve seen gradual convergence around decoder-only transformer + rotary positional embeddings + grouped-query attention + FlashAttention 2) to be applied absent a compelling reason to use a different reward signal.

No matter what, preference datasets like this are helpful. Pity about the license being claimed here, it’s hard to imagine it would hold up, but the specter is a bit of a hindrance.
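For readers unfamiliar with the DPO/IPO point above, here is a minimal sketch of the two per-pair losses in plain Python. It only illustrates why neither needs a separately trained reward model: both work directly on log-probabilities from the policy and a frozen reference model. The function names, the example log-probabilities, and the default `beta`/`tau` values are assumptions made for illustration, not anything taken from the Starling release.

```python
import math


def implicit_margin(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected):
    """Difference of policy-vs-reference log-ratios for the chosen and rejected answers."""
    return (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)


def dpo_loss(margin, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    Written as a numerically stable softplus of -beta * margin.
    """
    x = beta * margin
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def ipo_loss(margin, tau=0.1):
    """IPO loss for one preference pair: squared distance of the margin from 1 / (2 * tau)."""
    return (margin - 1.0 / (2.0 * tau)) ** 2


# Hypothetical summed token log-probabilities for one (chosen, rejected) pair.
margin = implicit_margin(
    logp_chosen=-12.0,
    logp_rejected=-15.0,
    ref_logp_chosen=-13.0,
    ref_logp_rejected=-14.0,
)
print("margin:", margin)
print("DPO loss:", dpo_loss(margin))
print("IPO loss:", ipo_loss(margin))
```

In both cases the only extra knob is a single scalar (`beta` or `tau`), which is the contrast with the reward-model-plus-RL route that the comment calls a hyper-parameter tuning rabbit hole.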

Thistleknot · 2 years ago

RM is the reward model… not the same as the LM. I tried the LM and wasn’t impressed. GPT-3.5 did better at summarizing quotes. It was good, but I honestly think open hermes and/or synthia 1.3b do better.

OC2608 · 2 years ago

How to earn VC money 101: “Beats GPT-4!”

And voilà! You’re rich now.

sahil1572 · 2 years ago

Every other model nowadays claims to be GPT-4, and they turn out to be < GPT-3. I don’t know what kind of test they use to score.

sahil1572 · 2 years ago

LOL GPT4

https://preview.redd.it/fy2rvgg8v13c1.png?width=1754&format=png&auto=webp&s=8df41b305a0d01be335f406a204b1061ca24b658

Wonderful_Ad_5134 · 2 years ago

“Close to GPT4” is as true as “Me, close to Usain Bolt in the 100m dash” lol