What is Q* and how do we use it?

georgejrjrjr · 1 year ago

What is Q* and how do we use it?

georgejrjrjr · 1 year ago

Edits aren’t working for me for some reason, so here’s my update:

First, as I mentioned on Twitter but failed to address here: the Q* story is, at the very least, excellent PR. That may be all it is, basically a more sophisticated “AGI achieved internally” troll. I would suggest taking Q* discourse with all due salt.

From the context and the description, it looks like OpenAI has already published on the technique in question here: https://openai.com/research/improving-mathematical-reasoning-with-process-supervision

The result is pretty unsurprising: given process supervision (i.e., feedback on each intermediate reasoning step from a suitably accurate model of the process, rather than feedback on the final answer alone), models perform better.
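To make the distinction concrete: outcome supervision scores a solution only by its final answer, while process supervision scores every intermediate step. Below is a toy sketch, not OpenAI’s implementation. The hard-coded `STEP_JUDGMENTS` table and both scorer functions are hypothetical stand-ins for trained supervisors, and multiplying step scores assumes they behave like independent probabilities of correctness.

```python
from typing import Callable, Dict, Sequence

# A "solution" is a list of reasoning steps; the last step holds the final answer.
Solution = Sequence[str]

def outcome_score(solution: Solution, outcome_model: Callable[[str], float]) -> float:
    """Outcome supervision: one signal, computed from the final answer only."""
    return outcome_model(solution[-1])

def process_score(solution: Solution, step_model: Callable[[str], float]) -> float:
    """Process supervision: score each step, then multiply.

    Treating step scores as independent probabilities of correctness, the
    product estimates P(every step is correct), so one bad step sinks it.
    """
    score = 1.0
    for step in solution:
        score *= step_model(step)
    return score

# Hypothetical, hard-coded judgments standing in for a trained supervisor.
STEP_JUDGMENTS: Dict[str, float] = {
    "3 * 4 = 12": 0.99,
    "2 + 12 = 14": 0.99,
    "3 * 4 = 13": 0.05,   # wrong
    "2 + 13 = 14": 0.05,  # also wrong, but lands on the right answer
}

def toy_step_model(step: str) -> float:
    return STEP_JUDGMENTS.get(step, 0.5)  # unknown steps get a coin flip

def toy_outcome_model(final_step: str) -> float:
    return 1.0 if final_step.endswith("= 14") else 0.0

sound = ["3 * 4 = 12", "2 + 12 = 14"]
lucky = ["3 * 4 = 13", "2 + 13 = 14"]  # two errors that cancel out in the answer

for name, sol in [("sound", sound), ("lucky", lucky)]:
    print(name, outcome_score(sol, toy_outcome_model), round(process_score(sol, toy_step_model), 4))
```

Both solutions end in the correct answer, so the outcome scorer gives each 1.0; only the per-step scores separate the sound derivation from the lucky one.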

Well…yeah. It’s probably an impactful direction for AI as people find ways to build good process models, but it isn’t an especially novel finding, nor is it a reason to blow up a company. This updates me further in the direction of, “Q* discourse was a brilliant PR move to capitalize on the controversy and direct attention away from the board power struggle.”

That doesn’t mean it can’t also be a good intuition pump for the open-source world. Every big lab seems to be thinking about model-based supervision; it would be a little silly if we weren’t. So, coming back to the original question:

How might we use this?

I think the question reduces to, “What means of supervision are available?”

Once you have a supervisor to play “warmer / colder” with the model, the rest is trivial.
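Concretely, once a supervisor can score partial solutions, generation becomes a search problem: propose several next steps, keep the warmest, repeat. Here is a minimal step-level beam search as one illustration, not a claim about what any lab actually runs. `propose`, `score`, and `is_done` are hypothetical interfaces (in practice, an LLM sampling candidate steps and a learned or rule-based supervisor returning a “warmth” in (0, 1]), and the toy task of reaching 7 with moves of +1, +2, or +5 is invented purely to exercise the code.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces: `propose` would sample candidate next steps from an
# LLM, and `score` would be the supervisor's "warmer / colder" signal.
Propose = Callable[[Sequence[str]], List[str]]
Score = Callable[[Sequence[str], str], float]  # warmth of a new step, in (0, 1]
IsDone = Callable[[Sequence[str]], bool]

def supervised_beam_search(propose: Propose, score: Score, is_done: IsDone,
                           beam_width: int = 2, max_steps: int = 10) -> List[str]:
    """Keep the `beam_width` partial solutions the supervisor finds warmest."""
    # Each beam is (sum of log-scores, steps); summing logs multiplies the scores.
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for _ in range(max_steps):
        candidates: List[Tuple[float, List[str]]] = []
        for total, steps in beams:
            if is_done(steps):
                candidates.append((total, steps))  # finished beams carry over unchanged
                continue
            for step in propose(steps):
                warmth = max(score(steps, step), 1e-12)  # guard against log(0)
                candidates.append((total + math.log(warmth), steps + [step]))
        if not candidates:
            break
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(is_done(steps) for _, steps in beams):
            break
    finished = [b for b in beams if is_done(b[1])]
    return max(finished or beams, key=lambda b: b[0])[1]

# Toy "warmer / colder" task (invented for illustration): reach TARGET by adding moves.
TARGET = 7
MOVES = ["+1", "+2", "+5"]

def value(steps: Sequence[str]) -> int:
    return sum(int(s) for s in steps)

def toy_propose(steps: Sequence[str]) -> List[str]:
    return list(MOVES)

def toy_score(steps: Sequence[str], step: str) -> float:
    distance = abs(TARGET - (value(steps) + int(step)))
    return 1.0 / (1 + distance)  # 1.0 when the step hits the target, smaller when colder

def toy_is_done(steps: Sequence[str]) -> bool:
    return value(steps) == TARGET

print(supervised_beam_search(toy_propose, toy_score, toy_is_done))
```

On the toy task the search returns `['+5', '+2']`. Finished solutions are carried forward unchanged, and the final pick prefers finished solutions over unfinished ones.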

I’m curious what models you all expect to come online to supervise LLMs. Arithmetic supervision has already been reported. Code, too.
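Arithmetic is the easy case because a step can be checked mechanically instead of by a learned model. A minimal sketch, assuming steps are written as `lhs = rhs` in plain Python-style arithmetic; `arithmetic_supervisor` is a hypothetical name, and it walks the `ast` tree rather than calling `eval`, so only numbers and `+ - * /` are ever evaluated.

```python
import ast
import operator
from typing import Optional

# Only plain arithmetic is evaluated; any other syntax is treated as uncheckable.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def _evaluate(node: ast.AST) -> float:
    """Evaluate a restricted arithmetic AST (no names, calls, or attributes)."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")

def arithmetic_supervisor(step: str, tol: float = 1e-9) -> Optional[bool]:
    """Check a reasoning step of the form 'lhs = rhs'.

    Returns True/False when both sides can be evaluated and None otherwise,
    so prose steps are left to some other supervisor instead of being guessed at.
    """
    if step.count("=") != 1:
        return None
    lhs, rhs = step.split("=")
    try:
        left = _evaluate(ast.parse(lhs.strip(), mode="eval").body)
        right = _evaluate(ast.parse(rhs.strip(), mode="eval").body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None
    return abs(left - right) <= tol

print(arithmetic_supervisor("3 * 4 = 12"))
print(arithmetic_supervisor("2 + 13 = 14"))
print(arithmetic_supervisor("so the answer is 14"))
```

Returning `None` for uncheckable steps leaves room to fall back on another supervisor. A code supervisor could follow the same shape, for instance reporting whether a candidate passes a unit-test suite run in a sandbox, though that harness is not sketched here.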

What else?