Reuters is reporting that OpenAI achieved an advance with a technique called Q* (pronounced Q-Star).
So what is Q*?
I asked around the AI researcher campfire and…
It’s probably Q-learning plus MCTS, i.e., a Monte Carlo tree search reinforcement learning algorithm.
Which is right in line with the strategy DeepMind (vaguely) said they’re taking with Gemini.
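For the Q-learning half of that guess, here’s a minimal tabular sketch on a toy problem. To be clear, this is purely illustrative — a 5-state corridor MDP I made up, not anything known about OpenAI’s system:

```python
import random

# Minimal tabular Q-learning on a toy 5-state corridor (illustrative
# only, not OpenAI code). The agent starts at state 0 and gets
# reward 1 for reaching state 4.
def q_learning(episodes=300, alpha=0.5, gamma=0.9, eps=0.5, seed=0):
    rng = random.Random(seed)
    n_states, actions = 5, (-1, 1)  # step left or right
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != 4:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next = min(max(s + a, 0), 4)  # walls clamp the move
            r = 1.0 if s_next == 4 else 0.0
            # Q-learning update: bootstrap from the best next-state action
            best_next = max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

After training, the greedy policy at every non-terminal state is "move right," i.e., Q(s, +1) > Q(s, −1). The speculation is that something shaped like this update gets paired with MCTS rollouts over model outputs instead of corridor steps.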
Another corroborating data point: an early GPT-4 tester mentioned on a podcast that they are working on ways to trade inference compute for smarter output. MCTS is probably the most promising method in the literature for doing that.
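To make "trade inference compute for smarter output" concrete: the simplest version of the idea is best-of-n sampling against a scorer. The generator and scorer below are stand-ins I invented for illustration, not any real model:

```python
import random

# Toy sketch of "spend more inference compute, get a better answer".
# generate() and score() are stand-ins for an LLM sampler and a
# learned value model; none of this is OpenAI code.
def generate(rng):
    """Sample one candidate 'answer' (here: just a random float)."""
    return rng.random()

def score(candidate):
    """Score a candidate; here higher is better by construction."""
    return candidate

def best_of_n(n, seed=0):
    """Draw n candidates and keep the best-scoring one.
    Larger n = more inference compute = better expected output."""
    rng = random.Random(seed)
    return max((generate(rng) for _ in range(n)), key=score)
```

MCTS makes the same compute-for-quality trade, but uses the scorer to steer a search tree over partial outputs instead of scoring n independent samples.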
So how do we do it? Well, the closest thing I know of that’s presently available is Weave, within a concise, readable, Apache-licensed MCTS RL fine-tuning package called minihf.
https://github.com/JD-P/minihf/blob/main/weave.py
I’ll update the post when I have more info about Q-learning in particular and what the deltas are from Weave.
Edits aren’t working for me somehow, here’s my update:
First, as I mentioned on twitter but failed to address here, this is at least excellent PR. So that may be all it is, basically a more sophisticated “AGI achieved internally” troll. I would suggest taking Q* discourse with all due salt.
From context and the description, it looks like OpenAI published about the technique in question here: https://openai.com/research/improving-mathematical-reasoning-with-process-supervision
The result is pretty unsurprising: given process supervision (i.e., help from a suitably accurate model of a particular process), models perform better.
Well…yeah. It’s probably an impactful direction for AI as people find ways to build good process models, but it isn’t an especially novel finding, nor is it a reason to blow up a company. This updates me further in the direction of, “Q* discourse was a brilliant PR move to capitalize off of the controversy and direct attention away from the board power struggle.”
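As a concrete (and heavily simplified) sketch of what step-level process supervision buys you — this is my toy illustration, not code from the paper, with a hand-written arithmetic checker standing in for a learned process reward model:

```python
# Toy sketch of process supervision: a step-level verifier scores each
# reasoning step, and we keep the candidate solution whose weakest
# step scores highest. The verifier here is a hard-coded stand-in for
# a learned process reward model.
def process_reward(step):
    """Score one reasoning step. Here: 1.0 iff it is a valid
    'a+b=c' arithmetic claim, else 0.0."""
    try:
        lhs, rhs = step.split("=")
        a, b = lhs.split("+")
        return 1.0 if int(a) + int(b) == int(rhs) else 0.0
    except ValueError:
        return 0.0

def best_solution(candidates):
    """Rank multi-step solutions by their weakest step and keep the best."""
    return max(candidates, key=lambda steps: min(map(process_reward, steps)))
```

With outcome-only supervision, a chain like `["1+1=2", "2+2=5"]` can still get credit if the final answer happens to be right; scoring every step catches the bad link.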
Which doesn’t mean it can’t also be a good intuition pump for the open source world. Every big lab seems to be thinking about model-based supervision; it would be a little bit silly if we weren’t. So coming back to the original question:
How might we use this?
I think the question reduces to, “What means of supervision are available?”
Once you have a supervisor to play “warmer / colder” with the model, the rest is trivial.
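Here’s how trivial the "warmer / colder" loop is once a supervisor exists — a toy hill-climb where the generator only ever sees whether a mutation improved. The hidden string-matching scorer is my stand-in for a learned reward/verifier model:

```python
import random

# Toy "warmer / colder" loop (illustrative only): the supervisor
# reveals nothing but whether a candidate improved, and that signal
# alone steers generation toward a target it never sees directly.
TARGET = "process supervision"

def warmth(text):
    """Supervisor: count of positions matching the hidden target."""
    return sum(a == b for a, b in zip(text, TARGET))

def climb(steps=6000, seed=0):
    """Mutate one character at a time, keeping only 'warmer' guesses."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    guess = "".join(rng.choice(alphabet) for _ in TARGET)
    for _ in range(steps):
        i = rng.randrange(len(guess))
        mutated = guess[:i] + rng.choice(alphabet) + guess[i + 1:]
        if warmth(mutated) > warmth(guess):  # "warmer" -> keep it
            guess = mutated
    return guess
```

Swap the random mutator for an LLM proposing reasoning steps and the warmth function for a process reward model, and you have the rough shape people are speculating about.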
I’m curious what models you all expect to come online to supervise LLMs. Arithmetic has already been reported. Code, too.
What else?
What’s so special about Q*?
Ilya from OpenAI published a paper (2020) relevant to Q*: GPT-f, a model with capabilities in understanding and solving math: https://arxiv.org/abs/2009.03393
This video by David Shapiro explains Q* very well: https://www.youtube.com/watch?v=T1RuUw019vA
I have a good idea about RL, but it’s better to have it in video format so that everyone can understand. Q* was completely explained, and OpenAI explained what it was. I was even able to make a YouTube video about it because their explanation was so clear, so I was able to explain it as if you were five years old.
I don’t understand how people believe this is a secretive thing and I don’t understand why people aren’t talking about how simple it is.
Everybody is talking about this like it’s some grand secret, why?
I mean, the algorithm is expensive to run, but it’s not that hard to understand.
Can somebody please explain why everybody’s acting like this is such a big secret thing?
Yeah, I think it’s an MCTS reinforcement learning algorithm. I think DeepMind is the best lab when it comes to developing agents capable of strategy and planning, given how good AlphaZero and AlphaGo are, and if they integrate it with the “Gemini” project, they really might just “eclipse” GPT-4. I don’t know how scalable it would be in terms of inference, given the amount of compute required.
It’s a silicon-based version of QAnon. I will be terminated for telling you, but wait till they launch MAGA (Machine Augmented General AI)!!!
Is there something other than the letter Q making you think it’s Q-learning?