Reuters is reporting that OpenAI achieved an advance with a technique called Q* (pronounced Q-Star).

So what is Q*?

I asked around the AI researcher campfire and…

It’s probably Q-learning combined with MCTS (Monte Carlo tree search), i.e. a reinforcement learning algorithm built around tree search.
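For anyone who hasn't seen the Q-learning half of that guess, here is a minimal sketch of the textbook tabular update. To be clear, this is the generic algorithm, not anything OpenAI has confirmed; the state names, actions, reward, and the `alpha` / `gamma` values are all made up for illustration.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(s, a) toward the target: reward + gamma * max_a' Q(s', a').
    alpha (learning rate) and gamma (discount) are illustrative values.
    """
    # Value of the best action available from the next state (0 if none).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical toy transition: in "s0", taking "right" pays 1.0 and lands in "s1".
q = defaultdict(float)
q_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
```

The table starts at zero, so after this single step the estimate for ("s0", "right") has moved halfway toward the observed reward.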

Which is right in line with the strategy DeepMind (vaguely) said they’re taking with Gemini.

Another corroborating data point: an early GPT-4 tester mentioned on a podcast that they are working on ways to trade inference compute for smarter output. MCTS is probably the most promising method in the literature for doing that.
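To make "trade inference compute for smarter output" concrete, here is a rough sketch of plain UCT-style MCTS on a toy sequence-building problem. Everything in it is an assumption for illustration: the three "tokens", the `reward` scorer (a stand-in for whatever evaluator you'd really use), and the exploration constant. It is not Weave's code and not a claim about what OpenAI does. The only knob that matters for the argument is `n_sims`: more simulations means more compute spent per answer.

```python
import math
import random

ACTIONS = [0, 1, 2]  # hypothetical toy "tokens"
DEPTH = 4            # length of a complete sequence

def reward(seq):
    # Hypothetical scorer standing in for a real evaluator; higher is better.
    return sum(seq) / (2 * DEPTH)

class Node:
    def __init__(self, seq):
        self.seq = seq        # tuple of actions taken so far
        self.children = {}    # action -> Node
        self.visits = 0
        self.total = 0.0      # sum of rewards backed up through this node

def ucb(parent, child, c=1.4):
    # UCB1: average reward plus an exploration bonus for rarely tried children.
    return (child.total / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def search(n_sims, rng):
    root = Node(())
    for _ in range(n_sims):
        node, path = root, [root]
        # Selection: descend by UCB until a node with an untried action.
        while len(node.seq) < DEPTH:
            untried = [a for a in ACTIONS if a not in node.children]
            if untried:
                # Expansion: add one new child and stop descending.
                a = rng.choice(untried)
                node.children[a] = Node(node.seq + (a,))
                node = node.children[a]
                path.append(node)
                break
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(parent, ch))
            path.append(node)
        # Rollout: finish the sequence with random actions and score it.
        seq = list(node.seq)
        while len(seq) < DEPTH:
            seq.append(rng.choice(ACTIONS))
        r = reward(seq)
        # Backpropagation: update statistics along the visited path.
        for n in path:
            n.visits += 1
            n.total += r
    # Read out an answer by following the most-visited children.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.seq

best = search(2000, random.Random(0))
```

With a small `n_sims` the readout is close to a random guess; as you raise it, the visit counts concentrate on higher-scoring branches. That is the compute-for-quality trade in miniature.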

So how do we do it? Well, the closest thing I know of that's presently available is Weave, which lives in a concise, readable, Apache-licensed MCTS RL fine-tuning package called minihf.

https://github.com/JD-P/minihf/blob/main/weave.py

I’ll update the post when I have more info about Q-learning in particular, and about what the deltas are from Weave.

  • honestduaneB · 10 months ago

    Q* was completely explained, and OpenAI explained what it was. I was even able to make a YouTube video about it because their explanation was so clear, so I was able to explain it as if you were five years old.

    I don’t understand how people believe this is a secretive thing and I don’t understand why people aren’t talking about how simple it is.

    Everybody is talking about this like it’s some grand secret, why?

    I mean, the algorithm is expensive to run, but it’s not that hard to understand.

    Can somebody please explain why everybody’s acting like this is such a big secret thing?