The Q* hypothesis: Tree-of-thoughts reasoning, process reward models, and supercharging synthetic data
ThistleknotB to LocalLLaMA@poweruser.forum · English · 2 years ago · 8 comments