According to the scaling laws, the loss/error is approximated as
w0 + w1 * pow(num_params, -w2) + w3 * pow(num_tokens, -w4)
Bill wrote before that he'd been meeting with the OpenAI team since 2016, so he's probably pretty knowledgeable about these things. He might be referring to the fact that, after a while, you see sharply diminishing returns from increasing num_params. In the limit, the corresponding term vanishes, but the others do not.

If it's half of the improvement from 3.5 to 4, that's good enough for me.
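To make the diminishing-returns point concrete, here's a minimal sketch of the scaling-law form quoted above. The coefficient values are made up for illustration (they are not fitted values from any particular paper); only the functional shape matters:

```python
# Sketch of the scaling-law form: w0 + w1*N^-w2 + w3*D^-w4.
# Coefficients below are illustrative placeholders, not real fits.
def approx_loss(num_params, num_tokens,
                w0=1.7, w1=400.0, w2=0.34, w3=410.0, w4=0.28):
    # w0 is the irreducible loss floor; the two power-law terms
    # shrink as parameter count and token count grow.
    return w0 + w1 * num_params ** -w2 + w3 * num_tokens ** -w4

# Holding tokens fixed, each 10x in parameters buys less and less:
tokens = 1e12
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> loss ~ {approx_loss(n, tokens):.3f}")
```

As num_params grows, the w1 term decays toward zero, and the loss bottoms out at w0 plus whatever the data term contributes — which is the "corresponding term vanishes, but the others do not" behavior.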
Just because Bill Gates says something doesn’t mean that it’s true
Not terribly surprised.