Bill wrote before that he’d been meeting with the OpenAI team since 2016, so he’s probably pretty knowledgeable about these things. He might be referring to the fact that, after a while, you will see very diminishing returns while increasing num_params. In the limit, the corresponding term disappears, but the others do not.
According to the scaling laws, the loss/error is approximated as
Bill wrote before that he’d been meeting with the OpenAI team since 2016, so he’s probably pretty knowledgeable about these things. He might be referring to the fact that, after a while, you will see very diminishing returns while increasing
num_params
. In the limit, the corresponding term disappears, but the others do not.