• we_are_mammals (OP) · 1 year ago

    According to the scaling laws, the loss/error is approximated as

    w0 + w1 * pow(num_params, -w2) + w3 * pow(num_tokens, -w4)
    

Bill wrote earlier that he had been meeting with the OpenAI team since 2016, so he is probably fairly knowledgeable about these things. He may be referring to the fact that, after a while, you see sharply diminishing returns from increasing num_params alone. In the limit, the corresponding term vanishes, but the constant term and the num_tokens term do not, so the loss plateaus above an irreducible floor.
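This plateau is easy to see numerically. The sketch below evaluates the scaling-law formula above at a fixed token budget while growing num_params; the coefficient values are my own illustrative assumptions (roughly in the spirit of published parametric fits), not figures from the comment.

```python
# Scaling law from the comment: loss = w0 + w1*N^-w2 + w3*D^-w4
# Coefficients are illustrative assumptions, not authoritative fitted values.
w0, w1, w2, w3, w4 = 1.69, 406.4, 0.34, 410.7, 0.28

def loss(num_params: float, num_tokens: float) -> float:
    return w0 + w1 * num_params**-w2 + w3 * num_tokens**-w4

tokens = 1.4e12  # hypothetical fixed token budget
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"N={n:.0e}: loss={loss(n, tokens):.3f}")

# As num_params -> infinity, the w1*N^-w2 term goes to zero,
# leaving the floor w0 + w3*tokens^-w4 set by the other two terms.
print(f"floor: {w0 + w3 * tokens**-w4:.3f}")
```

Each 10x in parameters buys less and less, and the loss never drops below the floor set by w0 and the token term, which is exactly the diminishing-returns behavior described above.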