• a_beautiful_rhindB
    10 months ago

    70b with 2048 context and 128 reply is about 303 t/s.

    That sounds more reasonable, assuming they aren’t quantized. I think the batch size is just a theoretical batch size.
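
    For a rough sense of scale, here is a quick sketch of what 303 t/s would mean for the 128-token reply in the quoted figure. It assumes the number is sustained generation throughput for that reply, which the quote doesn't confirm; `reply_seconds` is just a hypothetical helper name.

```python
def reply_seconds(reply_tokens: int, tokens_per_second: float) -> float:
    """Seconds to produce a reply, assuming the rate is sustained generation throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return reply_tokens / tokens_per_second


# Figures from the quoted benchmark line: 128-token reply at about 303 t/s.
seconds = reply_seconds(128, 303.0)
print(f"{seconds:.2f} s")  # prints "0.42 s"
```

    If the 303 t/s is actually an aggregate across a batch, each individual stream would be slower by roughly the batch size, so treat this as a lower bound on per-reply time.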