• Longjumping-Bake-557
    10 months ago

    And that’s on a die just slightly bigger than the 4090’s. Unless they increased the size compared to the H100?

  • Aaaaaaaaaeeeee
    10 months ago

    (With a massive batch size*)

    It would be better if they provided single-batch figures for normal inference at fp8.

    People look at this and think it’s astonishing, but they’ll compare it against single-batch performance, since that’s all they’ve seen before.
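    A rough sketch of the gap, with made-up numbers (nothing here is from the benchmark): headline throughput is usually the aggregate across the whole batch, so the speed any single request sees is roughly that figure divided by the batch size.

    ```python
    # Sketch: split an aggregate (batched) throughput figure into a
    # rough per-stream rate. Assumes near-linear scaling across the
    # batch, which is optimistic; real scaling is sublinear.
    def per_stream_tps(aggregate_tps: float, batch_size: int) -> float:
        return aggregate_tps / batch_size

    # Hypothetical numbers: a 3,000 t/s headline at batch 64 works
    # out to under 50 t/s for each individual request.
    print(per_stream_tps(3000.0, 64))  # -> 46.875
    ```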

  • a_beautiful_rhind
    10 months ago

    A 70b with 2048 context and a 128-token reply is about 303 t/s.

    That sounds more reasonable, assuming the weights aren’t quantized. The quoted batch size is just a theoretical maximum, I think.
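    A quick back-of-the-envelope check on that figure (the 303 t/s and 128 tokens are from the comment above; the rest is illustrative):

    ```python
    # At 303 t/s, the quoted 128-token reply finishes in well
    # under half a second of generation time.
    reply_tokens = 128
    tokens_per_sec = 303.0  # figure quoted above
    print(f"{reply_tokens / tokens_per_sec:.2f} s")  # -> 0.42 s
    ```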

  • jun2san
    10 months ago

    How much do you want for your old H100? - me to AI devs