rihard7854B to LocalLLaMA@poweruser.forum · English · 1 year ago
NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM (github.com) · 24 comments
AaaaaaaaaeeeeeBlink · English · 1 year ago
(With a massive batch size*) It would be better if they provided single-batch figures for normal FP8 inference. People look at this and think it's astonishing, but they will compare it with single-batch performance, since that's all they have seen before.
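The distinction the comment draws can be made concrete: a headline tokens/sec number from a batched benchmark is the *aggregate* across all concurrent requests, while a single user's request only sees roughly the aggregate divided by the batch size. A minimal sketch (the batch size of 64 here is an assumed illustrative value, not taken from the benchmark):

```python
# Illustrative only: the batch size below is an assumption, not a figure
# from NVIDIA's benchmark. The point is that aggregate throughput across
# a batch is not what a single request experiences.
def per_request_tps(aggregate_tps: float, batch_size: int) -> float:
    """Approximate per-stream tokens/sec when aggregate throughput
    is shared across a batch of concurrent requests."""
    return aggregate_tps / batch_size

# ~12,000 tok/s aggregate at a hypothetical batch of 64:
print(per_request_tps(12_000, 64))  # -> 187.5 tok/s for one request
# A true single-batch run would need the full 12,000 on one stream:
print(per_request_tps(12_000, 1))   # -> 12000.0
```

This is why comparing a massively batched headline number against the single-batch figures people are used to seeing is misleading.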