rihard7854B to LocalLLaMA@poweruser.forum · English · 1 year ago
NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM (github.com)
AaaaaaaaaeeeeeBlink · 1 year ago
It's useful for people who want to know the inference response time. This wouldn't give us a 4000-ctx reply in 1/3 of a second.
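The point about throughput vs. response time can be made concrete with some arithmetic. A minimal sketch, assuming (hypothetically) that the benchmark's ~12,000 tokens/sec is aggregate throughput across a batch of concurrent requests rather than single-request decode speed; the batch size of 64 is an illustrative guess, not a figure from the benchmark:

```python
# Hypothetical numbers illustrating throughput vs. latency:
# an aggregate figure of ~12,000 tokens/sec is typically achieved
# by batching many requests, so a single 4,000-token reply does
# not arrive in 4000 / 12000 ≈ 0.33 seconds.

aggregate_tokens_per_sec = 12_000   # benchmark throughput (all requests combined)
batch_size = 64                     # assumed concurrent requests (hypothetical)
reply_tokens = 4_000                # tokens in one reply

# Naive (wrong) reading: divide reply length by aggregate throughput.
naive_latency_s = reply_tokens / aggregate_tokens_per_sec

# Per-request reading: each request only sees its share of the throughput.
per_request_tokens_per_sec = aggregate_tokens_per_sec / batch_size
actual_latency_s = reply_tokens / per_request_tokens_per_sec

print(f"naive: {naive_latency_s:.2f}s, per-request: {actual_latency_s:.1f}s")
```

Under these assumed numbers the naive reading gives ~0.33 s, while the per-request view gives ~21 s, which is the gap the comment is pointing at.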