rihard7854B to LocalLLaMA@poweruser.forum · English · 1 year ago
NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM (github.com)
AaaaaaaaaeeeeeBlink · 1 year ago
It's useful for people who want to know the inference response time. This wouldn't give us a 4000-ctx reply in 1/3 of a second.
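The point about throughput vs. response time can be made concrete with some arithmetic. A minimal sketch, assuming (hypothetically) that the benchmark's ~12,000 tokens/sec is aggregate throughput across a batch of concurrent requests rather than single-request decode speed; the batch size of 64 is an illustrative guess, not a figure from the benchmark:

```python
# Hypothetical numbers illustrating throughput vs. latency:
# an aggregate figure of ~12,000 tokens/sec is typically achieved
# by batching many requests, so a single 4,000-token reply does
# not arrive in 4000 / 12000 ≈ 0.33 seconds.

aggregate_tokens_per_sec = 12_000   # benchmark throughput (all requests combined)
batch_size = 64                     # assumed concurrent requests (hypothetical)
reply_tokens = 4_000                # tokens in one reply

# Naive (wrong) reading: divide reply length by aggregate throughput.
naive_latency_s = reply_tokens / aggregate_tokens_per_sec

# Per-request reading: each request only sees its share of the throughput.
per_request_tokens_per_sec = aggregate_tokens_per_sec / batch_size
actual_latency_s = reply_tokens / per_request_tokens_per_sec

print(f"naive: {naive_latency_s:.2f}s, per-request: {actual_latency_s:.1f}s")
```

Under these assumed numbers the naive reading gives ~0.33 s, while the per-request view gives ~21 s, which is the gap the comment is pointing at.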