LocalLLaMA@poweruser.forum · English · 2 years ago

NVIDIA H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM

Longjumping-Bake-557B
2 years ago
And that’s on a die just slightly bigger than the 4090’s. Unless they increased the size compared to the H100?