LocalLLaMA@poweruser.forumEnglish · 1 year ago

NVidia H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM

Longjumping-Bake-557B
1 year ago
And that’s on a die just slightly bigger than the 4090’s. Unless they increased the die size compared to the H100?