NVIDIA H200 achieves nearly 12,000 tokens/sec on Llama2-13B with TensorRT-LLM

rihard7854 · 2 years ago

Herr_Drosselmeyer · 2 years ago

Obviously. There aren’t many people in the world with 50k burning a hole in their pockets, and of those, even fewer are nerdy enough to want to set up their own AI server in their basement just to tinker with.