What are the benefits of using an H100 over an A100 (both at 80 GB and both using FP16) for LLM inference?

Looking at the datasheets for both GPUs, the H100 has twice the peak FLOPS, but the two have almost the same memory bandwidth (about 2000 GB/s). Since memory bandwidth, rather than compute, dominates inference, I wonder what benefits the H100 has. One benefit could, of course, be the ability to use FP8 (which is extremely useful), but in this question I’m interested in the differences in the hardware specs.
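To make the reasoning concrete, here is a rough back-of-the-envelope sketch of a roofline-style estimate for one decode step. Everything in it is an assumption for illustration: the 7B-parameter model is hypothetical, the bandwidth is the 2000 GB/s figure quoted above for both cards, the A100 peak is taken as 312 TFLOPS dense FP16, and the H100 peak is simply set to twice that. It also ignores KV-cache traffic, attention cost, and kernel overheads, so treat it as a sanity check and not a benchmark.

```python
def decode_step_seconds(params, batch, peak_flops, bandwidth, bytes_per_param=2):
    """Roofline-style lower bound on the time for one decode step.

    Memory side: every weight is read once per step, regardless of batch size.
    Compute side: roughly 2 FLOPs (multiply + add) per parameter per sequence.
    """
    memory_time = params * bytes_per_param / bandwidth
    compute_time = 2 * params * batch / peak_flops
    return max(memory_time, compute_time)


def break_even_batch(peak_flops, bandwidth, bytes_per_param=2):
    """Batch size at which compute time catches up with weight-read time."""
    return peak_flops * bytes_per_param / (2 * bandwidth)


# Assumed, illustrative numbers (not measurements).
PARAMS = 7e9          # hypothetical 7B-parameter model
BANDWIDTH = 2000e9    # bytes/s, same for both cards as in the question
A100_FLOPS = 312e12   # assumed dense FP16 peak
H100_FLOPS = 2 * A100_FLOPS  # "twice the peak FLOPS" from the question

for batch in (1, 64, 512):
    a100 = decode_step_seconds(PARAMS, batch, A100_FLOPS, BANDWIDTH)
    h100 = decode_step_seconds(PARAMS, batch, H100_FLOPS, BANDWIDTH)
    print(f"batch={batch:4d}  A100={a100 * 1e3:.2f} ms  H100={h100 * 1e3:.2f} ms")

print("A100 break-even batch:", break_even_batch(A100_FLOPS, BANDWIDTH))
print("H100 break-even batch:", break_even_batch(H100_FLOPS, BANDWIDTH))
```

Under these assumptions the two cards tie at small batch sizes, where reading the weights is the bottleneck, and the extra FLOPS only start to matter once the batch is large enough to become compute-bound (or during prompt processing, which behaves like a large batch).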

  • I_will_delete_myselfB
    1 year ago

    The A100 is like a 3070 Ti with 80 GB of VRAM. The H100 is like a 4090 with 80 GB of VRAM and hardware optimized for transformers.