Running multiple GPUs requires PCIe lanes. Consumer PCs have too few of them to run even two GPUs at full bandwidth (2 x16).

Threadrippers are prohibitively expensive for many.

AMD announced the EPYC 8004 (Siena) series in September. These low-power server CPUs start at 8 cores for ~$400 and offer 96 lanes. The catch is that the clock speed is pretty low.

So, the question is: How bottlenecked are LLMs by CPU clock?

I.e., would it make much of a difference if you ran 4x 3090s on a $400 EPYC 8004 instead of a $2000+ Threadripper?
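
To make the lane math concrete, here is a rough sketch of how many lanes each GPU could get on a given platform. The 96-lane figure is the EPYC 8004 number above; the 24 usable lanes for a consumer desktop CPU and the 4 lanes reserved for an NVMe drive are assumptions for illustration, and `widest_uniform_link` is a made-up helper. It also ignores how a real motherboard wires and bifurcates its slots, so treat the result as an upper bound.

```python
# Standard PCIe link widths, widest first.
LINK_WIDTHS = (16, 8, 4, 2, 1)


def widest_uniform_link(cpu_lanes: int, n_gpus: int, reserved_lanes: int = 0) -> int:
    """Return the widest standard link width that every GPU can get at once.

    Returns 0 if not even x1 per GPU fits in the lane budget.
    """
    available = cpu_lanes - reserved_lanes
    for width in LINK_WIDTHS:
        if width * n_gpus <= available:
            return width
    return 0


if __name__ == "__main__":
    platforms = (
        ("consumer desktop (assumed 24 lanes)", 24),  # assumption, not from the post
        ("EPYC 8004 (96 lanes)", 96),                 # figure quoted in the post
    )
    for name, lanes in platforms:
        for n_gpus in (2, 4):
            # Assumption: 4 lanes are kept back for one NVMe drive.
            width = widest_uniform_link(lanes, n_gpus, reserved_lanes=4)
            print(f"{name}: {n_gpus} GPUs -> x{width} each")
```

Under these assumptions the consumer platform drops to x8 with two cards and x4 with four, while 96 lanes leave room for four cards at x16.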

  • AutomataManifoldB
    10 months ago

    If you have a really old CPU, it will be a bottleneck, because there’s some CPU involvement at inference time. I had a 3090 on an old server CPU with lots of cores but a slow clock speed, and it got about half the expected speed. (Newer inference engines like ExLlama might have addressed this, but I haven’t tested.) But, I should stress, that’s a CPU from 8 years ago.

    I don’t have benchmarks for current-gen CPUs; I imagine that they’re similar to each other. I’d be more worried about physical space for the cards, power draw, PCIe lanes, etc.