How bottlenecked are LLMs by CPU clock? (Budget options to host multiple GPUs)

Infinite100p · 1 year ago

Worldly-Mistake-8147 · 1 year ago

Holy… 4x3090! No wonder it was hard to find my third one for a reasonable price.

AutomataManifold · 1 year ago

If you have a really old CPU, it will be a bottleneck, because there’s some CPU involvement at inference time. I had a 3090 on an old server CPU with lots of cores but a slow clock speed and it got about half the expected speed. (Newer inference engines like Exllama might have addressed this, but I haven’t tested.) But, I should stress, that’s a CPU from 8 years ago.
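To put a rough number on "half the expected speed", here is a toy model, not a measurement. It assumes each token costs some GPU time plus some serial CPU time that doesn't overlap with it. The function names and the 20 ms figure are made up for illustration.

```python
def tokens_per_second(gpu_ms: float, cpu_ms: float) -> float:
    """Tokens/s if every token needs gpu_ms of GPU work plus cpu_ms of
    serial, non-overlapping CPU work (a simplifying assumption)."""
    return 1000.0 / (gpu_ms + cpu_ms)


def cpu_ms_for_slowdown(gpu_ms: float, slowdown: float) -> float:
    """CPU time per token that would explain a given slowdown factor
    relative to the GPU-only speed, under the same model."""
    return gpu_ms * (slowdown - 1.0)


# Hypothetical numbers: 20 ms of GPU work per token.
ideal = tokens_per_second(gpu_ms=20.0, cpu_ms=0.0)
observed = tokens_per_second(gpu_ms=20.0, cpu_ms=20.0)
print(ideal, observed, cpu_ms_for_slowdown(20.0, 2.0))
```

Under this model, halving throughput means the CPU side costs about as much per token as the GPU side, which is why a slow clock hurts more than a low core count.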

I don’t have benchmarks for current gen CPUs; I imagine that they’re similar to each other. I’d be more worried about physical space for the cards, power draw, PCIe lanes, etc.
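If it helps, here's a quick back-of-the-envelope check for the power and lane side of a multi-GPU build. It's only a sketch: the `Build` class, the 80% PSU headroom rule, and every number except the 3090's 350 W board power are assumptions to replace with your own parts' specs.

```python
from dataclasses import dataclass


@dataclass
class Build:
    gpus: int            # number of cards
    gpu_watts: float     # board power per card (350 W for a stock RTX 3090)
    other_watts: float   # CPU, drives, fans, etc. (a guess)
    psu_watts: float     # PSU rated output
    cpu_lanes: int       # PCIe lanes the platform exposes to slots
    lanes_per_gpu: int   # lanes you want per card (e.g. 8 or 16)


def check(build: Build, headroom: float = 0.8) -> dict:
    """Compare needed power and lanes against what the build provides.
    headroom is the fraction of PSU rating treated as usable (assumed 0.8)."""
    power_needed = build.gpus * build.gpu_watts + build.other_watts
    lanes_needed = build.gpus * build.lanes_per_gpu
    return {
        "power_needed_w": power_needed,
        "power_ok": power_needed <= build.psu_watts * headroom,
        "lanes_needed": lanes_needed,
        "lanes_ok": lanes_needed <= build.cpu_lanes,
    }


# Hypothetical 4x3090 build on a consumer platform with 24 usable lanes.
consumer = Build(gpus=4, gpu_watts=350, other_watts=250,
                 psu_watts=1600, cpu_lanes=24, lanes_per_gpu=8)
print(check(consumer))
```

It doesn't cover physical spacing or slot layout, and power-limiting the cards changes the wattage side a lot, so treat the result as a first filter only.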