Does Dual EPYC work for LLMs?

EvokerTCG · 1 year ago

Does Dual EPYC work for LLMs?

nero10578 · 1 year ago

Dual CPUs would have terrible performance. When generating tokens, the processor has to read the whole model from memory for every single token, so speed is limited mostly by memory bandwidth. If you spread half the model onto the second CPU's memory, the cores on the first CPU have to read that half through the inter-CPU link, which is slower than their own local memory. The same goes for the second CPU's cores reading the half that sits in the first CPU's memory. For this to work, llama.cpp would need a way to split the workload across multiple CPUs, like it already does across multiple GPUs, so that each CPU only processes the part of the model stored in its own memory.
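Here's a rough back-of-envelope sketch of that bandwidth argument. It assumes decoding is purely memory-bandwidth bound, that every token streams all the weights exactly once, and that local and remote reads don't overlap. The function names and the model-size and bandwidth figures are hypothetical placeholders for illustration, not measurements of any real EPYC system.

```python
def naive_tokens_per_s(model_gb: float, local_bw_gbs: float,
                       link_bw_gbs: float, remote_fraction: float) -> float:
    """One socket's cores read the whole model, but part of it lives in the
    other socket's memory and has to cross the inter-CPU link."""
    local_gb = model_gb * (1.0 - remote_fraction)
    remote_gb = model_gb * remote_fraction
    # Assumption: local and remote reads happen back to back, so times add up.
    seconds_per_token = local_gb / local_bw_gbs + remote_gb / link_bw_gbs
    return 1.0 / seconds_per_token


def numa_split_tokens_per_s(model_gb: float, local_bw_gbs: float,
                            sockets: int) -> float:
    """Hypothetical NUMA-aware split: each socket reads only its own shard,
    all sockets in parallel. Ignores the cost of passing activations
    between sockets."""
    shard_gb = model_gb / sockets
    return local_bw_gbs / shard_gb


if __name__ == "__main__":
    # Made-up example numbers (GB and GB/s), not benchmarks.
    MODEL_GB, LOCAL_BW, LINK_BW = 40.0, 400.0, 100.0
    print("all local:     ", naive_tokens_per_s(MODEL_GB, LOCAL_BW, LINK_BW, 0.0))
    print("half remote:   ", naive_tokens_per_s(MODEL_GB, LOCAL_BW, LINK_BW, 0.5))
    print("NUMA-aware x2: ", numa_split_tokens_per_s(MODEL_GB, LOCAL_BW, 2))
```

With these made-up numbers, putting half the model on the far socket drops the estimate from 10 to 4 tokens/s, while splitting the work so each socket only reads local memory gives about 20 tokens/s. Real systems will land somewhere else once compute, caches and synchronization are included.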