Has anyone tried combining a server with a moderately powerful GPU and a server with a lot of RAM to run inference? Especially with llama.cpp, where you can offload just some of the layers to the GPU?
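To be clear about what I mean by partial offload, here's a minimal sketch using the llama-cpp-python bindings (the model path and layer count are just placeholders, not a specific recommendation):

```python
# Minimal sketch of partial GPU offload via llama-cpp-python.
# model_path and n_gpu_layers are placeholders; adjust for your hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-model.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=20,  # offload 20 of the model's layers to the GPU; the rest stay in system RAM
    n_ctx=4096,       # context window
)

out = llm("Explain partial GPU offload in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

The question is whether that kind of split can work across two machines rather than within one box.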