Hardware question: combining a 3090 and a p40

Noxusequal · 1 year ago

Hardware question: combining a 3090 and a p40

Tiny_Arugula_5648 · 1 year ago

No, absolutely not… not how you described it. The issue isn’t RAM, it’s the number of calculations that need to be done. With GPUs you need to load the data into VRAM, and that data is only available to that GPU’s calculations; it’s not a shared memory pool. So if you load data into the P40, only the P40 can use it for its calculations.

Yes, you can run the model on multiple GPUs. If one of those is very slow but has lots of VRAM, then the layers you offload to that card will be processed slowly. No, there is no way to speed up those calculations. VRAM only keeps the weights readily available so you’re not constantly loading and unloading the model weights.
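To put rough numbers on "the layers on the slow card are processed slowly", here is a back-of-the-envelope sketch. It assumes token generation is limited by memory bandwidth (every weight is read once per token) and that the devices work one after another. The bandwidth figures are approximate spec-sheet values, the model size and split are made up for illustration, and compute speed is ignored entirely (the P40 is weak at FP16, so real results may be worse).

```python
# Rough per-token time estimate for a model whose layers are split across devices.
# Assumption: decoding is memory-bandwidth bound, so each device needs about
# (GB of weights it holds) / (its bandwidth in GB/s) seconds per token,
# and the devices run sequentially, one layer group after the other.

def per_token_seconds(split_gb, bandwidth_gb_per_s):
    """Sum of the time each device spends reading its share of the weights."""
    return sum(gb / bw for gb, bw in zip(split_gb, bandwidth_gb_per_s))

# Approximate published memory bandwidths (GB/s); treat these as assumptions.
BW_3090 = 936.0
BW_P40 = 346.0
BW_SYSTEM_RAM = 50.0  # hypothetical dual-channel desktop RAM

# Hypothetical 40 GB model: 22 GB on the 3090, the remaining 18 GB elsewhere.
on_3090 = 22.0
rest = 40.0 - on_3090

t_with_p40 = per_token_seconds([on_3090, rest], [BW_3090, BW_P40])
t_with_ram = per_token_seconds([on_3090, rest], [BW_3090, BW_SYSTEM_RAM])

print(f"3090 + P40:        ~{1 / t_with_p40:.1f} tokens/s upper bound")
print(f"3090 + system RAM: ~{1 / t_with_ram:.1f} tokens/s upper bound")
```

Under these assumptions the slower device dominates the total, so the P40's share costs more time per token than the 3090's share, but far less than the same share sitting in system RAM.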

Hoppss · 1 year ago

This is not true. I have split two separate LLMs partially across a 4090 and a 3080 and had them both run inference at the same time.

This can be done in oobabooga’s repo with just a little tinkering.
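As a minimal sketch of what such a split can look like outside the web UI, assuming the model is loaded through Hugging Face transformers with accelerate installed: `device_map="auto"` together with a `max_memory` dict caps how much each device may hold. The helper name, the headroom value, and the card sizes below are hypothetical, and the model ID is a placeholder.

```python
# Build a per-device memory cap for a layer split across two GPUs plus CPU RAM.
# Hypothetical helper; the dict format (int GPU index or "cpu" -> "NGiB" string)
# is the one accepted by the max_memory argument in transformers/accelerate.

def build_max_memory(gpu_vram_gib, headroom_gib, cpu_ram_gib):
    """Leave some headroom per GPU for activations and the KV cache."""
    caps = {i: f"{vram - headroom_gib}GiB" for i, vram in enumerate(gpu_vram_gib)}
    caps["cpu"] = f"{cpu_ram_gib}GiB"
    return caps

# Assumed setup: GPU 0 is a 24 GiB 3090, GPU 1 is a 24 GiB P40, 32 GiB of system RAM.
max_memory = build_max_memory([24, 24], headroom_gib=2, cpu_ram_gib=32)
print(max_memory)

# With transformers and accelerate installed, the dict would be passed like this
# (left commented out so the sketch runs without GPUs or a model download):
#
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "some-org/some-model",   # placeholder model ID
#     device_map="auto",       # let accelerate place layers on the devices
#     max_memory=max_memory,   # per-device caps built above
# )
```

Layers that do not fit under the GPU caps spill over to the "cpu" entry, which is where the speed question about system memory comes in.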

Noxusequal · 1 year ago

The question is whether that is still faster than system memory or not.