Hi all, I bought a new PC last year, and after experimenting with LLMs for the last few months I have some questions. I can run 7B, 13B and even 20B/30B models reasonably fast, but a 70B model (I use the Q3 quantization, GGUF format) runs at about 1 t/s on Windows 11. I'm thinking about how to upgrade my PC so I can get at least 2-3 t/s with a Q4 70B. My specs are:
-MSI PRO B760-P WIFI DDR4
-Intel 13700 CPU (NOT the K model, and it's slightly undervolted)
-Nvidia 4080 16 GB GPU
-2x16 GB 3200 MHz CL16 RAM
-2 NVMe SSDs
-1 old HDD from my old computer
-Seasonic 850 W Gold PSU
The options I thought of were:
a) Replace the old HDD with a bigger SATA SSD, create a partition and install a Linux distro that I would dual-boot and use only for LLMs.
b) Add a 3060 12 GB or a 4060 Ti 16 GB as a second GPU. I would use the second GPU only for LLMs.
c) Both?
So, what are the pros and cons? Are there other options? Can my PSU support a second GPU? Is there a difference in performance when running the models from an NVMe SSD compared to a SATA SSD? Would there be compatibility problems using the 4080 together with a 3060, since those GPUs are from different generations? How much of a performance improvement can I expect?
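To make the 2-3 t/s target concrete, here is a rough back-of-envelope sketch, not a benchmark. It assumes token generation is limited purely by memory bandwidth (every weight is read once per token), that a Q4 GGUF averages about 4.5 bits per weight, and that about 14 GB of each 16 GB card is usable for weights. The GPU bandwidth numbers are placeholder assumptions to be checked against the spec sheets; only the DDR4 figure is derived from the specs above. The result is an optimistic upper bound, and real speeds will be lower.

```python
# Back-of-envelope estimate of generation speed when a model is split
# between GPU VRAM and system RAM. All function names are hypothetical.

def ram_bandwidth_gbs(mega_transfers_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth of DDR memory in GB/s."""
    return mega_transfers_per_s * channels * bytes_per_transfer / 1000


def model_size_gb(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB."""
    return params_billion * bits_per_weight / 8


def tokens_per_second(model_gb, gpus, ram_bw_gbs):
    """Upper bound on tokens/s.

    gpus is a list of (usable_vram_gb, bandwidth_gbs) tuples. Weights fill
    the GPUs in order; whatever is left over is read from system RAM.
    """
    remaining = model_gb
    seconds_per_token = 0.0
    for usable_vram_gb, bandwidth_gbs in gpus:
        placed = min(remaining, usable_vram_gb)
        seconds_per_token += placed / bandwidth_gbs
        remaining -= placed
    seconds_per_token += remaining / ram_bw_gbs  # part left on the CPU side
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    ram_bw = ram_bandwidth_gbs(3200)      # dual-channel DDR4-3200
    q4_70b = model_size_gb(70, 4.5)       # assumed ~4.5 bits per weight

    # Assumed figures: (usable VRAM in GB, memory bandwidth in GB/s).
    gpu_4080 = (14, 716.8)
    gpu_4060ti_16gb = (14, 288.0)

    print("RAM bandwidth (GB/s):", ram_bw)
    print("Q4 70B size (GB):", q4_70b)
    print("4080 only (t/s):", tokens_per_second(q4_70b, [gpu_4080], ram_bw))
    print("4080 + 4060 Ti (t/s):",
          tokens_per_second(q4_70b, [gpu_4080, gpu_4060ti_16gb], ram_bw))
```

Under these assumptions the part of the model left in system RAM dominates the time per token, so the main effect of a second GPU would be to shrink that part.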
Thanks a lot for the help!