Hi all, I bought a new PC last year, and after experimenting with LLMs for the last few months I have some questions. I can run 7B, 13B and even 20B/30B models reasonably fast, but 70B models (I use the Q3 quantization, GGUF format) run at 1 t/s on Windows 11. I'm thinking about how to upgrade my PC so I can get at least 2-3 t/s with a Q4 70B. My specs are:

-MSI PRO B760-P WIFI DDR4

-Intel 13700 CPU (the non-K model, and it's slightly undervolted)

-Nvidia 4080 16GB GPU

-2x16GB 3200MHz CL16 RAM

-2 NVMe SSDs

-1 old HDD from my old computer

-Seasonic 850W Gold PSU

The options I thought of were:

a) Replace the old HDD with a bigger SATA SSD, make a partition and install a Linux distro that I would use in dual boot only for LLMs.

b) Add a 3060 12GB or a 4060 Ti 16GB as a second GPU. I would only use the second GPU for LLMs.

c) Both?

So, what are the pros and cons? Are there other options? Can my PSU support a second GPU? Is there a performance difference between running the models from an NVMe SSD and from a SATA SSD? Would there be compatibility problems using the 4080 together with a 3060, since those GPUs are from different generations? How much of a performance improvement can I expect?

Thanks a lot for the help!