The CPU is a Ryzen 7 3700X, with 32 GB of DDR4-3000 RAM.
I loaded the model with ExLlamav2_HF and a 2048 sequence length. It spills over a lot, 11.5 GB to be exact, but I've read that with the right specs I could expect 2-7 tokens/s, which would be more than bearable.
Is there any way I could optimize it further?
Which model is it, and what EXL2 BPW (bits per weight) quant are you using?
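BPW matters because it roughly sets how much memory the weights alone need, and so how much ends up spilling. Here is a rough back-of-the-envelope sketch, not an exact measurement: it counts weights only and ignores the KV cache, activations, and loader overhead. The function name and the 34B parameter count are made up for illustration, since the actual model hasn't been named yet.

```python
def weight_size_gib(n_params: float, bpw: float) -> float:
    """Approximate size of quantized weights in GiB.

    Weights only: no KV cache, activations, or framework overhead.
    """
    total_bits = n_params * bpw      # every parameter costs `bpw` bits on average
    total_bytes = total_bits / 8     # bits -> bytes
    return total_bytes / 1024**3     # bytes -> GiB


if __name__ == "__main__":
    # Hypothetical 34B-parameter model; substitute the real parameter count.
    n_params = 34e9
    for bpw in (3.0, 4.0, 5.0, 6.0):
        print(f"{bpw:.1f} bpw -> ~{weight_size_gib(n_params, bpw):.1f} GiB of weights")
```

If the estimate for your quant is already above your VRAM, a lower-BPW quant of the same model is probably the first thing to try.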