Newbie question: is there a way to have 4x A100 40GB cards run as one, with 160GB of VRAM in total?
I can't load a 70B model even with 4-bit quantization, because my lab only has 40GB cards.
Edit: If this is possible, can I also run 8x 3090 24GB cards as one?
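
For reference, here is the rough arithmetic behind the question, as a minimal sketch. It counts only the quantized weights and assumes an even split across cards. The function name `weight_memory_gb` and the even-split assumption are mine, and the numbers ignore the KV cache, activations, quantization metadata, and per-GPU runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# 70B parameters at 4 bits per parameter (weights only, no overhead).
weights_gb = weight_memory_gb(70e9, 4)

# Hypothetical setups from the question: (number of cards, GB per card).
setups = {"4x A100 40GB": (4, 40), "8x 3090 24GB": (8, 24)}

for name, (num_gpus, gb_per_gpu) in setups.items():
    total_gb = num_gpus * gb_per_gpu
    # Assumption: weights are split evenly across all cards.
    per_gpu_gb = weights_gb / num_gpus
    print(f"{name}: {total_gb} GB total, "
          f"about {per_gpu_gb:.2f} GB of weights per card")
```

On a single 40GB card the weights alone come to about 35GB, which leaves very little room for everything else. That is why I am asking whether the cards can be combined.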