I have a Mac Studio as my main inference machine.
My opinion? RAM and bandwidth > all. Personally, I would pick A, as it's the perfect in-between. With 64GB of RAM you should have around 48GB of usable VRAM without any kernel/sudo shenanigans (I'm excited to try some of the recommendations folks have given here lately to change that), and you get the 400GB/s bandwidth.
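For anyone curious where that ~48GB figure comes from, here's the rough math I go by. The 75% default fraction is my working estimate, not an Apple-documented number, and the sysctl knob in the comment is the "shenanigans" people mention, quoted from memory, so verify before running it:

```python
# Back-of-envelope for the ~48GB usable VRAM figure on a 64GB machine.
# Assumption: macOS caps GPU wired memory at roughly 75% of unified RAM
# by default on higher-RAM Apple Silicon.
total_ram_gb = 64
default_gpu_fraction = 0.75
print(f"~{total_ram_gb * default_gpu_fraction:.0f}GB GPU-addressable "
      f"out of {total_ram_gb}GB")
# Raising the cap is the sudo trick folks talk about, e.g. on recent
# macOS something like: sudo sysctl iogpu.wired_limit_mb=57344
# (57344 MB = 56GB; exact knob name/value quoted from memory)
```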
My Mac Studio has 800GB/s bandwidth, and I can run 70b q8 models… but at full context, it requires a bit of patience. I imagine a 70b would be beyond frustrating at 300GB/s bandwidth. While the 96GB model could run a 70b q8… I don’t really know that I’d want to, if I’m being honest.
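For a rough sense of why: generation is mostly memory-bound, since every new token has to stream essentially the whole model through RAM, so bandwidth divided by model size gives a ceiling on tokens/sec. Quick sketch with ballpark numbers, not benchmarks:

```python
# Rough tokens/sec ceiling: memory bandwidth / bytes read per token.
# A dense 70b at q8 is ~70GB of weights (8 bits/param), ignoring
# KV-cache reads and compute, which only make things slower.
model_size_gb = 70
for bandwidth_gbs in (800, 400, 300):
    ceiling = bandwidth_gbs / model_size_gb
    print(f"{bandwidth_gbs}GB/s -> ~{ceiling:.1f} tok/s ceiling")
# 800GB/s -> ~11 tok/s, 300GB/s -> ~4 tok/s, and real-world speeds
# land below the ceiling, especially at full context.
```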
My personal view is that on a laptop like that, I'd want to max out at the 34b models, as those are very powerful and would still run at a decent speed on the laptop's bandwidth. So if all I planned to run was 34b models, a 34b q8 with 16k context would fit cleanly into 48GB, and I'd gain an extra 100GB/s of bandwidth for the choice.
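Here's the sizing math behind "fits cleanly into 48GB". The layer/head counts are borrowed from a Yi-34B-style config as an illustrative assumption; check your actual model's config:

```python
# Does 34b q8 + 16k context fit in ~48GB? Weights at q8 are roughly
# 1 byte/param, and the KV cache is 2 (K and V) * layers * kv_heads
# * head_dim * context * bytes/element.
params_b = 34
weights_gb = params_b * 1.0                   # ~34GB at 8 bits/param

n_layers, n_kv_heads, head_dim = 60, 8, 128   # Yi-34B-ish, assumed
ctx, bytes_per_elem = 16_384, 2               # 16k context, fp16 KV cache
kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1e9

print(f"weights ~{weights_gb:.0f}GB + 16k KV ~{kv_gb:.1f}GB "
      f"= ~{weights_gb + kv_gb:.0f}GB of 48GB")
# ~34GB + ~4GB leaves ~10GB of headroom for compute buffers and the OS.
```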
Wow, I’ve never seen an fp16 gguf before. Holy crap, I wish there were more of those out there; I’d love to get my hands on some for 70b models or the like. I didn’t realize unquantized gguf was an option
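Turns out you can make them yourself, if I understand the tooling right: recent llama.cpp checkouts ship a converter script that can emit an f16 gguf straight from HF weights. Something like this, where the model path and output name are just placeholders:

```python
# Sketch: convert HF weights to an unquantized f16 gguf with llama.cpp's
# converter script. Run from inside a llama.cpp checkout; the model
# directory below is a placeholder, not a real path.
import subprocess

subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "models/My-70b-hf",             # placeholder HF model dir
        "--outtype", "f16",             # keep fp16 weights, no quantization
        "--outfile", "my-70b-f16.gguf",
    ],
    check=True,
)
# Heads up: a 70b at fp16 is ~140GB (2 bytes/param), so it won't fit in
# VRAM on most Macs even if the file itself is easy enough to make.
```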