• 8 Posts
  • 62 Comments
Joined 11 months ago
Cake day: October 30th, 2023


  • Yes, that M1 Max should run LLMs really well, including 70B with decent context. An M2 won’t be much better. An M3, other than the 400GB/s model, won’t be as good, since every M3 variant but the 400GB/s one has had its memory bandwidth cut relative to the M1/M2 models.

    Are you seeing that $2400 at B&H? It was $200 cheaper there a couple of weeks ago. It might be worth it to see if the price goes back down.
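Why memory bandwidth matters so much here: LLM token generation is typically bandwidth-bound, since roughly the whole model must be streamed from RAM for each token. A rough, hedged back-of-envelope sketch (the 40 GB figure for a 4-bit 70B model and the estimator itself are illustrative assumptions, not measurements):

```python
def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound decode speed when generation is memory-bandwidth-bound:
    every token requires reading (approximately) all the weights once."""
    return bandwidth_gb_s / model_size_gb

# Hypothetical numbers: 400 GB/s (M1 Max) vs. a 4-bit 70B model (~40 GB).
ceiling = est_tokens_per_sec(400, 40)  # ~10 tokens/s theoretical ceiling
```

Real throughput lands below this ceiling, but it shows why cutting bandwidth on the lower M3 models cuts LLM speed almost proportionally.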


  • Yes. I’ve done that before on my other machines. Llama.cpp in fact defaults to that. My hope was that, since the models are sparse, the OS would cache the relevant parts of them in RAM. So the first run through would be slow, but subsequent runs would be fast since those pages are already cached in RAM. How well that works really depends on how much RAM the OS is willing to devote to caching mmap’d pages and how smartly it does it. My hope was that if it handled the sparse access pattern smartly, it would be pretty fast. So far, that hope hasn’t been realized.
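The mmap behavior described above can be sketched in Python. Nothing is read from disk until a page is actually touched, and touched pages stay in the OS page cache; `model.bin` here is a tiny hypothetical stand-in for a real multi-GB weights file, but the mechanics are the same:

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a model file (real GGUF weights are many GB).
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of fake "weights"

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The first access faults the page in from disk; the OS keeps it in
    # the page cache, so a second access to the same offset is RAM-speed.
    first = mm[0]
    second = mm[0]  # served from the page cache, no disk I/O
    assert first == second
    mm.close()
```

Whether subsequent full runs stay fast comes down to whether the kernel keeps those pages cached under memory pressure, which is exactly the uncertainty described above.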

  • I just don’t log in using the GUI. There indeed doesn’t seem to be a way to turn it off like in Linux, so it still uses tens of MB waiting for you to log in. But that’s a far cry from the hundreds of MB if you do log in. I have thought about killing those login processes, but the Mac is so GUI-centric that if something really goes wrong and I can’t ssh in, I want the GUI as a backup. I think a few tens of MB is worth it for that instead of trying to fix things in the terminal in recovery mode.
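One way to eyeball how much RAM the idle login screen is holding, a rough sketch assuming `loginwindow` and `WindowServer` are the processes you care about (other GUI daemons add a bit more):

```shell
# Sum the resident set size (RSS, in KB on macOS) of the login-screen
# GUI processes and print it in MB.
ps -axo rss,comm \
  | awk '/loginwindow|WindowServer/ { kb += $1 } END { printf "%.0f MB\n", kb/1024 }'
```

Running this while sitting at the login prompt versus after a GUI login makes the "tens of MB vs. hundreds of MB" difference easy to verify on your own machine.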