Heard Apple’s working on an on-device Siri powered by LLMs, but these models are memory-intensive, especially given the iPhone’s limited RAM. This isn’t just an Apple issue; any big tech company that wants to run ML models on device, like Samsung, Google, or Meta, will face the same problem.
What if models could run directly from storage instead of RAM?
Samsung is onto something with their MRAM tech: it’s non-volatile, power-efficient, and can handle some logic and AI processing in-memory. Imagine your phone running models straight from storage!
I’m not an ML expert, but this tech evolution is intriguing. Are there other attempts like this?
> What if models could run directly from storage instead of RAM?
You can already do that; that’s what mmap does. It maps a file on storage so you can use it as if it were RAM. It’s not speedy, though, since even the fastest SSD is slow compared to RAM.
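For anyone curious, here’s a minimal POSIX sketch of the idea. The `weights.bin` filename and the float layout are made up for illustration; any large file would demonstrate the same thing:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    // "weights.bin" is a hypothetical model file on storage.
    int fd = open("weights.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    // Map the file read-only. Pages are pulled in from storage on demand
    // the first time they're touched, not copied into RAM up front.
    float *weights = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (weights == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);  // the mapping stays valid after the fd is closed

    // Reading an unmapped page triggers a page fault and a storage read --
    // this is exactly where the SSD-vs-RAM latency gap shows up.
    printf("first weight: %f\n", weights[0]);

    munmap(weights, st.st_size);
    return 0;
}
```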
Thanks, didn’t know that.
RAM is storage, just faster to access and write to.
Sure, it’s just going to generate 5 tokens per week
It would never be that bad; at worst, it’d be around 2 min/token.
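Quick back-of-envelope, with assumed numbers (a 7B model at 4-bit quantization is roughly 3.5 GB of weights, a fast NVMe SSD reads ~3 GB/s, phone LPDDR5 RAM is ~50 GB/s): generating each token needs one full pass over the weights, so memory bandwidth alone sets a hard floor on speed.

```c
#include <stdio.h>

int main(void) {
    // All three numbers are assumptions for illustration only.
    double weights_gb = 3.5;   // ~7B params at 4-bit quantization
    double ssd_gbps   = 3.0;   // fast NVMe sequential read
    double ram_gbps   = 50.0;  // typical LPDDR5 bandwidth

    // One full read of the weights per generated token:
    printf("from SSD: %.2f s/token\n", weights_gb / ssd_gbps);  // ~1.17
    printf("from RAM: %.2f s/token\n", weights_gb / ram_gbps);  // ~0.07
    return 0;
}
```

So on that best-case sequential estimate you’d get about a second per token from SSD; random access patterns and page-fault overhead would push it well past that, but nowhere near tokens-per-week territory.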