LocalLLaMA@poweruser.forum · English · 3 years ago

I wonder if there’s a way to run an LLM without loading it into RAM

https://preview.redd.it/txoqaubzehzb1.png?width=1062&format=png&auto=webp&s=5ce1e0599c1b0430106cd828cad77dc516a42a4a

https://reddit.com/link/17rzqfm/video/fqtexzq5fhzb1/player

https://preview.redd.it/s60h7gh1fhzb1.png?width=1016&format=png&auto=webp&s=23f963f561d4f57c8562924032301ce0256e4249

I heard Apple is working on an on-device Siri powered by LLMs, but these models are memory-intensive, especially given the iPhone’s limited RAM. This isn’t just an Apple issue; other big tech companies that want to run ML models on-device, like Samsung, Google, and Meta, will face the same problem.

What if models could run directly from storage instead of RAM?

Samsung is onto something with their MRAM tech: it’s non-volatile, power-efficient, and can handle some logic and AI processing. Imagine your phone running models from storage!

I’m not an ML expert, but this tech evolution is intriguing. Are there other attempts like this?
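One existing approach in this direction is memory mapping: the model file stays on storage and the OS pages in only the parts that are actually touched. llama.cpp, for instance, can memory-map model files. Below is a minimal sketch of the idea using Python's standard `mmap` module. The file name `weights.bin`, the flat float32 layout, and the helper names `write_fake_weights` and `read_weight_slice` are all hypothetical, chosen for illustration. Note that this is not compute-in-storage: any weights that are read still pass through RAM via the page cache.

```python
import mmap
import os
import struct
import tempfile


def write_fake_weights(path, n_values):
    """Write n_values little-endian float32 numbers to disk.

    Hypothetical stand-in for a real model file.
    """
    with open(path, "wb") as f:
        for i in range(n_values):
            f.write(struct.pack("<f", float(i)))


def read_weight_slice(path, start, count):
    """Return `count` float32 values starting at index `start`, via mmap.

    The whole file is mapped, but only the pages backing the requested
    slice have to be fetched from storage; the OS decides what to cache.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = start * 4  # 4 bytes per float32
            raw = mm[offset:offset + count * 4]
            return list(struct.unpack(f"<{count}f", raw))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "weights.bin")
    write_fake_weights(path, 1024)
    print(read_weight_slice(path, 100, 4))
```

The catch is that a dense model touches nearly all of its weights for every token, so when the model does not fit in RAM, each token can mean re-reading most of the file from storage.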

xadiantB · 3 years ago
Sure, it’s just going to generate 5 tokens per week
- AaaaaaaaaeeeeeB · English · 3 years ago
  It will never be that bad; at worst, it would be 2 min per token.
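A rough way to sanity-check per-token figures like these is to assume generation is limited purely by storage read bandwidth: a dense model, every weight read once per token, nothing cached in RAM. Under those assumptions, time per token is roughly model size divided by read speed. The parameter count, quantization width, and bandwidth values below are illustrative assumptions, not measurements of any particular device.

```python
def model_size_bytes(n_params, bits_per_weight):
    """Approximate on-disk size of the weights, ignoring metadata."""
    return n_params * bits_per_weight / 8


def seconds_per_token(model_bytes, read_bytes_per_sec):
    """Lower bound on time per token if all weights are streamed
    from storage once per token (assumed: no caching, no overlap)."""
    return model_bytes / read_bytes_per_sec


if __name__ == "__main__":
    # Assumed example: 70B parameters at 4 bits per weight (~35 GB).
    size = model_size_bytes(70_000_000_000, 4)
    # Assumed sequential read speeds in bytes per second.
    for label, bandwidth in [("0.3 GB/s", 0.3e9), ("3 GB/s", 3e9)]:
        print(f"{label}: {seconds_per_token(size, bandwidth):.1f} s/token")
```

With these assumed numbers, the slower case works out to about 117 seconds per token, which is in the same range as the 2 min per token figure, while the faster case is about 12 seconds per token.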