Heard Apple’s working on an on-device Siri powered by LLMs, but these models are memory-intensive, especially given the iPhone’s limited RAM. This isn’t just an Apple issue; any big tech company that wants to run ML models on device, like Samsung, Google, or Meta, will face the same problem.
What if models could run directly from storage instead of RAM?
Samsung is onto something with their MRAM tech: it’s non-volatile, power-efficient, and can handle some logic and AI processing in-memory. Imagine your phone running models straight from storage!
I’m not an ML expert, but this tech evolution is intriguing. Are there other attempts like this?
> What if models could run directly from storage instead of RAM?
You can already do that; that’s what mmap does. It maps a file on storage so you can use it as if it were RAM. It’s not speedy, though, since even the fastest SSD is slow compared to RAM.
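For anyone curious, here’s a minimal POSIX sketch of the idea. The `weights.bin` filename and the float layout are made up for illustration; any large file would demonstrate the same thing:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    // "weights.bin" is a hypothetical model file on storage.
    int fd = open("weights.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    // Map the file read-only. Pages are pulled in from storage on demand
    // the first time they're touched, not copied into RAM up front.
    float *weights = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (weights == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);  // the mapping stays valid after the fd is closed

    // Reading an unmapped page triggers a page fault and a storage read --
    // this is exactly where the SSD-vs-RAM latency gap shows up.
    printf("first weight: %f\n", weights[0]);

    munmap(weights, st.st_size);
    return 0;
}
```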
Thanks, didn’t know that.
RAM is storage, just faster to access and write to.
Sure, it’s just going to generate 5 tokens per week
It would never be that bad; at worst, it’d be around 2 min/token.
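Quick back-of-envelope, with assumed numbers (a 7B model at 4-bit quantization is roughly 3.5 GB of weights, a fast NVMe SSD reads ~3 GB/s, phone LPDDR5 RAM is ~50 GB/s): generating each token needs one full pass over the weights, so memory bandwidth alone sets a hard floor on speed.

```c
#include <stdio.h>

int main(void) {
    // All three numbers are assumptions for illustration only.
    double weights_gb = 3.5;   // ~7B params at 4-bit quantization
    double ssd_gbps   = 3.0;   // fast NVMe sequential read
    double ram_gbps   = 50.0;  // typical LPDDR5 bandwidth

    // One full read of the weights per generated token:
    printf("from SSD: %.2f s/token\n", weights_gb / ssd_gbps);  // ~1.17
    printf("from RAM: %.2f s/token\n", weights_gb / ram_gbps);  // ~0.07
    return 0;
}
```

So on that best-case sequential estimate you’d get about a second per token from SSD; random access patterns and page-fault overhead would push it well past that, but nowhere near tokens-per-week territory.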