Is there a technical reason that distributed LLMs don't exist?

chinawcswing · 1 year ago

Is there a technical reason that distributed LLMs don't exist?

Monkey_1505 · 1 year ago

The latencies involved make it tricky. You can’t just split one model across several machines, because every token would then have to wait on the network between them. That means both computers need to do their compute independently and then have their results combined somehow, which in turn means you need to be able to break up inference into two completely distinct tasks.

I’m not sure if this is possible, but if it is, it hasn’t been invented yet.
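To put rough numbers on the latency point, here is a back-of-envelope sketch. It assumes the model's layers are split across machines in sequence, so each generated token has to cross every network hop before the next token can start. The function name and all the figures (50 ms of compute per token, 4 hops, 40 ms per hop) are made up for illustration, not measurements.

```python
def tokens_per_second(compute_ms_per_token, hops, latency_ms_per_hop):
    """Estimate single-stream decode speed for a model split across machines.

    Assumes the hops are traversed one after another for every token,
    so network latency adds directly to the per-token time.
    """
    total_ms = compute_ms_per_token + hops * latency_ms_per_hop
    return 1000.0 / total_ms


# Hypothetical numbers, chosen only to show the shape of the problem.
local = tokens_per_second(50.0, hops=0, latency_ms_per_hop=0.0)
over_internet = tokens_per_second(50.0, hops=4, latency_ms_per_hop=40.0)

print(f"single machine: {local:.1f} tokens/s")
print(f"4 hops at 40 ms each: {over_internet:.1f} tokens/s")
```

Under these assumptions the network time dominates the compute time, and adding more machines adds more hops, so the per-token cost grows instead of shrinking.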

bigattichouse · 1 year ago

I mean, they get distributed over multiple GPU cores… what’s it matter if they’re local or not?

farkinga · 1 year ago

Nice post. This got me thinking…

While many commenters are discussing the computation aspect, which leads to Petals and the Horde, I am thinking about BitTorrent (since you mentioned it).

We do need a hub for torrenting LLMs. HF is amazing for their bandwidth (okay for the UI) - but once that VC money dries up, we’ll be on our own. So, distributing the models - just the data, not the computation - is also important.

mcmoose1900 · 1 year ago

Hopefully the community will transition to LoRAs instead of passing barely changed model weights around.
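For a sense of why that would save bandwidth, here is a small sketch comparing parameter counts. It assumes the usual LoRA setup where the update to a d_out x d_in weight matrix is stored as two low-rank factors of rank r. The 4096 x 4096 size and rank 8 are illustrative choices, not taken from any particular model.

```python
def full_params(d_in, d_out):
    """Parameters in one full weight matrix."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Parameters in a rank-`rank` LoRA update: one (d_out x rank) factor
    plus one (rank x d_in) factor."""
    return rank * (d_in + d_out)


# Illustrative sizes only.
d = 4096
rank = 8

full = full_params(d, d)
lora = lora_params(d, d, rank)
ratio = full // lora

print(f"full matrix: {full} parameters")
print(f"LoRA update: {lora} parameters")
print(f"reduction:   {ratio}x")
```

With these assumed sizes, the adapter for that one matrix is a few hundred times smaller than the matrix itself, which is the difference between sharing a fine-tune as a small file and re-uploading the whole model.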