Assuming the training software could be run on the hardware and that we could distribute the load as if it was 2023, would it be possible to train a modern LLM on hardware from 1985?

  • DannyBoy@sh.itjust.works · 10 months ago

    The fastest computer in 1985 was the CRAY-2 supercomputer, at about 1.9 gigaflops. GPT-3 can be trained on 1024 A100 GPUs in 34 days*, and a single A100 peaks at 312 teraflops. So no, I don’t think it could be done in 1985, even given the entire year. There’s also storage for the incoming digital text for training - the input data didn’t exist back then, not at that scale. I don’t think it could be done in any reasonable time.
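    A rough back-of-envelope in Python, using only the figures above at theoretical peak rates (no utilization, memory, or I/O losses - variable names are mine):

        # Peak-rate sketch, figures taken from the comment above
        a100_flops  = 312e12                 # one A100, peak
        cluster     = 1024 * a100_flops      # ~3.2e17 FLOPS for the whole cluster
        total_work  = cluster * 34 * 86400   # ~9.4e23 FLOPs for the 34-day run
        cray2_flops = 1.9e9                  # CRAY-2 peak
        years = total_work / cray2_flops / (365 * 86400)
        print(f"{years:,.0f}")               # ~15,700,000 years

    So even at theoretical peak, the CRAY-2 would need on the order of fifteen million years for the same training run.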

  • ResponsibleJudge3172B · 10 months ago

    No, you are limited by:

    Compute performance: you would need 10,000%+ more compute per chip than was available, and the expansion-card accelerators of that era can’t compute the way modern GPUs do. You would have to rely on CPUs, which is worse.

    Lack of scalability: interconnecting chips so they behave as one dramatically increases I/O requirements.

    Lack of memory pooling (yes, you qualified it), memory bandwidth, and memory size (we are talking megabytes). Imagine waiting for a 1-billion-parameter model’s weights to load and store at each layer of the network from floppy disks (rough numbers below).
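    To put that in rough numbers (the fp32 weights and the 1.2 MB 5.25" floppy capacity are my assumptions, not from the thread):

        # Just storing the weights of a 1-billion-parameter model on 1985 media
        params       = 1_000_000_000
        weight_bytes = params * 4           # fp32 weights, ~4 GB
        floppy_bytes = 1.2e6                # high-density 5.25" disk (introduced 1984)
        print(weight_bytes / floppy_bytes)  # ~3,300 floppies, before optimizer
                                            # state, activations, or training data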