Hi, you wonderful people!
Here’s a thought that came to mind: since training LLMs involves a degree of randomness, could there be an architecture for LLMs (or other AI models) whose training is somewhat deterministic instead?
What I mean is: could a theoretical architecture exist where everyone trains their own separate checkpoint on a different dataset, and those checkpoints can then be merged into a single checkpoint that combines the learning from all of the smaller ones?
This would let thousands of people train their own checkpoints and combine them into something greater than the sum of its parts. And since training is the slowest, most compute-intensive part of developing an LLM (or any AI model), it would let almost anyone contribute their share of processing power toward building something together.
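For what it’s worth, something in this spirit already exists in the literature under names like "model merging", "model soups", and federated averaging: if the checkpoints share the same architecture (and usually a common starting point, since independently initialized networks generally can’t be naively averaged), you can combine them by averaging their parameters element-wise. Here’s a minimal sketch of that averaging step, using plain Python dicts of floats in place of real tensors; the function name and toy data are just mine for illustration:

```python
def average_checkpoints(checkpoints):
    """Combine checkpoints by element-wise averaging of their parameters.

    Assumes every checkpoint has the same architecture, i.e. the same
    parameter names and the same number of values per parameter.
    """
    n = len(checkpoints)
    merged = {}
    for name in checkpoints[0]:
        # Gather the same parameter from every checkpoint and average
        # position by position.
        stacked = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(vals) / n for vals in stacked]
    return merged


# Three toy "checkpoints", as if trained by different contributors.
ckpt_a = {"layer1.weight": [0.1, 0.2], "layer1.bias": [0.0]}
ckpt_b = {"layer1.weight": [0.3, 0.0], "layer1.bias": [0.2]}
ckpt_c = {"layer1.weight": [0.2, 0.4], "layer1.bias": [0.1]}

print(average_checkpoints([ckpt_a, ckpt_b, ckpt_c]))
# -> layer1.weight averages to roughly [0.2, 0.2] and layer1.bias to
#    roughly [0.1] (up to floating-point rounding).
```

In a real framework you’d do the same thing over state dicts of tensors rather than lists of floats. The open question in your idea, as I read it, is whether an architecture could be designed so this kind of merging works even for checkpoints trained fully independently from scratch.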
If viable, this could have huge implications for open-source software.
I’m looking forward to hearing what all of you smart people have to say about it!