In some senses you end up with convergent design: it’s not a GPU, it’s just a control system that commands a bunch of accelerator units over a high-bandwidth memory subsystem. But that could be an ARM core plus an accelerator unit, etc. You probably need fast networking too.
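That split — a thin host that just shards work, fans it out to accelerator units, and reduces the results — can be sketched as a toy model. Everything here (`Accelerator`, `dispatch`) is illustrative naming, not any real vendor API; real hardware would run kernels over DMA'd memory, not Python threads.

```python
# Toy model of the "control system + accelerator units" design:
# the host only partitions, dispatches, and gathers.
from concurrent.futures import ThreadPoolExecutor

class Accelerator:
    """Stand-in for one accelerator unit with its own local memory."""
    def __init__(self, unit_id):
        self.unit_id = unit_id

    def run(self, chunk):
        # Real hardware would do a matmul/conv here; we just sum.
        return sum(chunk)

def dispatch(host_batch, accelerators):
    """Host-side control loop: shard the batch, fan out, gather."""
    n = len(accelerators)
    shards = [host_batch[i::n] for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = pool.map(lambda pair: pair[0].run(pair[1]),
                            zip(accelerators, shards))
    # Final reduce — the step a networked all-reduce would handle at scale.
    return sum(partials)

units = [Accelerator(i) for i in range(4)]
print(dispatch(list(range(10)), units))  # 45
```

The point of the sketch is how little the "CPU" side does: all the flops live in the units, which is why the host could be nearly anything (ARM or otherwise) so long as the memory and interconnect keep up.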
But overall it’s a crazy proposition to me. First off, Google and Amazon are going to beat you to market on anything that looks good, and you have no real moat other than “I’m Sam Altman” — there’s no market penetration for the thing (or support in execution, let alone actual research). Training is a really hard problem to break into because right now it’s absolutely firmly rooted in the CUDA ecosystem. Supposedly there may be a GPU Ocelot revival at some point, but everyone just works with NVIDIA, because theirs is the GPGPU ecosystem that matters.
Like, if you wanted to do this, you’d do what Tesla did and have Jim Keller design you a big fancy architecture for training fast at scale (Dojo). I guess they walked away from it or just stopped caring? Oops.
But that’s the problem: it’s expensive to stay at the cutting edge. It’s expensive even to get the first chip out, and you’ll be going against competitors who have the scale to make their own in-house anyway. It’s a crazy business decision to throw yourself onto the silicon treadmill against intense competition just to give NVIDIA the finger. Wack.