What I currently see is that many companies try to create a “GPT”: a general-purpose model that basically competes with OpenAI’s GPT models or with Claude. The problem, in my opinion, is that open-source projects with just a few people working on them have very limited resources. Even with 10 A100s with 80 GB of VRAM each, you will never come close to the computing power and the manpower you need to actually build such a model. And once you go above 13 billion parameters, you already have the problem that over 99% of people cannot run your model on their own hardware.
While, yes, you can run it on Colab, you then have the problem that people depend on Colab, so to speak. If Colab pulls the plug, it stops working. If it’s hosted by another company and that company pulls the plug, it stops working too. So, in my opinion, people should focus on creating models that specialize in something. A basic example: Japanese-to-English translation. Or maybe a model that is really good with historical facts. Every additional capability costs additional parameters, which makes it harder and harder to actually load the entire model. If this goes on, in my opinion, we will not see any development that is actually beneficial. And this is not me being a doomer and saying “oh, no, it will never work”, but unless new technology is released that makes it possible to get the equivalent of something like 300 billion parameters working, in my opinion, it’s useless.
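To put rough numbers on the "harder and harder to load" point, here is a back-of-the-envelope sketch. It assumes the weights dominate memory and simply multiplies the parameter count by bytes per parameter; the function name `weight_memory_gb` and the precision table are my own illustration, and real usage is higher once activations, KV cache, and framework overhead are added.

```python
# Rough size of the weights alone: parameter count x bytes per parameter.
# Assumption: activations, KV cache, and runtime overhead are ignored,
# so actual memory use will be higher than these figures.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str = "fp16") -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # 13B is the size discussed above; 300B is the "equivalent" size mentioned.
    for n_params, label in [(13e9, "13B"), (300e9, "300B")]:
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(n_params, precision)
            print(f"{label} @ {precision}: {gb:.1f} GB")
```

Under these assumptions a 13B model needs about 26 GB in fp16 just for the weights, which is already more than most consumer GPUs have, and a 300B model needs about 600 GB. Quantizing to 4 bits cuts that by a factor of four, usually at some cost in quality.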
We need to actually do something with what we have. I think open-source projects should pick a focus and spend their 13 billion parameters on one very specific area, allowing the model to perform amazingly at that subject. Let the big general model be Llama 3 from Meta; I think it’s impossible to get something like GPT-3.5 or GPT-4 with small-team open-source methods. Some of the best models right now are Llama and Mistral, both from companies that are now worth either billions or hundreds of millions.
You can certainly try to fine-tune what has been released, like the new Llama models, or try to modify them, but I see so many models being released that basically nobody uses or that don’t really have a use.
What do you all think about it? After testing out so many different models, I just think that the goals these small teams set for themselves are simply not achievable, and that they should instead try to create something that is amazing at one thing.
TLDR: I think open source projects should focus on being very good at certain tasks instead of being good at “everything”.
Edit: when I say open source, I mean the small teams of just a few people with a few A100s, not the open-source models from Mistral and Meta.
The big companies give you a 13B base model to play with, and you can fine-tune it to fit your specifications.
I agree that people should not focus on building an OpenAI GPT killer, but mostly because it is a losing proposition, so past a certain point they are basically wasting their time. You can fine-tune a 13B until you are blue in the face; it will still not be OpenAI’s GPT.
But then, back to the top: YOU can fine-tune it into whatever you want. It just so happens that other people want to make it a general GPT toddler. I don’t, and I fine-tune it into whatever I want.
Still, 13B is playing with a toy car, and 33B is playing with a toy truck. From 70B it starts to get more interesting (a toy airplane?), but it also requires a somewhat different setup to play with. So it’s out of my toy box.
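For a feel of why the setup changes at 70B, here is a minimal sketch that estimates how many GPUs you would need just to hold the weights. Everything in it is an assumption for illustration: fp16 weights (2 bytes per parameter), a flat 20% overhead factor, a 24 GB consumer card versus an 80 GB A100, and the helper name `gpus_needed`. It also ignores how the model would actually be split across cards.

```python
import math


def gpus_needed(n_params: float, gpu_vram_gb: float,
                bytes_per_param: float = 2.0, overhead: float = 1.2) -> int:
    """Minimum GPU count to hold the weights plus an assumed overhead factor.

    Assumptions: fp16 weights by default, a flat 20% allowance for
    activations and runtime buffers, and perfectly even sharding.
    """
    required_gb = n_params * bytes_per_param / 1e9 * overhead
    return math.ceil(required_gb / gpu_vram_gb)


if __name__ == "__main__":
    for n_params, label in [(13e9, "13B"), (33e9, "33B"), (70e9, "70B")]:
        consumer = gpus_needed(n_params, gpu_vram_gb=24)  # assumed consumer card
        a100 = gpus_needed(n_params, gpu_vram_gb=80)      # A100 80 GB
        print(f"{label}: {consumer} x 24 GB cards or {a100} x 80 GB cards")
```

With these assumptions, 13B in fp16 already spills over a single 24 GB card, and 70B needs several 80 GB cards, which is the point where you are into multi-GPU sharding or heavy quantization rather than a single-card hobby setup.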
To think we are nipping at the heels of a company that got $10B from Microsoft and can hire the brightest minds is unrealistic. We are not even playing on the same field. We are in a sandbox somewhere near our mama’s home; they are in a big arena sponsored by big money. With a marching band and cheerleaders and beer and everything.