We also have LLaVA and BakLLaVA, two multimodal models: the former based on Llama, the latter on Mistral.
Evening_Ad6637 to LocalLLaMA@poweruser.forum • Step by step guide for local with voice? • 1 point • 2 years ago
I tested "amica" yesterday. I was very impressed. Try it out:
Evening_Ad6637 to LocalLLaMA@poweruser.forum • 🐺🐦⬛ **Big** LLM Comparison/Test: 3x 120B, 12x 70B, 2x 34B, GPT-4/3.5 • 1 point • 2 years ago
O.M.G. What an incredible amount of work! WTF?! I am speechless.
You are the most angel-like wolf I know so far, and you really, really deserve a prize, dude!
Again: WTH?!
Evening_Ad6637 to LocalLLaMA@poweruser.forum • Starling-RM-7B-alpha: New RLAIF Finetuned 7b Model beats Openchat 3.5 and comes close to GPT-4 • 1 point • 2 years ago
Yeah, I don't think the authors are intentionally bullshitting or deliberately doing "benchmark cosmetics". Maybe it's more a lack of knowledge about what's going on with (most of) the benchmarks and their image, which has been ruined in the meantime.
Evening_Ad6637 to LocalLLaMA@poweruser.forum • Starling-RM-7B-alpha: New RLAIF Finetuned 7b Model beats Openchat 3.5 and comes close to GPT-4 • 1 point • 2 years ago
Heheh, I can't read that anymore… I've really become very prejudiced when it comes to that - to be honest, when it comes to any comparison with GPT-4.
People really have to understand that even GPT-4 has been aligned, lobotomized, and massively downgraded in terms of its performance, for safety reasons (which is understandable to me), but this thing is still an absolute beast. If we consider all the restrictions GPT-4 has to undergo, all the smart people at OpenAI, all the resources at Microsoft, and so on, we have to realize that currently nothing is really comparable to GPT-4. Especially not 7B models.
Evening_Ad6637 (OP) to LocalLLaMA@poweruser.forum • I have given llama.cpp server ui a facelift • 1 point • 2 years ago
Yes, it means "predict n tokens". Is it not easy to understand? I might change it back… For me it's important that a UI is not overloaded with words, and unfortunately "Predict_n Tokens"… how can I put it… just looks awful. So I'm looking for something more aesthetic but still easy to understand. That's difficult to find.
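For context, the field maps straight to the server's n_predict parameter, i.e. the cap on how many tokens get generated. A minimal sketch of a request against a local llama.cpp server, assuming it runs on localhost:8080 (field names are from the API as I know it, so double-check against your build):

```typescript
// Minimal sketch: ask a local llama.cpp server to generate at most 64 tokens.
// Assumes the server was started on localhost:8080 (e.g. ./server -m model.gguf).
async function complete(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt,          // the text to continue
      n_predict: 64,   // "Predict N tokens" in the UI: max tokens to generate
      temperature: 0.7,
    }),
  });
  const data = await res.json();
  return data.content; // the generated continuation
}

complete("The quick brown fox").then(console.log);
```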
Evening_Ad6637 (OP) to LocalLLaMA@poweruser.forum • I have given llama.cpp server ui a facelift • 1 point • 2 years ago
That's a pretty good idea! Thanks for your input. I will definitely make a note of it as an issue in my repo and see what I can do.
Thank you for saying that. It makes me feel valued for my work. I've already made a pull request, and Gerganov seems to like the work in general, so he would accept a merge. I still need to fix a few things here and there though - the standards of the llama.cpp folks are very high :D (but I wouldn't expect anything less from them heheh)
Evening_Ad6637 (OP) to LocalLLaMA@poweruser.forum • I have given llama.cpp server ui a facelift • 1 point • 2 years ago
Did you clone it from my repo?
Evening_Ad6637 (OP) to LocalLLaMA@poweruser.forum • I have given llama.cpp server ui a facelift • 1 point • 2 years ago
u/ambient_temp_xeno Ah, I have now seen that min-p has been implemented in the server anyway, so I have added it too.
Evening_Ad6637 (OP) to LocalLLaMA@poweruser.forum • I have given llama.cpp server ui a facelift • 1 point • 2 years ago
Yes, the OpenAI playground was my styling inspiration. I figured this was a good choice, since a lot of users will already be used to it.
The llama.cpp dev (Gerganov) has already answered and accepts a merge :))
Evening_Ad6637 (OP) to LocalLLaMA@poweruser.forum • I have given llama.cpp server ui a facelift • 1 point • 2 years ago
Ah, one side note: selecting a model via the dialog is absolutely not intuitive. If you want to navigate into a folder, you have to press space twice. Do not press enter until you decide to choose a specific folder. It doesn't matter much if you are in a parent folder, since the script searches recursively - but of course, if you have many files, it could take a long time.
Evening_Ad6637 to LocalLLaMA@poweruser.forum • How to minimize model inference costs? • 1 point • 2 years ago
Hmm, would it really be more expensive? vast.ai can be extremely cheap.
But I would also be interested in the topic as a whole. I think you would first have to calculate this very precisely, e.g. what is the scope of an average user request? Not many users will fill the entire context window with every request. If we had or could estimate an average value, we could derive how many tokens/second an economically efficient GPU produces and extrapolate that to the price of 1 million tokens (see the rough sketch after the list below).
But there are other important factors as well:
- Which country is the hardware located in? Electricity prices can differ enormously from country to country.
- How much did the operator have to pay for all their hardware? Bulk buyers almost always get better prices, regardless of the sector.
- Do they perhaps also operate their own photovoltaic systems and, if so, to what extent?
- It is also important to remember that not every product leads directly to financial profit. If you have enough capital and can afford it over a certain period of time, you may deliberately run a loss-making business in order to eliminate the competition. That way you could hope to gain reach and popularity with customers, who would then buy other products in the future. (See OpenAI and ChatGPT.)
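To make that extrapolation concrete, here is a back-of-the-envelope sketch. Every number in it (rental price, throughput, utilization) is a made-up placeholder, not a measurement; plug in your own:

```typescript
// Rough cost per million generated tokens for a rented GPU.
// All numbers below are illustrative assumptions, not measurements.
const gpuPricePerHour = 0.35;  // USD/hour, e.g. a cheap vast.ai listing
const tokensPerSecond = 40;    // sustained generation throughput
const utilization = 0.5;       // fraction of time the GPU is actually busy

const tokensPerHour = tokensPerSecond * 3600 * utilization;
const costPerMillionTokens = (gpuPricePerHour / tokensPerHour) * 1_000_000;

console.log(`~$${costPerMillionTokens.toFixed(2)} per 1M tokens`);
// With these assumptions: 40 tok/s * 3600 s * 0.5 = 72,000 tokens/hour,
// so $0.35 / 72,000 * 1,000,000 ≈ $4.86 per million tokens.
```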
I use various things, regularly testing whether one of them has become better, etc.:

- Mainly the llama.cpp backend with its server as UI - it has everything I need, it's lightweight, and it's hackable.
- Ollama - it simplifies many steps, has very convenient functions, and an overall coherent and powerful ecosystem. Mostly in the terminal, but sometimes in a modified Ollama WebUI.
- Sometimes Agnai and/or RisuAI - nice and powerful UIs with a satisfying UX, though not as powerful as SillyTavern. But SillyTavern is too much if you are not an RP power-user.
- My own custom Obsidian ChatGPT-MD and Canvas Chat addons with local endpoints.

In general, I try to avoid everything that comes with Python code, and I prefer solutions with as few dependencies as possible, so they are easier to hack and customize to my needs.
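The "local endpoints" part is simpler than it sounds: many local backends expose an OpenAI-style chat route, so a tool built for the OpenAI API can be pointed at them by swapping the base URL. A rough sketch of the request shape; the port and the /v1/chat/completions path are assumptions here, so adjust them to whatever your backend actually exposes:

```typescript
// Sketch: send an OpenAI-style chat request to a local server
// instead of api.openai.com. Path and port are assumptions.
async function chatLocal(userMessage: string): Promise<string> {
  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local", // many local servers ignore or loosely match this field
      messages: [{ role: "user", content: userMessage }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // standard OpenAI response shape
}
```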
Evening_Ad6637 to LocalLLaMA@poweruser.forum • Where and how to run Goliath 120b GGUF with good performance? • 1 point • 2 years ago
This is the only helpful answer, because it's the only correct one.
Evening_Ad6637 to LocalLLaMA@poweruser.forum • Just a heads up, if you give your AI access to your game it might try to kill you • 1 point • 2 years ago
That was so funny xD But above all, this is so fascinating. It shows us where the whole entertainment industry is going in the next few years, and it's amazing, man. Infinite open worlds, "real" unscripted dialogue, unexpected events and challenges, virtual relationships, and much more.
Evening_Ad6637 to LocalLLaMA@poweruser.forum • 64GB RAM vs 3060 12GB vs Intel a770? • 1 point • 2 years ago
I highly recommend the 3060 12GB.
Evening_Ad6637 to LocalLLaMA@poweruser.forum • Zephyr comes up with an original, impressive joke. • 1 point • 2 years ago
20 threads is too high. Set t=8 or t=10, but not more.
GPU layers should be 35, I think; all layers should fit in the GPU.
This way you can achieve something like a 10x inference speedup compared to your current setup.
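For reference, with plain llama.cpp those settings map to the -t (threads) and -ngl (GPU layers) flags. A sketch with a placeholder model filename; the exact layer count for full offload depends on your model and quant:

```
./main -m ./zephyr-7b.Q4_K_M.gguf -t 8 -ngl 35 -p "Tell me a joke"
```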
Thanks for your feedback. That's strange, I couldn't reproduce this bug (or did I misunderstand the error?).
I'll answer you on GitHub in more detail.