Mostly I’m still using slightly older models, with a few slightly newer ones now:
- marx-3b-v3.Q4_K_M.gguf for “fast” RAG inference,
- medalpaca-13B.ggmlv3.q4_1.bin for medical research,
- mistral-7b-openorca.Q4_K_M.gguf for creative writing,
- NousResearch-Nous-Capybara-3B-V1.9-Q4_K_M.gguf for creative writing, and probably for giving my IRC bots conversational capabilities (a work in progress),
- puddlejumper-13b-v2.Q4_K_M.gguf for physics research, questions about society and philosophy, “slow” RAG inference, and translating between English and German,
- refact-1_6b-Q4_K_M.gguf as a coding copilot, for fill-in-the-middle,
- rift-coder-v0-7b-gguf.git as a coding copilot when I’m writing Python or trying to figure out my coworkers’ Python,
- scarlett-33b.ggmlv3.q4_1.bin for creative writing, though less than I used to.
I also have several models which I’ve downloaded but not yet had time to evaluate, and am downloading more as we speak (though even more slowly than usual; a couple of weeks ago my download rates from HF dropped to roughly a third of what they were, and I don’t know why).
Some which seem particularly promising:
- yi-34b-200k-llamafied.Q4_K_M.gguf
- rocket-3b.Q4_K_M.gguf
- llmware’s “bling” and “dragon” models. I’m downloading them all, though so far there are only GGUFs available for three of them. I’m particularly intrigued by the prospect of llmware-dragon-falcon-7b-v0-gguf, which is tuned specifically for RAG and is supposedly “hallucination-proofed”, and llmware-bling-stable-lm-3b-4e1t-v0-gguf, which might be a better IRC-bot conversational model.
Of all of these, the one I use most frequently is PuddleJumper-13B-v2.
Docker and Kubernetes are popular mostly because the industry has broadly given up on release engineering. This means applications/services can have different and conflicting dependencies, so the only way they can run on the same physical host is by putting each in its own container or VM instance, with its own specific dependencies.
The alternative is to have a platform with standard libraries, and to port applications to the platform, using the platform’s libraries as their dependencies, and thus avoid conflicts. This requires effort and discipline, so of course it is not very popular, though it was standard practice twenty years ago.
As far as I know the only Linux distribution which still follows the platform approach is Slackware. Applications which are ported to Slackware are guaranteed to work well together without conflicts, but not a lot of applications have been thus ported (Slackware only has about two thousand official packages, in all).