llama.cpp server rocks now! 🤘

Gorefindal · 1 year ago

llama.cpp server rocks now! 🤘

aseichter2007 · 1 year ago

I’m pretty sure that makes it compatible with Clipboard Conqueror too!

SatoshiNotMe · 1 year ago

You mean we don’t need to use llama-cpp-Python anymore to serve this at an OAI-like endpoint?

reallmconnoisseur · 1 year ago

Correct. You run llama.cpp server and inside your code/gui whatever you set OpenAI base API to the server’s endpoint.

sleeper-2 · 1 year ago

huge fan of server.cpp too! I actually embed a universal binary (created with lipo) in my macOS app (FreeChat) and use it as an LLM backend running on localhost. Seeing how quickly it improves makes me very happy about this architecture choice.

I just saw the improvements issue today. Pretty excited about the possibility of getting chat template functionality since currently all of that complexity has to live in my client.

Also, TIL about the batching stuff. I’m going to try getting multiple responses using that.

Gorefindal · 1 year ago

*Love* FreeChat!

herozorro · 1 year ago

will this speed up ollama project?