on-demand inference or batch inference?

Ok_Post_149 · 1 year ago

on-demand inference or batch inference?

AdamDhahabi · 1 year ago

I think batched inference is a must for companies that want to put an on-premise chatbot in front of their users. Many are working on this use case at the moment. I saw that llama.cpp has supported batched inference for about two weeks now, but I don't have hands-on experience with it yet.

Ok_Post_149 · 1 year ago

Thanks for the feedback. What is your definition of an on-prem chatbot? One hosted on the company's own physical infrastructure?