struggling to include text prompts along with image-data (multimodal) for inferencing

LyPreto · 2 years ago

I'm struggling to include text prompts along with image data (multimodal) when running inference.

paryska99 · 2 years ago

Doesn’t the LlamaCpp server host a GUI for multimodal? You could potentially visit it, open the developer panel in your browser, and observe the HTTP requests being sent.

LyPreto · 2 years ago

I ended up just scrutinizing the server code to understand it better and found that the prompt needs to follow a very specific format or else it won’t work well:

prompt: `A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\nUSER:[img-12]${message}\nASSISTANT:`
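For anyone landing here later, this is a rough sketch of how that prompt could be paired with the image in a request. It assumes the llama.cpp server's `/completion` endpoint and its `image_data` field (an array of `{ data, id }` objects, where `id` matches the number in the `[img-N]` tag), and that the reply text comes back in a `content` field. The function names, `serverUrl`, and `imageBytes` are hypothetical, so check them against the server version you are running.

```javascript
// System preamble quoted from the post above.
const SYSTEM_PROMPT =
  "A chat between a curious human and an artificial intelligence assistant. " +
  "The assistant gives helpful, detailed, and polite answers to the human's questions.";

// Build the JSON body for one request (hypothetical helper).
// The [img-N] tag in the prompt must use the same id as the image_data entry.
function buildMultimodalRequest(message, imageBase64, imageId = 12) {
  return {
    prompt: `${SYSTEM_PROMPT}\nUSER:[img-${imageId}]${message}\nASSISTANT:`,
    image_data: [{ data: imageBase64, id: imageId }],
  };
}

// Send the request (hypothetical helper; assumes Node 18+ for global fetch).
// serverUrl is an assumption, e.g. wherever your server is listening.
async function askAboutImage(serverUrl, message, imageBytes) {
  const imageBase64 = Buffer.from(imageBytes).toString("base64");
  const response = await fetch(`${serverUrl}/completion`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildMultimodalRequest(message, imageBase64)),
  });
  if (!response.ok) {
    throw new Error(`server returned ${response.status}`);
  }
  // Assumed response shape: the generated text is in `content`.
  return (await response.json()).content;
}
```

Keeping the prompt construction in its own function makes it easy to print the exact string and compare it with what the server's built-in web page sends.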