- So, I’ve been doing all my LLM tinkering on an M1, using llama.cpp/whisper.cpp to run a basic voice-powered assistant, nothing new at this point.
- Currently adding a visual component to it, ShareGPT4V-7B, assuming I manage to convert it to gguf. Once that’s done I should be able to integrate it with llama.cpp and wire it to a live camera feed, giving it eyes.
- Might even get crazy and throw in a low-level component to handle basic object detection, letting the model know when something is being “shown” to it. Other than that, it will activate when prompted to do so (text or voice).
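For what it's worth, here is a rough sketch of how the "something is being shown" trigger could work. To be clear, this is not real object detection: it swaps in plain frame differencing (compare each frame against a quiet background and fire once the scene has stayed different for a few frames). Everything here is hypothetical, including the names `ShownDetector` and `should_run_vision`, the threshold values, and the assumption that frames arrive as flat lists of grayscale pixel values. Grabbing frames from the camera and calling the vision model are left out.

```python
from typing import Optional, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-pixel absolute difference between two grayscale frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class ShownDetector:
    """Fires once when the scene differs from the background for several frames.

    Hypothetical helper: threshold and hold_frames are guesses to tune by hand.
    """

    def __init__(self, threshold: float = 12.0, hold_frames: int = 3):
        self.threshold = threshold
        self.hold_frames = hold_frames
        self._background: Optional[list] = None
        self._streak = 0
        self._fired = False

    def update(self, frame: Sequence[int]) -> bool:
        # The first frame only establishes the background.
        if self._background is None:
            self._background = list(frame)
            return False

        if mean_abs_diff(self._background, frame) > self.threshold:
            self._streak += 1
        else:
            # Scene is quiet again: reset and adopt this frame as background.
            self._streak = 0
            self._fired = False
            self._background = list(frame)

        # Fire only once per "showing", after the change has persisted.
        if self._streak >= self.hold_frames and not self._fired:
            self._fired = True
            return True
        return False


def should_run_vision(detector: ShownDetector, frame: Sequence[int],
                      user_prompted: bool) -> bool:
    """Run the vision model when something is shown or the user asks for it."""
    shown = detector.update(frame)  # always update so the detector state stays current
    return shown or user_prompted
```

The `hold_frames` streak is there so a hand passing through the frame doesn't wake the model up, and the one-shot `_fired` flag keeps it from re-triggering on every frame while the object is still being held up. A text or voice prompt bypasses the detector entirely.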
The one thing I’m not sure about is how to run a TTS engine like StyleTTS2-LJSpeech locally. Are there libraries that support TTS models?
It’s Tortoise, so who knows. There is PyTorch for Mac now. You would have to figure it out from scratch. I’m not sure why nobody is trying it.
When I tried edge-tts it was very mediocre, like Silero.