Is there any ongoing effort to "bake in" vision capabilities on top of base models or fine-tunes?

LyPretoB to LocalLLaMA@poweruser.forum · English · 3 years ago

I've been thinking about this for a while. Does anyone know how feasible this is? Basically, the idea is to apply some sort of “LoRA” on top of existing models to give them vision capabilities, making them multimodal.
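
To make it concrete what I mean by a "LoRA on top", here is a rough pure-Python sketch of the general low-rank adapter idea: the original weight stays frozen and only a small update B @ A is trained. The class name `LoRALinear`, the helper functions, and the toy sizes are all made up for illustration. This shows only the adapter math, not how image features would actually get fed into the model, which is the part I'm unsure about.

```python
import random


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical frozen linear layer with a low-rank (LoRA-style) update.

    Output is x @ W^T + (alpha / rank) * x @ A^T @ B^T, where W is frozen
    and only A (rank x d_in) and B (d_out x rank) would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, never updated
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the base model's output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def num_trainable(self):
        """Number of adapter parameters (A and B only)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def forward(self, x):
        """x is a list of input row vectors, each of length d_in."""
        base = matmul(x, transpose(self.weight))
        low_rank = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return [
            [b + self.scale * l for b, l in zip(base_row, low_row)]
            for base_row, low_row in zip(base, low_rank)
        ]


# Toy usage: a 2x3 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
layer = LoRALinear(W, rank=1)
x = [[1.0, 2.0, 3.0]]
before = layer.forward(x)  # identical to the frozen layer, since B is zero

# Pretend training changed the adapter weights.
layer.A = [[1.0, 0.0, 0.0]]
layer.B = [[1.0], [0.0]]
after = layer.forward(x)
```

The point of the sketch is just that the adapter adds a handful of parameters (5 here versus 6 in the frozen weight, and the gap grows quickly with layer size) without touching the base model.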
