Is there any ongoing effort to "bake in" vision capabilities on top of base models or fine-tunes?

LyPreto · 1 year ago

Is there any ongoing effort to "bake in" vision capabilities on top of base models or fine-tunes?

mcmoose1900 · 1 year ago

There's already more than one image-ingestion (vision) model, including several built on Llama and Mistral (LLaVA and BakLLaVA, for example).
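For a rough idea of how vision gets "baked in" to an existing LLM, here is a toy sketch of one common design, the LLaVA-style approach (an assumption: not every vision model works this way). A pretrained vision encoder turns the image into patch feature vectors, a learned projection maps each one into the LLM's embedding size, and the projected vectors are fed to the LLM alongside the ordinary text token embeddings. All the names and the tiny dimensions here are hypothetical. In real models the projector is trained, and it may be a single linear layer or a small MLP.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by an in_dim vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def project_image_features(patch_features: List[Vector], projector: Matrix) -> List[Vector]:
    """Map each vision-encoder patch feature into the LLM's embedding space.

    Hypothetical helper: a linear projector stands in for whatever trained
    adapter a real model uses.
    """
    return [matvec(projector, patch) for patch in patch_features]


def build_input_sequence(image_tokens: List[Vector], text_embeddings: List[Vector]) -> List[Vector]:
    """Prepend the projected image 'tokens' to the text token embeddings."""
    return image_tokens + text_embeddings


if __name__ == "__main__":
    # Toy sizes (hypothetical): vision features of dim 3, LLM embeddings of dim 2.
    patches = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]  # 2 image patches from the encoder
    projector = [[1.0, 0.0, 1.0],                  # 2 x 3 projection matrix
                 [0.0, 2.0, 0.0]]
    text = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]    # 3 text token embeddings
    seq = build_input_sequence(project_image_features(patches, projector), text)
    print(len(seq), seq[0], seq[1])  # 5 [3.0, 0.0] [0.5, 2.0]
```

The base LLM's weights can stay mostly as they are. The new part is the adapter that lets image features "look like" tokens it already knows how to read.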

If you're talking about generating images, I dunno about that. Some people hook up LLMs to write prompts for Stable Diffusion, but that's not really the same thing: the LLM only produces text, and a separate model generates the image.
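To show why chaining models isn't the same as a model that generates images itself, here is a minimal sketch of that kind of pipeline. Both model calls are hypothetical stubs (`llm_write_prompt` and `diffusion_generate` are made-up names, not a real API). The point is only the data flow: the LLM hands plain text to a separate image model.

```python
def llm_write_prompt(user_request: str) -> str:
    """Hypothetical stand-in for a text-only LLM call that writes an image prompt."""
    return f"highly detailed digital painting of {user_request}, soft lighting"


def diffusion_generate(prompt: str) -> dict:
    """Hypothetical stand-in for a separate text-to-image model such as Stable Diffusion."""
    return {"prompt_used": prompt, "image": "<pixels would go here>"}


def llm_to_image(user_request: str) -> dict:
    # The LLM only ever produces text; the pixels come from a different model.
    prompt = llm_write_prompt(user_request)
    return diffusion_generate(prompt)


if __name__ == "__main__":
    print(llm_to_image("a lighthouse at dusk")["prompt_used"])
```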