I keep digging and finding GPT-4V prototypes shared on X, e.g. narration for videos (source), posture correction (source), etc.

As foundation models in computer vision become even more accessible, will the field win back some of the attention it has lost to the LLM hype?