LocalLLaMA@poweruser.forum · English · 3 years ago

ShareGPT4V - New multi-modal model, improves on LLaVA

sharegpt4v.github.io

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

GeraltOfRigaB · 3 years ago
This is kinda nuts (first time I've tried an LLM + vision).

I tried it with a first-person shooter screenshot that had an enemy on screen. I asked it for the 2D coordinates of the enemy and it gave them, precisely.