• metalman123B
    1 year ago

    This style of captioning could be amazing for text-to-image datasets, and I wouldn't be surprised to see them take a jump in quality as well.

  • GeraltOfRigaB
    1 year ago

    This is kinda nuts (first time I've tried an LLM + vision).

    Tried it with a first-person shooter screenshot with an enemy on screen. Asked it to give me the 2D coordinates of the enemy, and it did, precisely.