Hello! I am wondering about this because it would be a very interesting use case, and there is more than enough training material out there: pretty much every MD file could be rendered, and then the image plus the markdown source could be used for training/finetuning.

However, I have pretty much no idea about LLaVA. Do you think this would be feasible to do? Roughly what I have in mind for the data-generation step is sketched below.
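
A rough Python sketch of the pair-generation idea, assuming the `markdown` and `imgkit` packages (imgkit needs wkhtmltoimage installed); the record layout is only my guess at a LLaVA-style conversation format, so the actual finetuning scripts would need to be checked:

```python
# Rough sketch: render every .md file to a PNG and pair it with its source,
# producing one JSON record per file in a conversation-style format.
import json
from pathlib import Path

import markdown
import imgkit


def build_pairs(md_dir: str, out_dir: str) -> list[dict]:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = []
    for i, md_path in enumerate(sorted(Path(md_dir).rglob("*.md"))):
        source = md_path.read_text(encoding="utf-8")
        # Convert markdown to HTML, then render the HTML to an image.
        html = markdown.markdown(source, extensions=["tables", "fenced_code"])
        image_path = out / f"{i:06d}.png"
        imgkit.from_string(html, str(image_path))
        records.append({
            "id": f"{i:06d}",
            "image": image_path.name,
            "conversations": [
                {"from": "human", "value": "<image>\nTranscribe this page as Markdown."},
                {"from": "gpt", "value": source},
            ],
        })
    (out / "train.json").write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


if __name__ == "__main__":
    build_pairs("markdown_corpus", "rendered_pairs")
```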

  • herozorroB · 11 months ago

    what you are looking for is OCR: extract the text from the rendered image first, then feed it to the LLM to produce the markdown
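
    i.e. something like this (just a sketch, assuming pytesseract with Tesseract installed and Pillow; the actual LLM call is left out since that depends on whatever model or endpoint you run):

    ```python
    # OCR-first approach: pull raw text out of the rendered page with Tesseract,
    # then build a prompt asking the LLM to reconstruct well-formed Markdown.
    import pytesseract
    from PIL import Image


    def image_to_markdown_prompt(image_path: str) -> str:
        raw_text = pytesseract.image_to_string(Image.open(image_path))
        return (
            "The following text was extracted from a rendered document via OCR. "
            "Reconstruct it as well-formed Markdown, restoring headings, lists "
            "and code blocks:\n\n" + raw_text
        )


    if __name__ == "__main__":
        prompt = image_to_markdown_prompt("page.png")
        print(prompt)  # send this to your LLM of choice
    ```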