hi folks,

Simple question really: what model (fine-tuned or otherwise) have you found that can extract data from a bunch of text?

I’m happy to finetune, so if there are any successes there, would really appreciate some pointers in the right direction.

Really looking for a starting point here. I’m aware of the DETR class of models and how Microsoft trained Table Transformer on DETR. Wondering if something similar can be done with Llama 2 and similar models?

P.S. cannot use GPT because of sensitive PII data.

  • georgejrjrjrB
    10 months ago

    I’ve wondered this, and hope you get better answers.

    One thing you could do, if it fits your use case: align GDELT entries with news stories in the realnews dataset on Hugging Face, then train a model to output the extracted info from the article.
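
    A rough sketch of what that alignment step might look like, assuming both sides can be joined on the article URL. The field names (`source_url`, `url`, `text`) and both helper functions are hypothetical stand-ins, not the actual schema of either dataset:

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to host + path so trivially different forms still match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Scheme, query string, and fragment are dropped on purpose.
    return host + parts.path.rstrip("/")


def align(events, articles):
    """Pair each event record with the article it was extracted from.

    `events` and `articles` are lists of dicts with hypothetical keys:
    events carry "source_url", articles carry "url" and "text".
    """
    by_url = {normalize_url(a["url"]): a for a in articles}
    pairs = []
    for event in events:
        article = by_url.get(normalize_url(event["source_url"]))
        if article is None:
            continue  # no matching article, skip this event
        target = {k: v for k, v in event.items() if k != "source_url"}
        pairs.append({"input": article["text"], "target": target})
    return pairs
```

    Each resulting pair is (article text, structured fields), which is the shape a supervised extraction model would train on.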

    Another is to have GPT-4 do some examples on lightly faked / anonymized data, and then distill that into a model that does well on information extraction evals (which are a thing, iirc).
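
    A minimal sketch of the data-prep side of that idea, under some loud assumptions: the two regexes only catch emails and phone-like numbers (real PII such as names and addresses needs much more than this), and the `prompt`/`completion` record layout is just one common fine-tuning format, not something any particular trainer requires. `mask_pii` and `build_record` are hypothetical helper names.

```python
import json
import re

# Deliberately simple patterns; they will miss many kinds of PII.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text):
    """Replace emails and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


def build_record(masked_text, teacher_output):
    """Turn one (masked text, teacher extraction) pair into a JSONL line.

    `teacher_output` is the dict of fields the larger model produced
    for the already-masked text.
    """
    return json.dumps({
        "prompt": "Extract the fields as JSON.\n\n" + masked_text,
        "completion": json.dumps(teacher_output, sort_keys=True),
    })
```

    The intended flow would be: mask first, send only the masked text to the teacher model, then write one `build_record` line per example and fine-tune the local model on that file.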

    • sandys1OPB
      10 months ago

      What are the information extraction evals? Do you have a link?
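
      Not a pointer to a specific benchmark, but for orientation: extraction evals commonly score a model by comparing predicted fields against gold fields and reporting precision, recall, and F1. A small sketch of that scoring, assuming exact-match comparison and flat dicts with hashable values (`field_f1` is a hypothetical helper, and real evals often use looser matching):

```python
def field_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over (field, value) pairs."""
    pred_items = set(predicted.items())
    gold_items = set(gold.items())
    true_positives = len(pred_items & gold_items)

    precision = true_positives / len(pred_items) if pred_items else 0.0
    recall = true_positives / len(gold_items) if gold_items else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

      A field only counts as correct when both its name and its value match the gold record, so a wrong value hurts precision and recall at the same time.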

  • fediverser
    10 months ago

    This post is an automated archive of a submission made on /r/LocalLLaMA, powered by Fediverser software running on alien.top. Responses to this submission will not be seen by the original author until they claim ownership of their alien.top account. Please consider reaching out to let them know about this post and help them migrate to Lemmy.
