Hey All,

What does making a model prediction look like for your current projects? Are you building a model for a web-app and running on-demand inference? Or are you working on a research project or analysis that requires making hundreds of thousands to millions of predictions all at once?

I’m currently at a crossroads with a developer tool I’m building and trying to figure out which types of inference workflows to focus on. A few weeks back I posted a tutorial on running Mistral-7B on hundreds of GPUs in the cloud in parallel. A decent number of people said that batch inference is relevant to them, but over the last couple of days I’ve been running into more and more developers who are building web-apps that don’t need to make many predictions at once. If you were me, where would you direct your focus?
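To make the distinction concrete, here’s a rough sketch of the two workflows I mean, using the Hugging Face transformers pipeline. The model name, prompts, and batch size are just placeholders, and running a 7B model like this assumes you have a GPU with enough memory:

```python
# Minimal sketch of on-demand vs. batch inference (illustrative only).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-v0.1",  # placeholder model; needs a large GPU
    device_map="auto",                  # requires the accelerate package
)

# On-demand inference: one request at a time, latency-sensitive (web-app style).
print(generator("Summarize: LLM inference workflows", max_new_tokens=50))

# Batch inference: many prompts at once, throughput-sensitive (offline/research style).
prompts = [f"Classify ticket #{i}: ..." for i in range(1024)]
results = generator(prompts, batch_size=32, max_new_tokens=50)
```

The web-app case cares about the latency of that single call; the batch case cares about how fast you can chew through the whole list, which is where spreading work across many GPUs starts to matter.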

Anyway, I’m kinda rambling, but I’d love to know what you’re all working on and get some advice on the direction I should pursue.

  • Ok_Post_149 (OP) · 11 months ago

    Thanks for this feedback. What’s your definition of an on-prem chatbot? Hosted on their own physical infrastructure?