I already tried to set up fastchat-t5 on a DigitalOcean virtual server (32 GB RAM, 4 vCPUs, $160/month) with CPU inference. The performance was terrible: about 5 seconds to the first token, then roughly 1 word per second.

Any ideas on how to host a small LLM like fastchat-t5 economically?

  • HeronAI_comOPB
    1 year ago

    Wow, thanks, that's a really in-depth comment! I will try what you say.