• Key-Comparison3261OPB
    1 year ago

    In Python you have ExLlama, vLLM, and LMDeploy, and in most cases FastAPI is used to serve the HTTP endpoint.
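To make the serving pattern concrete, here is a rough sketch of "inference call behind an HTTP endpoint". It deliberately uses only the standard library's `http.server` as a stand-in for FastAPI, and `generate`, `handle_generate`, `GenerateHandler`, `make_server`, and the `/generate` route are all hypothetical names made up for illustration, not the API of any of the libraries above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate(prompt: str) -> str:
    """Hypothetical stand-in for a real inference call into an engine."""
    return prompt[::-1]  # placeholder "model": just reverses the prompt


def handle_generate(raw: bytes) -> bytes:
    """Decode a JSON request body, run the model, encode a JSON response."""
    payload = json.loads(raw or b"{}")
    text = generate(payload.get("prompt", ""))
    return json.dumps({"text": text}).encode("utf-8")


class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the assumed /generate route is served.
        if self.path != "/generate":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = handle_generate(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), GenerateHandler)
```

Calling `make_server().serve_forever()` would start it, and a POST to `/generate` with `{"prompt": "abc"}` would go through `handle_generate`. In a real setup the body of `generate` is where the engine call would go, and FastAPI would replace the handler class.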

    I wrote llm-sharp to drop Python (the GIL, pip dependencies) and to adapt flexibly to dynamic model structures beyond standard Llama.