• AaaaaaaaaeeeeeB
    1 year ago

    It's useful for people who want to know the inference response time.

    This wouldn’t give us a reply at 4000 tokens of context in 1/3 of a second.