• BalorNGB · 11 months ago

    Given how good 7B Mistral is in my personal experience, the idea that a model 3x its size could BE GPT-3.5 Turbo is no longer implausible.

  • xadiantB · 11 months ago

    No fucking way. GPT-3 has 175B params. In no way, shape, or form could they have discovered the “secret sauce” to make an ultra-smart 20B model. The TruthfulQA paper suggests that bigger models are more likely to score worse, and ChatGPT’s TQA score is impressively bad. I think the papers responsible for the impressive open-source models are at most 12-20 months old. The Turbo version is probably just quantized, that’s all.

  • FPhamB · 11 months ago

    It looks weird going from 175B text-davinci-003 to 20B gpt-3.5-turbo. But a) we don’t know how they count this - quantization halves the memory per parameter, which can muddle an “effective” size figure (rough math below) - and b) we don’t know anything about how they made it.

    Except c) they threw much more money at it, using humans to clean the dataset. A clean dataset can make a 20B model sing. We are getting Meta’s chaos in Llama 2 70B, with everything thrown at it…
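
    A quick back-of-the-envelope sketch of the quantization point above. The 175B and 20B figures come from this thread; the fp16/int8/int4 bit widths are just common storage sizes, not anything OpenAI has confirmed:

    ```python
    # Rough weight-storage arithmetic: quantization shrinks bytes per parameter,
    # not the parameter count, so a quantized 175B model is still 175B params.

    def weight_footprint_gb(params_billions: float, bits_per_param: int) -> float:
        """Approximate weight storage in GB (ignores activations, KV cache, overhead)."""
        return params_billions * (bits_per_param / 8)  # 1B params at 8-bit ~= 1 GB

    for name, params in [("text-davinci-003 (175B)", 175.0), ("rumored gpt-3.5-turbo (20B)", 20.0)]:
        for bits in (16, 8, 4):
            print(f"{name:28s} @ {bits:2d}-bit ~ {weight_footprint_gb(params, bits):6.0f} GB")
    ```

    An int8 40B model and an fp16 20B model occupy roughly the same memory, which is one way a “20B” figure could get muddled if someone counts bytes rather than parameters.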