I recently noticed that local LLMs are unable to sort even simple lists. They often lose entries, and what's worse, after completing the task they either insist it was done correctly or keep trying to correct it endlessly. Commercial models (GPT-3.5, GPT-4, Claude 2) do not have this problem.

Example list:

Sort the items in ascending order:

Item A1 - 56
Item B2 - 32
Item C3 - 78
Item D4 - 14
Item E5 - 89
Item F6 - 45
Item G7 - 63
Item H8 - 27
Item I9 - 94
Item J10 - 11
Item K11 - 72
Item L12 - 38
Item M13 - 50
Item N14 - 19
Item O15 - 81
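For reference, a correct answer is easy to check mechanically instead of trusting the model's own claim. The sketch below is a minimal Python example. It assumes the model's reply keeps the `Item X - N` format, and the function names `parse_items`, `reference_sort`, and `check_sorted_output` are hypothetical. It sorts the list and flags the two failure modes described above: lost (or invented) entries and wrong order.

```python
import re
from collections import Counter

# The list from the prompt above, copied verbatim.
PROMPT_ITEMS = """\
Item A1 - 56
Item B2 - 32
Item C3 - 78
Item D4 - 14
Item E5 - 89
Item F6 - 45
Item G7 - 63
Item H8 - 27
Item I9 - 94
Item J10 - 11
Item K11 - 72
Item L12 - 38
Item M13 - 50
Item N14 - 19
Item O15 - 81
"""

# Assumed line format: "Item <label> - <integer>".
LINE_RE = re.compile(r"^(Item \w+) - (\d+)$")


def parse_items(text: str) -> list[tuple[str, int]]:
    """Parse 'Item X - N' lines into (label, value) pairs, skipping blank lines."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        items.append((match.group(1), int(match.group(2))))
    return items


def reference_sort(items: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Ascending sort by numeric value; sorted() is stable, so ties keep input order."""
    return sorted(items, key=lambda item: item[1])


def check_sorted_output(original, answer) -> list[str]:
    """List the problems with a model's answer; an empty list means it is correct."""
    problems = []
    # Compare as multisets so dropped, invented, or duplicated entries all show up.
    missing = Counter(original) - Counter(answer)
    extra = Counter(answer) - Counter(original)
    if missing:
        problems.append(f"missing entries: {sorted(missing.elements())}")
    if extra:
        problems.append(f"unexpected entries: {sorted(extra.elements())}")
    values = [value for _, value in answer]
    if values != sorted(values):
        problems.append("not in ascending order")
    return problems


if __name__ == "__main__":
    original = parse_items(PROMPT_ITEMS)
    expected = reference_sort(original)
    for label, value in expected:
        print(f"{label} - {value}")  # first line printed: Item J10 - 11

    # Simulate a model that silently drops one entry.
    lossy_answer = [item for item in expected if item[0] != "Item D4"]
    print(check_sorted_output(original, lossy_answer))
    # → ["missing entries: [('Item D4', 14)]"]
```

Running a model's reply through `parse_items` and then `check_sorted_output` gives a quick pass/fail, which is useful precisely because the models tend to report success regardless.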

Until now, I knew that current LLMs struggle with large numbers and mathematics, but I assumed sorting would be a relatively simple task.

Tested on: Goliath 120B, Llama 2 70B, WizardCoder 15B, Mistral 7B.

What are your thoughts? Do you think we will be able to fine-tune a model to perform tasks like sorting, or add such capabilities using a Mixture of Experts (MoE)?

  • FPhamB · 10 months ago

    At every step the LLM was giving you BS. It tells you it understands each step, yet the result is wrong.

    The reason is simple: we need more parameters. We are topping out at 70B. That's fine for text, but not good enough for non-text tasks.

    Goliath is effectively still 70B: merging two 70B models doesn't give you a 140B base model. It won't suddenly have twice the pre-training.

    Unlike words, which are usually split into one or two tokens, the Llama tokenizer splits every number into individual digits, one token per digit. So you need more parameters to find patterns in numbers when the task is posed as text: for an LLM, a long number is as complicated as an entire sentence. It's a miracle it can add two numbers at all.
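    To make the tokenization point concrete, here is a toy tokenizer. It is hypothetical and far simpler than Llama's actual SentencePiece tokenizer: it keeps every word as one token but splits numbers into single digits, as described above. Under that assumption, a six-digit number costs as many tokens as a six-word sentence.

```python
import re

# Toy rule (an assumption, not Llama's real vocabulary): each digit is its own
# token, each run of letters is one token, and any other non-space symbol is
# a single token. Whitespace is never matched, so it is dropped.
TOKEN_RE = re.compile(r"\d|[A-Za-z]+|[^\sA-Za-z\d]")


def toy_tokenize(text: str) -> list[str]:
    """Split text into toy tokens."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(toy_tokenize("Item J10 - 11"))
    # → ['Item', 'J', '1', '0', '-', '1', '1']
    print(len(toy_tokenize("123456")))  # → 6
    print(len(toy_tokenize("sort the items in ascending order")))  # → 6
```

    A real tokenizer will split some longer words into several pieces, so the word side of this comparison is optimistic; the digit side matches the one-token-per-digit behaviour the comment describes.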