I recently noticed that the local LLMs I tested cannot reliably sort even simple lists. They often lose entries and, what’s worse, after completing the task they either insist it was done correctly or keep trying to correct it endlessly. Commercial models (GPT-3.5, GPT-4, Claude 2) do not have this problem.

Example list:

Sort the items in ascending order:

Item A1 - 56
Item B2 - 32
Item C3 - 78
Item D4 - 14
Item E5 - 89
Item F6 - 45
Item G7 - 63
Item H8 - 27
Item I9 - 94
Item J10 - 11
Item K11 - 72
Item L12 - 38
Item M13 - 50
Item N14 - 19
Item O15 - 81
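
For anyone who wants to reproduce this, here is a minimal sketch of how a model's reply could be checked automatically. It assumes "ascending order" means ordering by the number after the dash, and the names `parse_items` and `check_reply` are just illustrative helpers, not part of any library.

```python
import re

# The list from the prompt above, copied verbatim.
PROMPT_ITEMS = """\
Item A1 - 56
Item B2 - 32
Item C3 - 78
Item D4 - 14
Item E5 - 89
Item F6 - 45
Item G7 - 63
Item H8 - 27
Item I9 - 94
Item J10 - 11
Item K11 - 72
Item L12 - 38
Item M13 - 50
Item N14 - 19
Item O15 - 81
"""


def parse_items(text):
    """Return (label, value) pairs for every 'Item X - N' entry found in text."""
    pattern = r"Item\s+(\w+)\s*-\s*(\d+)"
    return [(m.group(1), int(m.group(2))) for m in re.finditer(pattern, text)]


def check_reply(reply, original=PROMPT_ITEMS):
    """Compare a model's reply against the correctly sorted list."""
    # Assumption: sort by the numeric value, smallest first.
    expected = sorted(parse_items(original), key=lambda pair: pair[1])
    got = parse_items(reply)
    return {
        "expected": expected,
        # Entries the model dropped.
        "missing": [p for p in expected if p not in got],
        # Entries the model invented or altered.
        "extra": [p for p in got if p not in expected],
        # Whether the values that were returned are at least in order.
        "in_order": all(a[1] <= b[1] for a, b in zip(got, got[1:])),
        "correct": got == expected,
    }
```

Paste the model's answer into a string and call `check_reply(answer)`. A non-empty `missing` list corresponds to the lost entries described above, and `in_order` shows whether the remaining entries were at least ordered correctly.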

I already knew that current LLMs struggle with larger numbers and mathematics, but I expected sorting to be a relatively simple task.

Tested on: Goliath 120B, Llama 2 70B, WizardCoder 15B, Mistral 7B.

What are your thoughts? Do you think we will be able to fine-tune a model to perform tasks like sorting, or add such capabilities with a Mixture of Experts (MoE) architecture?