So I have recently gone down the rabbit hole of cancelling my ChatGPT subscription and now just use OpenHermes2.5-Mistral-7B. I’ve learned about the different benchmarks and how they compare and I understand how to read the HuggingFace LLM leaderboard and download any other model I might like to try.

What I struggle to understand is the meaning of the naming conventions. Mistral seems to clearly be better than LLAMA2 from what I have read and I understand the differences of 7B, 13B, etc etc.

Can someone explain the additional prefixes of Hermes, OpenHermes, NeuralChat, etc.

Tldr; What is the difference between Dolphin-Mistral and OpenHermes-Mistral. I’m guessing one is the dataset and the other is how it was trained?

  • __SlimeQ__B
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    Mistral and Llama2 (and Llama) are foundation models, meaning they actually trained all the weights given. Almost anything worth using is a derivative of these 3 foundation models. They are really expensive to train.

    Just about everything else is a Lora fine tune on top of one of them. Fine tunes only change a small fraction of the weights, like 1%. Functionally speaking, the important part of these is the additional data they were trained on, and that training can be done on any underlying model.

    So Open hermes is a Lora tuning on top of mistral, and is some opensource offshoot of nous hermes, which is an instruction dataset for giving good smart answers (or something) in a given instruction format.