Happy to announce my release of Nous-Capybara 7B and 3B V1.9!
7B V1.9 version is now trained with Mistral instead of V1 that was trained on Llama. Also some significant dataset improvements under the hood.
As for the 3B size, it’s the first sub-7B model released under Nous Research and leverages the same dataset as 7B V1.9, efficient enough to run briskly on even a non-pro iphone! This is what’s currently being used as well for the foundation of the worlds first 3B parameter multi-modal model called Obsidian (Should be released by the time of this posting.)
Capybara uses a new method called Amplify-Instruct for data creation, this uses existing single-turn popular datasets like Airoboros, EverythingLM and Know_logic as the seeds for which synthetic long context back and forth conversational examples are synthesized from.(Paper releasing soon with more details)
Amongst the dataset process is also thousands of top posts scraped regarding certain subjects on the website LessWrong that discuss deep complex long form concepts surrounding the nature of reality, reasoning, futurism and philosophy, and then using the Amplify-instruct technique on this data to leverage this into advanced long context multi-turn examples. It is also trained on tasks of summarizing these multiple thousand token long posts, papers and articles regarding such topics,and then having back and forth conversations discussing things surrounding variations of such summaries.
Part of the development of the dataset was with the goal of an unbiased, natural casual prose and great conversational abilities, while having very logical analytical prowess and robustness in back and forth conversation. V1.9 further improves this by putting further emphasis on improving realistic prose, identifying and removing dataset examples that were shown to hurt certain reasoning capabilities, and identifying biases that hurt problem solving abilities as well.
There was also found to be instances of the model being biased towards a more robotic identity through the training data and even certain physical identity biases regarding self-identity, like pre-conceived notions a model could have about being physical versus metaphysical, pre-conceived notions relating to what knowledge was held within the self of Capybara etc… Identifying and fixing these biases within the distribution for V1.9 seemed to give significant improvements overall in terms of how well the model works with little to no instructions and no system prompt, but also seems to significantly improve the steerability of the model and how well it can now follow more complex and difficult system prompts.
Although I didn’t intend to optimize this model for Roleplay specifically, I was very surprised to see people messaging me about how Capybara V1 was one of their favorite models for RolePlay, and based on some early testers it seems that Capybara V1.9 is a further significant jump in not just the logical analytical capabilities, but also the coherency and casual steerable prose for roleplay, several telling me it’s now their new favorite model for such use cases.
I’m excited that I finally have this released and I hope I can get feedback from any of you as well that might be interested in trying it out! Here is the quantized version by TheBloke of 7B V1.9: https://huggingface.co/TheBloke/Nous-Capybara-7B-v1.9-GGUF
And here is the quantized version of 3B: https://huggingface.co/TheBloke/Nous-Capybara-3B-v1.9-GPTQ