My settings for "optimal" 7B Roleplay (+ some general settings tips and a discovered new hidden gem of a model)

CardAnarchist · 1 year ago

Ah round brackets vs parentheses is one of those British vs American English things haha.

That said on paper parentheses probably should be the better choice as it should be less likely to be misinterpreted by the model.

I’m giving it a try with parentheses now, thanks!

CardAnarchist · 1 year ago

I thought it odd myself. So much so that I thought SillyTavern was bugged but that wasn’t the case.

It’s pretty easy to test yourself. Just use Koboldcpp to load in say 31 layers generate some output on seed 1 then, restart Koboldcpp with 30 layers.

Example of 31 layers of a 7B vs 30 layers on the same seed.

Each seed works the same if the layers are close enough it seems like. The output starts exactly the same before branching off.

It’s worth mentioning that the person who told me the quality was “better” with more layers loaded in simply said it was as far as he recalled.

CardAnarchist · 1 year ago

Hi there, you seem like the man to ask on this somewhat related topic to the OP,

I’ve recently found out that models output different results based on the number of layers loaded into GPU. I’ve been told that more layers loaded in = better output.

How does the loss asociated with layers not in GPU compare to the loss say between quants?

CardAnarchist · 1 year ago

Not going to lie I updated these awhile back when I was newer to the whole AI thing based on a recommendation and I had forgotten I even edited them until you just mentioned.

Pretty sure I changed these because /u/WolframRavenwolf does it xD

Care to enlighten us why these are a good idea Mr wolf.

CardAnarchist · 1 year ago

Hmm.

Well there is the target length (tokens) setting in SillyTaverns advanced formatting tab.

I’ve got it set to 200 as above and then the Response (tokens) setting set to 300.

The “target” is actually the setting which I’ve got set to 200. The setting at 300 is merely a “cap” it can’t go over.

So I’d start with changing the target length (tokens) to 100 and change your Response (tokens) cap to say 150-175 to give it a bit of wiggle room.

If that doesn’t work try removing the “be verbose” part of what I wrote if you are using that or edit this part to “Write multiple brief fresh sentences, paragraphs, and phrases.”

CardAnarchist · 1 year ago

After seeing your comment I tried the OpenHermes-2.5-neural-chat-7B-v3-1-7B-GGUF model you mention.

Unfortunately setup the way I am it didn’t respond very well for me.

Honestly I don’t think the concept of that merge is too good to be frank.

OpenHermes is fantastic. If I had to state it’s flaws I’d say it’s prose is a bit dry and the dialogue seems to speak past you in a way rather than clearly responding to you. Only issues for roleplay really.

From all I’ve read neuralchat is much the same (tbh though I’ve not got neuralchat to work particularly well for me at all) so any merge created from those two models I would expect to be a bit lacking in the roleplay department.

That said if you are wanting a model for more professional purposes it might be worth further testing.

For roleplay Misted-7B is leagues better. At least in my testing in my setup.

CardAnarchist · 1 year ago

Appreciated. If anyone has any issues with anything let me know.

The worst thing in my experience is the damn templates all these models have. So many unique templates with minor tweaks and some models are so sensitive!

I’ve literally given up on some models because I clearly couldn’t figure out the right template smh.

CardAnarchist · 1 year ago

You’re most welcome. It’s the least I can do to give just a little back to the community which has been so helpful to me in advance.

CardAnarchist · 1 year ago

/u/reiniken has reminded me of one important point I didn’t touch on much.

It’s important to replicate the style you want the AI to write in, in the first message and in your own replies to help the AI keep replicating the format.

So write narration in 3rd person and add some bracketed thoughts in the introduction message of your card if you follow my guide.

That’s why in my examples I speak myself in 3rd person. You don’t have to the AI can keep to the format without doing so from my testing but I think writing your own narration in 3rd person helps the AI keep it’s narration in 3rd person too. If it see’s your narration in 1st person it could be tempted to write it’s narration in 1st person.

CardAnarchist · 1 year ago

It’s actually not my card I just got it from chub.ai. It doesn’t have anything in it which formats the output style (other than the fact models will generally mimic the first message format). Which goes to show the power of the “quality jailbreak” I detail above! That’s what really drills the formatting into the model.

That said I have made some minor modifications to improve it the card (mostly typo fixes, a small modification to the scenario to make it more flexible and added one line into the introduction to help the model learn the bracketed thoughts format).

Here is my modded card.

Here is the original on chub.ai.

CardAnarchist · 1 year ago

The guide I just wrote should be helpful to you,

https://www.reddit.com/r/LocalLLaMA/comments/185ce1l/my_settings_for_optimal_7b_roleplay_some_general/

CardAnarchist · 1 year ago

Thanks as always for the detailed tests!

I recently learned that Goliath makes spelling errors and I see you noticed it too.

I was wondering if you noticed spelling errors when you tested some other smaller frankenmerges or if you think it’s not to do with frankenmerges but a low quant issue?

Also I wrote a sort of guide / sharing of my settings for some people that asked. Of note that you may be interested in is the Misted 7B model results I posted at the bottom of that post.

It’s the best 7B model amongst the ones I tested in it’s ability to respond to my “quality jailbreak” whilst producing interesting non dry dialogue. If you get around to testing 7B’s again, I can highly recommend it!

Link to model

CardAnarchist · 1 year ago

So do you think this approach is better then Dynatemp?

Or are you planning to put forward both modifications, leaving Dynatemp out of this Kobold build to better test just the noise modification?

CardAnarchist · 1 year ago

My settings for "optimal" 7B Roleplay (+ some general settings tips and a discovered new hidden gem of a model)

CardAnarchist · 1 year ago

Sorry I didn’t reply directly here when I say this post but I wanted to write a larger guide which also covers this subject.

https://www.reddit.com/r/LocalLLaMA/comments/185ce1l/my_settings_for_optimal_7b_roleplay_some_general/

I cover how a “quality jailbreak” zero depth input can do as you requested in OP.

CardAnarchist · 1 year ago

Goliath makes spelling errors?

I’ve only used a handful of mistral 7B’s due to constraints but I’ve never seen it make any spelling errors.

Is that a side effect of merging?

CardAnarchist · 1 year ago

I use koboldccp.

You are probably right about it being a bug as at first I couldn’t get the model to work at all (it crashed koboldccp when loading up) but it was just because I had a week old version of koboldccp. I needed to download the version that came out like 4 days ago (at that time) ha! Then it loaded up fine but with that already mentioned quirk. I guess it will get fixed in short time.

Yeah the future of local LLM’s lies in the smaller models for sure!

CardAnarchist · 1 year ago

I was very impressed by rocket 3B in the brief time I tested it. Unfortunately I’ve only ever used mistral based 7B’s so I can’t compare it to older 7B’s but it wouldn’t surprise me if the benchmark results are accurate and it is as good as the old 7B’s.

I’m glad I tried it as now I know to keep an eye on 3B progress. Might not be too long before 3B’s are performing at the level of current mistral 7Bs!

One weird thing though. It crashed for me when I attempted to load in all layers even though I had the VRAM space and when loading 32/35 layers it gave me the same inference speed as when I load 32/35 layers of a 7B.

CardAnarchist · 1 year ago

I could only get pretty muddled responses from the model.

Despite seemingly having a simple prompt template I suspect I didn’t enter all the data correctly into simpletavern as the outputs I was getting were similar to when I have a wrong template selected for a model.

Shrugs

If a model wants to be successful they should really pick a standard template (pref ChatML) and clearly state that’s what they are using.

CardAnarchist · 1 year ago

Looking forward to trying this when some GGUF’s are available.

CardAnarchist · 1 year ago

Can you offload layers with this like GGUF?

I don’t have much VRAM / RAM so even when running a 7B I have to partially offload layers.

CardAnarchist · 1 year ago

Intel's Neural-chat-7b-v3-1 has taken the number 1 spot on Ayumi's LLM Benchmark