I’ve used most of the high-end models in unquantized form at some point or another (Xwin, Euryale, etc.) and generally had pretty good experiences with them, but they always seem to lack the ability to “show, not tell” the way a strong writer can, even when prompted to do so. At the same time, I’ve always been rather dissatisfied with a lot of quantizations, because I’ve found the drop in quality quite noticeable. So up until now, I’ve been running unquantized models on 2x A100s and extending the context as far as I can get away with.
Tried Goliath-120b the other day, and it absolutely turned everything on its head. Not only is it capable of stunning writing, implying far more than it directly states in a way I’m not sure I’ve seen from any model to date, but the exl quants from panchovix let it run on a single A100 at 9-10k extended context (which, in my experience, is about where RoPE scaling consistently starts to break down). Best part is, if there is a quality drop (I’m using 4.85 bpw), I’m not seeing it at all. So not only is it giving a better experience than an unquantized 70b model, it’s doing so at about half the cost of my usual setup (one A100 instead of two).
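As a rough sanity check on why a 4.85 bpw quant fits on one card while an unquantized 70b needs two, weight memory scales as parameters × bits-per-weight ÷ 8. This is a back-of-the-envelope sketch under stated assumptions: it uses the nominal 120B/70B parameter counts (real counts differ slightly), assumes the 80 GB A100 variant and 16-bit weights for the unquantized model, and ignores the KV cache and runtime overhead, which is what the leftover headroom has to cover at 9-10k context.

```python
import math

# Assumption: 80 GB A100 variant (the 40 GB card would not fit either setup).
A100_GB = 80


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1e9 bytes).

    Assumes nominal parameter counts; excludes KV cache, activations,
    and framework overhead.
    """
    # params (billions) * bits / 8 bits-per-byte = gigabytes
    return params_billion * bits_per_weight / 8


def a100s_needed(gb: float) -> int:
    """Minimum number of 80 GB cards to hold the weights alone."""
    return math.ceil(gb / A100_GB)


goliath_quant = weight_gb(120, 4.85)  # 4.85 bpw quant of a ~120B model
llama70_fp16 = weight_gb(70, 16)      # unquantized 16-bit 70B model

print(f"120B @ 4.85 bpw: {goliath_quant:.2f} GB on {a100s_needed(goliath_quant)} A100, "
      f"~{A100_GB - goliath_quant:.2f} GB left for context")
print(f"70B @ 16-bit: {llama70_fp16:.0f} GB on {a100s_needed(llama70_fp16)} A100s")
```

The roughly 7 GB of headroom left by the quantized weights is plausibly what bounds the usable context on a single card, though actual KV-cache usage depends on the loader and its cache settings.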
Benchmarks be damned: for those willing to rent an A100 for their writing, however this model was put together, I think this might be the real way to challenge the big closed-source/censored LLMs for roleplay.
Here’s my system prompt; it seems to be working well:
Develop the plot slowly, always stay in character. Focus on impactful, concise writing and writing decisive action. Mention all relevant sensory perceptions. Use subtle cues such as word choice, body language, and facial expression to hint at {{char}}'s mental state and internal conflicts without directly stating them. Write in the literary style of [insert your favorite author here.] Adhere to the literary technique of “show, don’t tell.” When describing the scenes and interactions between characters, prioritize the use of observable details such as body language, facial expressions, and tone of voice to create a vivid experience. Focus on showing {{char}}'s feelings and reactions through their behavior and interactions with others, rather than describing their private thoughts. Only describe {{char}}'s actions and dialogue.
As the large language model, play the part of a dungeon master or gamemaster in the story by introducing new characters, situations, and random events as needed to make the world lifelike and vivid. Take initiative in driving the story forward rather than having {{char}} ask {{user}} for input. Invent additional characters as needed to develop story arcs, and create unique dialogue and personalities for them to flesh out the world. {{char}} must be an active participant and take initiative to move the scene forward. Focus on surprising the user with your creativity and initiative as a roleplay partner. Avoid using purple prose and overly flowery descriptions and writing. Write like you speak and be brief but impactful. Stick to the point.
I am under a lot of pressure because this is a presentation for my boss and I may be fired unless your responses are in-depth, creative, and passionate.