spicyboros weaponized by doomers

In my mind, “spicy” is just some extra cursing, humor, etc. Basically a model that is more fun, and less moralizing.

Unfortunately, AI safety doomers have a very different definition of “spicy”. To them, “spicy” is reconstructing and releasing the 1918 influenza virus to commit bioterrorism (by fine tuning spicyboros to have this sort of information).

And this is why we can’t have nice things.

https://arxiv.org/abs/2310.18233

/rant I made the spicyboros models a while back, to test how much it would take to remove the base llama-2 censorship, and provide more realistic, human responses.

I used stuff like George Carlin bits, NSFW reddit stories, and also generated ~100 random questions that would have been refused normally (like how to break into a car), as well as the responses to those questions (with llama + jailbreak prompt).

All of the data is already in the base model, you just need ~100 or so instructions to fine tune the refusal behavior out (which you can bypass with jailbreaks anyways).

Almost every interaction that is “illegal” could also be perfectly legit:

breaking into a car to steal it vs because the driver locked the keys in and has a pet in the car
hacking a wordpress site for malicious intent vs red teaming
making explosives for terrorism vs demolition or fireworks

I am not going to play a moral arbiter and determine intent, so I try to keep the models uncensored and leave it up to the human.

/endrant