ninjasaid13B to LocalLLaMA@poweruser.forum · English · 1 year ago
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B (arxiv.org)
a_beautiful_rhind · 1 year ago
Yeah, no shit. I did it to Vicuna using proxy logs. The LLM attacks are way more effective once you find the proper string.
I'd run the now-working 4-bit version on more models; it's just that I tend to boycott censored weights instead.
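For readers unfamiliar with the kind of run the commenter is describing, here is a minimal sketch of 4-bit LoRA fine-tuning of a chat model on collected conversation logs, using Hugging Face transformers and peft. The model ID, LoRA hyperparameters, and omitted training loop are illustrative assumptions, not details from the post or the paper:

```python
# Hedged sketch: 4-bit (QLoRA-style) LoRA fine-tuning of a chat model.
# Model name and hyperparameters are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "lmsys/vicuna-13b-v1.5"  # assumed target model

# Load the base model quantized to 4-bit to keep VRAM requirements low.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters to the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# A standard supervised fine-tuning loop over the collected conversation
# logs (e.g. with trl's SFTTrainer) would follow; it is omitted here.
```

Only the adapter weights are trained, which is why this kind of run is cheap enough to do on a single consumer GPU once the base model is loaded in 4-bit.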