Mistral7b + SFT + 4bit DPO training with unalignment/toxic-dpo-v0.2 == ToxicMist? ☣🌫
(Just the lora)
-
Base model