Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic
Paper • arXiv:2402.11746 • Published
Our research on LLM safety: red-teaming, value alignment, and re-alignment.