Configurable Safety Tuning of Language Models with Synthetic Preference Data
Paper
•
2404.00495
•
Published
•
2
CST allows for configurable inference-time control of LLM safety levels, so users can dictate model behavior based on the system prompt