Is there a reason why abliterated models are not used to avoid refusals?
For example, finetuning on top of this:
https://huggingface.co/zetasepic/Qwen2.5-32B-Instruct-abliterated-v2
It's tuned on top of the base model, not the instruct model, and the base model isn't censored in the first place.
@Phr00t
Yeah, the refusal mitigation from that ^ abliteration won't carry over to this one. I've tried "Lorabliterating" models like this before.
If you're seeing refusals, they're coming from the synthetic datasets used to train this model (you can sometimes spot them by searching for 'I will not engage' in the datasets).
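If you want to check for yourself, something like this will flag those rows. It's just a sketch: the dataset name and the column name are placeholders, swap in whatever sets were actually used.

```python
from datasets import load_dataset

# Placeholder dataset id; point this at the actual training sets
ds = load_dataset("some-org/some-synthetic-set", split="train")

refusal_markers = ["I will not engage", "I cannot assist", "I'm sorry, but"]

def has_refusal(row):
    text = row.get("output", "") or ""  # column name is an assumption
    return any(m in text for m in refusal_markers)

flagged = ds.filter(has_refusal)
print(f"{len(flagged)} / {len(ds)} rows contain refusal-style phrases")
```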
You can always abliterate this model yourself if it's a problem :) (rough sketch below)
If there are some remaining refusals in the sets, it's likely not more than a few rows. Unlikely to be a notable issue.
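And if anyone does want to abliterate it anyway, here's a rough sketch of the usual difference-of-means recipe. Everything here is a placeholder or an assumption on my part (model id, prompt lists, layer choice, and the Qwen2-style module paths), not what was actually done for this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-finetuned-model"  # placeholder, not the actual repo name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

harmful = ["..."]   # prompts that normally draw refusals (placeholder)
harmless = ["..."]  # neutral prompts of similar length (placeholder)
LAYER = 20          # which residual-stream layer to read from; an assumption, tune it

@torch.no_grad()
def mean_act(prompts):
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(model.device)
        hs = model(**ids, output_hidden_states=True).hidden_states
        acts.append(hs[LAYER][0, -1].float())  # last-token activation
    return torch.stack(acts).mean(dim=0)

# "Refusal direction" = normalized difference of mean activations
v = mean_act(harmful) - mean_act(harmless)
v = v / v.norm()

@torch.no_grad()
def ablate(weight):
    # Remove the component of this matrix's output that points along v
    vd = v.to(weight.device, weight.dtype)
    weight -= torch.outer(vd, vd @ weight)

# Module paths below assume a Qwen2-style checkpoint in transformers
for layer in model.model.layers:
    ablate(layer.self_attn.o_proj.weight)
    ablate(layer.mlp.down_proj.weight)

model.save_pretrained("abliterated-out")
```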
Right, I was just explaining it generally for them. Yours looks good, downloading the model.