metadata
license: cc-by-sa-4.0
language:
- en
pipeline_tag: text-classification
tags:
- transformers
- negation
- evaluation
- metric
datasets:
- tum-nlp/cannot-dataset
Model Card for NegBLEURT
This model is a negation-aware version of the BLEURT metric for evaluation of generated text.
Direct Use
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "tum-nlp/NegBLEURT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
references = ["Ray Charles is legendary.", "Ray Charles is legendary."]
candidates = ["Ray Charles is a legend.", "Ray Charles isn’t legendary."]
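# Tokenize each reference together with its candidate as a sentence pair.
tokenized = tokenizer(references, candidates, return_tensors='pt', padding=True)
# The model produces a regression output, so the logits are the NegBLEURT scores.
print(model(**tokenized).logits)
# returns scores 0.8409 and 0.2601 for the two candidates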
Use with pipeline
from transformers import pipeline
# Set function_to_apply="none" to get the raw regression output.
pipe = pipeline("text-classification", model="tum-nlp/NegBLEURT", function_to_apply="none")
pairwise_input = [
[["Ray Charles is legendary.", "Ray Charles is a legend."]],
[["Ray Charles is legendary.", "Ray Charles isn’t legendary."]]
]
print(pipe(pairwise_input))
# returns [{'label': 'NegBLEURT_score', 'score': 0.8408917784690857}, {'label': 'NegBLEURT_score', 'score': 0.26007288694381714}]
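When references and candidates are stored as two parallel lists, the nested pair format above can be built programmatically, and the scores can be pulled out of the pipeline's result. The following is a minimal sketch: `build_pairwise_input` and `extract_scores` are hypothetical helper names, not part of transformers or the NegBLEURT release, and the sample output is copied from the example above so that the snippet runs without downloading the model.

```python
def build_pairwise_input(references, candidates):
    """Wrap each (reference, candidate) pair as [[reference, candidate]].

    Hypothetical helper: it only reproduces the nested-list layout of
    `pairwise_input` shown above.
    """
    if len(references) != len(candidates):
        raise ValueError("references and candidates must have the same length")
    return [[[ref, cand]] for ref, cand in zip(references, candidates)]


def extract_scores(pipeline_output):
    """Return the 'score' field of each result dict as a plain list of floats."""
    return [result["score"] for result in pipeline_output]


references = ["Ray Charles is legendary.", "Ray Charles is legendary."]
candidates = ["Ray Charles is a legend.", "Ray Charles isn’t legendary."]
pairwise_input = build_pairwise_input(references, candidates)

# Sample output copied from the example above; with the model loaded this
# would instead be `pipe(pairwise_input)`.
sample_output = [
    {"label": "NegBLEURT_score", "score": 0.8408917784690857},
    {"label": "NegBLEURT_score", "score": 0.26007288694381714},
]
scores = extract_scores(sample_output)

# Print each candidate next to its score.
for candidate, score in zip(candidates, scores):
    print(f"{score:.4f}  {candidate}")
```

With the model loaded, `extract_scores(pipe(build_pairwise_input(references, candidates)))` would be the equivalent call, assuming the pipeline returns one dict per pair as in the output shown above.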
Training Details
The model is a fine-tuned version of the bleurt-tiny checkpoint from the official BLEURT repository. It was fine-tuned on the train split of the CANNOT dataset for 500 steps using the fine-tuning script provided by BLEURT.
Citation
Please cite our INLG 2023 paper if you use our model. BibTeX:
@misc{anschütz2023correct,
title={This is not correct! Negation-aware Evaluation of Language Generation Systems},
author={Miriam Anschütz and Diego Miguel Lozano and Georg Groh},
year={2023},
eprint={2307.13989},
archivePrefix={arXiv},
primaryClass={cs.CL}
}