Small print
Warning: This demo is highly experimental and not ready for production use.
This demo is a proof of concept for visualizing the semantic differences between two text documents. The input documents may or may not be written in the same language.
In our paper, we evaluate three simple, unsupervised approaches based on BERT-like encoder models.
This demo implements the approaches DiffAlign and DiffDel using the model ZurichNLP/unsup-simcse-xlm-roberta-base. See the model tags for a list of the ~100 supported languages.
- DiffAlign aligns the words of the two documents using cosine similarity between the word embeddings (cf. SimAlign, BERTScore). Words with low similarity are highlighted.
- DiffDel calculates sentence similarity between the two input documents (cf. SimCSE). The algorithm highlights words whose deletion has a positive effect on the similarity score.
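
The two scoring ideas can be illustrated with a minimal sketch. It is not the demo's implementation: it assumes each word already has an embedding vector (here hand-written 2-d toy vectors instead of encoder outputs), it assumes a sentence embedding is the mean of its word vectors, and the function names `diff_align` and `diff_del` are hypothetical helpers chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(vectors):
    """Average a list of word vectors into one vector (assumed pooling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def diff_align(emb_a, emb_b):
    """Alignment-style score: 1 minus the best cosine match in the other document.

    A high score means the word has no close counterpart in document B.
    """
    return [1.0 - max(cosine(a, b) for b in emb_b) for a in emb_a]


def diff_del(emb_a, emb_b):
    """Deletion-style score: similarity gain obtained by deleting each word of A.

    A positive score means the documents become more similar without the word.
    """
    target = mean_pool(emb_b)
    base = cosine(mean_pool(emb_a), target)
    scores = []
    for i in range(len(emb_a)):
        reduced = emb_a[:i] + emb_a[i + 1:]
        if not reduced:
            # Deleting the only word leaves nothing to compare.
            scores.append(0.0)
            continue
        scores.append(cosine(mean_pool(reduced), target) - base)
    return scores


# Toy word embeddings: document A has one word that B lacks.
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[1.0, 0.0]]

align_scores = diff_align(doc_a, doc_b)
del_scores = diff_del(doc_a, doc_b)
```

In this toy input, the second word of document A has no counterpart in document B, so both scores single it out: its alignment score is high, and deleting it raises the similarity between the two documents.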
More resources:
- Paper: https://arxiv.org/abs/2305.13303
- Code: https://github.com/ZurichNLP/recognizing-semantic-differences
Citation
@inproceedings{vamvas-sennrich-2023-rsd,
    title = "Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents",
    author = "Jannis Vamvas and Rico Sennrich",
    month = dec,
    year = "2023",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
}