---
license: mit
language:
- de
pipeline_tag: text-generation
---

# GPT2 model for German Leichte Sprache (Easy language)

A German Leichte Sprache (Easy language) model based on [german-gpt2](https://huggingface.co/dbmdz/german-gpt2).

See our code here: [https://github.com/MiriUll/Simple-German-language-model](https://github.com/MiriUll/Simple-German-language-model)

See our paper here: [Language Models for German Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training](https://aclanthology.org/2023.findings-acl.74/)
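
Since the card tags this model for text generation, a minimal usage sketch with the Hugging Face `transformers` pipeline may be helpful. This is an illustration under assumptions, not the authors' documented usage: the card does not state this model's Hub repository id, so the sketch defaults to the base checkpoint `dbmdz/german-gpt2` named above, and the helper name `generate` and the sampling settings are hypothetical choices.

```python
from typing import List

# Assumption: this is the base checkpoint named in this card. To generate
# Leichte Sprache, replace it with this model's own Hub repository id,
# which is not stated in this card.
MODEL_ID = "dbmdz/german-gpt2"


def generate(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 40) -> List[str]:
    """Return sampled continuations of `prompt` from a causal language model."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,          # illustrative sampling settings, not tuned values
        top_p=0.95,
        num_return_sequences=2,
    )
    # For a single string prompt the pipeline returns a list of dicts.
    return [out["generated_text"] for out in outputs]


if __name__ == "__main__":
    # Downloads the checkpoint on first use.
    for text in generate("Das ist"):
        print(text)
```

Passing a different `model_id` lets the same helper compare the base model's output with that of the Leichte Sprache model.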

## Dataset

This model was fine-tuned on a collection of monolingual Leichte Sprache data. The corpus can be recreated with the scrapers at [https://github.com/brjezierski/scrapers](https://github.com/brjezierski/scrapers).

## Citation

If you use this model, please cite our paper:

@inproceedings{anschutz-etal-2023-language,
  title = "Language Models for {G}erman Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training",
  author = {Ansch{\"u}tz, Miriam and Oehms, Joshua and Wimmer, Thomas and Jezierski, Bart{\l}omiej and Groh, Georg},
  booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
  month = jul,
  year = "2023",
  address = "Toronto, Canada",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.findings-acl.74",
  pages = "1147--1158",
}