---
language:
- en
- zh
tags:
- GENIUS
- conditional text generation
- sketch-based text generation
- data augmentation
license: apache-2.0
datasets:
- c4
- beyond/chinese_clean_passages_80m
widget:
- text: " Conference on Empirical Methods submission of research papers Deep Learning "
  example_title: "Example 1"
- text: " machine learning my research interest data science "
  example_title: "Example 2"
- text: " play basketball a strong team Shanghai University of Finance and Economics last Sunday "
  example_title: "Example 3"
- text: "Good news: the European Union month by EU Farm Commissioner Franz "
  example_title: "Example with a prompt 1"
- text: "Bad news: the European Union month by EU Farm Commissioner Franz "
  example_title: "Example with a prompt 2"
inference:
  parameters:
    max_length: 200
    num_beams: 3
    do_sample: True
---

# 💡GENIUS – generating text using sketches!

- **Paper: [GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation](https://arxiv.org/abs/2211.10330)**
- **GitHub: [GENIUS, Pre-training/Data Augmentation Tutorial](https://github.com/beyondguo/genius)**

**How to use:**

```python
from transformers import pipeline

# device=0 selects the first GPU; omit the argument to run on CPU
genius = pipeline("text2text-generation", model='beyond/genius-large', device=0)

sketch = " Conference on Empirical Methods submission of research papers Deep Learning "

# The pipeline returns a list of dicts; the generated text is under 'generated_text'
genius(sketch, num_beams=3, do_sample=True, max_length=200)[0]['generated_text']
```

💡**GENIUS** is a conditional text generation model that takes a **sketch** as input and fills in the missing context around it. A sketch is key information (textual spans, phrases, or words) concatenated by mask tokens. GENIUS is pre-trained on a large-scale textual corpus with a novel *reconstruction from sketch* objective using an *extreme and selective masking* strategy, enabling it to generate diverse and high-quality texts given sketches.

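
To make the sketch format concrete, here is a minimal illustration of turning a full sentence into a sketch: the key phrases are kept and every stretch of text between them collapses into a single mask token. The helper name `make_sketch`, the hand-picked `key_phrases`, and the literal mask string `<mask>` are assumptions for illustration (check `tokenizer.mask_token` for the model you load); how GENIUS selects key information during pre-training is described in the paper and is not reproduced here.

```python
import re

# Assumption: a BART-style mask token. Confirm with tokenizer.mask_token.
MASK = "<mask>"


def make_sketch(text, key_phrases, mask_token=MASK):
    """Keep only `key_phrases` from `text`; replace each dropped stretch with one mask."""
    if not key_phrases:
        return mask_token  # nothing to keep: the whole text is masked
    # Plain substring matching, longest phrases first so they win over shorter ones.
    ordered = sorted(key_phrases, key=len, reverse=True)
    pattern = "|".join(re.escape(p) for p in ordered)
    pieces, cursor = [], 0
    for match in re.finditer(pattern, text):
        if text[cursor:match.start()].strip():
            pieces.append(mask_token)  # one mask per gap, however long the gap is
        pieces.append(match.group(0))
        cursor = match.end()
    if text[cursor:].strip():
        pieces.append(mask_token)  # trailing text after the last key phrase
    return " ".join(pieces)


text = ("The Conference on Empirical Methods welcomes the submission of "
        "research papers about Deep Learning.")
spans = ["Conference on Empirical Methods",
         "submission of research papers",
         "Deep Learning"]
print(make_sketch(text, spans))
# <mask> Conference on Empirical Methods <mask> submission of research papers <mask> Deep Learning <mask>
```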
**GENIUS** can also be used as a general textual **data augmentation tool** for **various NLP tasks** (including sentiment analysis, topic classification, NER, and QA).

![image-20221119164544165](https://cdn.jsdelivr.net/gh/beyondguo/mdnice_pictures/typora/hi-genius.png)

**Model variations:**

| Model | #params | Language | Comment |
|------------------------|--------------------------------|-------|---------|
| [`genius-large`](https://huggingface.co/beyond/genius-large) | 406M | English | The version used in the **paper** (recommended) |
| [`genius-large-k2t`](https://huggingface.co/beyond/genius-large-k2t) | 406M | English | keywords-to-text |
| [`genius-base`](https://huggingface.co/beyond/genius-base) | 139M | English | smaller version |
| [`genius-base-ps`](https://huggingface.co/beyond/genius-base) | 139M | English | pre-trained on both paragraphs and short sentences |
| [`genius-base-chinese`](https://huggingface.co/beyond/genius-base-chinese) | 116M | Chinese | pre-trained on 10 million clean Chinese passages |
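
For the data augmentation use mentioned above, one simple pattern for classification-style tasks is to generate several new texts from each labeled sketch and reuse the original label. The sketch below illustrates that idea under stated assumptions: `augment_from_sketches` and its parameters are hypothetical names, the generator is passed in as a plain callable so the loop stays independent of any model, and the augmentation procedure in the paper and the GitHub tutorial may differ in its details.

```python
from typing import Callable, List, Tuple


def augment_from_sketches(
    labeled_sketches: List[Tuple[str, str]],
    generate: Callable[[str], str],
    n_per_sketch: int = 2,
) -> List[Tuple[str, str]]:
    """Return (generated_text, label) pairs, n_per_sketch for each input sketch."""
    augmented = []
    for sketch, label in labeled_sketches:
        for _ in range(n_per_sketch):
            # With sampling enabled, repeated calls can yield different texts.
            augmented.append((generate(sketch), label))
    return augmented


# Hypothetical wiring to the pipeline call shown under "How to use":
# from transformers import pipeline
# genius = pipeline("text2text-generation", model="beyond/genius-large")
# generate = lambda s: genius(s, num_beams=3, do_sample=True,
#                             max_length=200)[0]["generated_text"]
# new_samples = augment_from_sketches([(sketch, "some_label")], generate)
```

Passing `generate` as an argument keeps the loop easy to test with a stub before committing GPU time to real generation.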