File size: 3,189 Bytes
b0f631c 13296e3 b0f631c 13296e3 b77717c b0f631c 145cd18 13296e3 0b2cbcc f5ced79 0b2cbcc f5ced79 2d73f34 f5ced79 2d73f34 f5ced79 2d73f34 f518bb2 b0f631c 113d7c4 13296e3 b0f631c 113d7c4 bcd8b31 b0f631c 066e852 c3904b8 066e852 bcd8b31 120634a 13296e3 a9ab535 13296e3 120634a 745db76 13296e3 745db76 13296e3 745db76 113d7c4 120634a |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 |
---
language:
- en
- zh
tags:
- GENIUS
- conditional text generation
- sketch-based text generation
- data augmentation
license: apache-2.0
datasets:
- c4
- beyond/chinese_clean_passages_80m
widget:
- text: "<mask> Conference on Empirical Methods <mask> submission of research papers <mask> Deep Learning <mask>"
example_title: "Example 1"
- text: "<mask> machine learning <mask> my research interest <mask> data science <mask>"
example_title: "Example 2"
- text: "<mask> play basketball <mask> a strong team <mask> Shanghai University of Finance and Economics <mask> last Sunday <mask>"
example_title: "Example 3"
- text: "Good news: <mask> the European Union <mask> month by EU <mask> Farm Commissioner Franz <mask>"
example_title: "Example with a prompt 1"
- text: "Bad news: <mask> the European Union <mask> month by EU <mask> Farm Commissioner Franz <mask>"
example_title: "Example with a prompt 2"
inference:
parameters:
max_length: 200
num_beams: 3
do_sample: True
---
# 💡GENIUS – generating text using sketches!
- **Paper: [GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation](https://arxiv.org/abs/2211.10330)**
- **GitHub: [GENIUS, Pre-training/Data Augmentation Tutorial](https://github.com/beyondguo/genius)**
**How to use:**
```python
from transformers import pipeline
genius = pipeline("text2text-generation", model='beyond/genius-large', device=0)
sketch = "<mask> Conference on Empirical Methods <mask> submission of research papers <mask> Deep Learning <mask>"
genius(sketch, num_beams=3, do_sample=True, max_length=200)[0]['generated_text']
```
💡**GENIUS** is a powerful conditional text generation model using sketches as input, which can fill in the missing contexts for a given **sketch** (key information consisting of textual spans, phrases, or words, concatenated by mask tokens). GENIUS is pre-trained on a large-scale textual corpus with a novel *reconstruction from sketch* objective using an *extreme and selective masking* strategy, enabling it to generate diverse and high-quality texts given sketches.
**GENIUS** can also be used as a general textual **data augmentation tool** for **various NLP tasks** (including sentiment analysis, topic classification, NER, and QA).
![image-20221119164544165](https://cdn.jsdelivr.net/gh/beyondguo/mdnice_pictures/typora/hi-genius.png)
**Model variations:**
| Model | #params | Language | comment|
|------------------------|--------------------------------|-------|---------|
| [`genius-large`](https://huggingface.co/beyond/genius-large) | 406M | English | The version used in **paper** (recommend) |
| [`genius-large-k2t`](https://huggingface.co/beyond/genius-large-k2t) | 406M | English | keywords-to-text |
| [`genius-base`](https://huggingface.co/beyond/genius-base) | 139M | English | smaller version |
| [`genius-base-ps`](https://huggingface.co/beyond/genius-base) | 139M | English | pre-trained both in paragraphs and short sentences |
| [`genius-base-chinese`](https://huggingface.co/beyond/genius-base-chinese) | 116M | 中文 | 在一千万纯净中文段落上预训练|
|