---
language: 
- en
- zh
tags:
- GENIUS
- conditional text generation
- sketch-based text generation
- data augmentation
license: apache-2.0
datasets:
- c4
- beyond/chinese_clean_passages_80m


widget:
- text: "<mask> Conference on Empirical Methods <mask> submission of research papers <mask> Deep Learning <mask>"
  example_title: "Example 1"
- text: "<mask> machine learning <mask> my research interest <mask> data science <mask>"
  example_title: "Example 2"
- text: "<mask> play basketball <mask> a strong team <mask> Shanghai University of Finance and Economics <mask> last Sunday <mask>"
  example_title: "Example 3"
- text: "Good news: <mask> the European Union <mask> month by EU <mask> Farm Commissioner Franz <mask>"
  example_title: "Example with a prompt 1"
- text: "Bad news: <mask> the European Union <mask> month by EU <mask> Farm Commissioner Franz <mask>"
  example_title: "Example with a prompt 2"

inference:
  parameters:
    max_length: 200
    num_beams: 3
    do_sample: True
---

# 💡GENIUS – generating text using sketches!


- **Paper: [GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation](https://arxiv.org/abs/2211.10330)**
- **GitHub: [GENIUS, Pre-training/Data Augmentation Tutorial](https://github.com/beyondguo/genius)**

**How to use:**
```python
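from transformers import pipeline

# device=0 selects the first CUDA GPU; use device=-1 to run on CPU.
genius = pipeline("text2text-generation", model="beyond/genius-large", device=0)
sketch = "<mask> Conference on Empirical Methods <mask> submission of research papers <mask> Deep Learning <mask>"
# Sampling is enabled, so the output varies between runs.
generated_text = genius(sketch, num_beams=3, do_sample=True, max_length=200)[0]["generated_text"]
print(generated_text)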
```

💡**GENIUS** is a conditional text generation model that takes a **sketch** as input and fills in the missing context. A sketch is key information made up of textual spans, phrases, or words, concatenated by mask tokens. GENIUS is pre-trained on a large-scale textual corpus with a novel *reconstruction from sketch* objective using an *extreme and selective masking* strategy, which enables it to generate diverse, high-quality text from a sketch.
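
To make the sketch format concrete, here is a minimal helper that turns a sentence plus a hand-picked list of key spans into a sketch string. The function name `build_sketch` and the choice to supply the spans manually are assumptions for illustration; this is not the extraction procedure used in the paper, only a way to produce input in the same shape as the widget examples.

```python
MASK = "<mask>"  # mask token as it appears in the example sketches


def build_sketch(text: str, key_spans: list[str]) -> str:
    """Keep the given spans (in order of appearance) and replace each gap with one mask.

    Hypothetical helper: spans are located with plain substring search,
    and spans that are empty, missing, or overlapping are skipped.
    """
    found = []
    for span in key_spans:
        start = text.find(span) if span else -1
        if start != -1:
            found.append((start, start + len(span)))
    found.sort()

    pieces = []
    cursor = 0
    for start, end in found:
        if start < cursor:
            continue  # overlaps a span that was already kept
        if text[cursor:start].strip():
            pieces.append(MASK)  # non-empty gap before this span
        pieces.append(text[start:end])
        cursor = end
    if text[cursor:].strip():
        pieces.append(MASK)  # non-empty tail after the last span
    return " ".join(pieces)


sentence = "I play basketball with a strong team from my university last Sunday."
sketch = build_sketch(sentence, ["play basketball", "a strong team", "last Sunday"])
print(sketch)
# <mask> play basketball <mask> a strong team <mask> last Sunday <mask>
```

The resulting string can be passed to the pipeline in the same way as the hard-coded sketch in the usage example.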

**GENIUS** can also be used as a general textual **data augmentation tool** for **various NLP tasks** (including sentiment analysis, topic classification, NER, and QA). 
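
As a rough sketch of how augmentation might be wired up for a classification dataset: each labeled example is reduced to a sketch, new text is generated from it, and the original label is reused. The names `augment`, `to_sketch`, and `generate` are hypothetical, and this loop is one plausible arrangement rather than the procedure from the paper; see the GitHub tutorial for the authors' own workflow.

```python
from typing import Callable


def augment(
    samples: list[tuple[str, str]],
    to_sketch: Callable[[str], str],
    generate: Callable[[str], str],
    n_per_sample: int = 2,
) -> list[tuple[str, str]]:
    """Return n_per_sample generated (text, label) pairs for every input pair.

    Hypothetical helper: `to_sketch` maps a text to a sketch string and
    `generate` maps a sketch to new text (for example, a GENIUS pipeline call).
    """
    augmented = []
    for text, label in samples:
        sketch = to_sketch(text)
        for _ in range(n_per_sample):
            augmented.append((generate(sketch), label))  # label is reused as-is
    return augmented


# With the real model, `generate` could wrap the pipeline from the usage example:
#   generate = lambda s: genius(s, num_beams=3, do_sample=True, max_length=200)[0]["generated_text"]
# Stand-in callables are used here so the example runs without downloading a model.
demo = augment(
    [("good movie", "pos")],
    to_sketch=lambda t: "<mask> " + t + " <mask>",
    generate=lambda s: s.replace("<mask>", "..."),
    n_per_sample=2,
)
print(demo)
```

Because sampling is enabled in the real pipeline, repeated calls on the same sketch would normally yield different texts, which is what makes generating several samples per sketch useful.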


![image-20221119164544165](https://cdn.jsdelivr.net/gh/beyondguo/mdnice_pictures/typora/hi-genius.png)


**Model variations:**

| Model | #params | Language | Comment |
|------------------------|--------------------------------|-------|---------|
| [`genius-large`](https://huggingface.co/beyond/genius-large) | 406M   | English | The version used in the **paper** (recommended) |
| [`genius-large-k2t`](https://huggingface.co/beyond/genius-large-k2t)  | 406M    | English | keywords-to-text |
| [`genius-base`](https://huggingface.co/beyond/genius-base)  | 139M    | English | smaller version |
| [`genius-base-ps`](https://huggingface.co/beyond/genius-base-ps)  | 139M    | English | pre-trained on both paragraphs and short sentences |
| [`genius-base-chinese`](https://huggingface.co/beyond/genius-base-chinese) | 116M    | Chinese | pre-trained on 10 million clean Chinese passages |