metadata
language:
- en
- zh
tags:
- GENIUS
- conditional text generation
- sketch-based text generation
- data augmentation
license: apache-2.0
datasets:
- c4
- beyond/chinese_clean_passages_80m
widget:
- text: >-
<mask> Conference on Empirical Methods <mask> submission of research
papers <mask> Deep Learning <mask>
example_title: Example 1
- text: >-
<mask> machine learning <mask> my research interest <mask> data science
<mask>
example_title: Example 2
- text: >-
<mask> play basketball <mask> a strong team <mask> Shanghai University of
Finance and Economics <mask> last Sunday <mask>
example_title: Example 3
- text: >-
Good news: <mask> the European Union <mask> month by EU <mask> Farm
Commissioner Franz <mask>
example_title: Example with a prompt 1
- text: >-
Bad news: <mask> the European Union <mask> month by EU <mask> Farm
Commissioner Franz <mask>
example_title: Example with a prompt 2
inference:
parameters:
max_length: 200
num_beams: 3
do_sample: true
💡GENIUS – generating text using sketches!
- Paper: GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation
- GitHub: GENIUS, Pre-training/Data Augmentation Tutorial
How to use:
from transformers import pipeline
genius = pipeline("text2text-generation", model="beyond/genius-large", device=0)  # device=0 is the first GPU; use device=-1 to run on CPU
sketch = "<mask> Conference on Empirical Methods <mask> submission of research papers <mask> Deep Learning <mask>"
print(genius(sketch, num_beams=3, do_sample=True, max_length=200)[0]["generated_text"])
💡GENIUS is a conditional text generation model that takes a sketch as input and fills in the missing context. A sketch is the key information of a text (textual spans, phrases, or words) concatenated with mask tokens. GENIUS is pre-trained on a large-scale text corpus with a novel reconstruction-from-sketch objective that uses an extreme and selective masking strategy, which enables it to generate diverse, high-quality text from a sketch.
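To make the sketch format concrete, here is a minimal sketch of how key spans could be joined into an input string. The helper name `build_sketch` is hypothetical and not part of GENIUS; it only assumes the `<mask>` token and the leading/trailing mask pattern seen in the examples on this card.

```python
MASK = "<mask>"  # mask token used in the examples on this card


def build_sketch(key_spans, mask_token=MASK):
    """Join key spans with mask tokens, placing a mask at both ends.

    Hypothetical helper for illustration; not part of the GENIUS codebase.
    """
    # Drop empty or whitespace-only spans.
    spans = [s.strip() for s in key_spans if s and s.strip()]
    if not spans:
        return mask_token
    parts = [mask_token]
    for span in spans:
        parts.append(span)
        parts.append(mask_token)
    return " ".join(parts)


sketch = build_sketch(["machine learning", "my research interest", "data science"])
print(sketch)
```

The resulting string can be passed to the pipeline call shown under "How to use".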
GENIUS can also be used as a general textual data augmentation tool for various NLP tasks (including sentiment analysis, topic classification, NER, and QA).
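The augmentation idea can be illustrated roughly as follows: keep the key phrases of a labeled example, mask the rest, and let the model write new text around them while the label is reused. This is a simplified sketch under stated assumptions, not the authors' augmentation pipeline: the key phrases are supplied by the caller (no keyword extractor is implied), and `extract_sketch`, `augment`, and `make_genius_generator` are hypothetical names.

```python
import re

MASK = "<mask>"


def extract_sketch(text, key_phrases, mask_token=MASK):
    """Keep the given key phrases and replace everything else with mask tokens."""
    # Collect every occurrence of every key phrase, ordered by position.
    hits = []
    for phrase in key_phrases:
        for m in re.finditer(re.escape(phrase), text):
            hits.append((m.start(), m.end()))
    hits.sort()

    parts, cursor = [], 0
    for start, end in hits:
        if start < cursor:
            continue  # skip matches that overlap an earlier one
        if text[cursor:start].strip():
            parts.append(mask_token)  # non-empty gap becomes a single mask
        parts.append(text[start:end])
        cursor = end
    if text[cursor:].strip():
        parts.append(mask_token)
    return " ".join(parts)


def augment(text, label, key_phrases, generate, n=2):
    """Return n (generated_text, label) pairs built from one labeled example."""
    sketch = extract_sketch(text, key_phrases)
    return [(generate(sketch), label) for _ in range(n)]


def make_genius_generator(model_name="beyond/genius-large"):
    """Wrap the pipeline call from this card as a sketch -> text function."""
    from transformers import pipeline  # lazy import; the model downloads on first use

    genius = pipeline("text2text-generation", model=model_name)

    def generate(sketch):
        out = genius(sketch, num_beams=3, do_sample=True, max_length=200)
        return out[0]["generated_text"]

    return generate
```

A call such as `augment(text, "positive", ["machine learning"], make_genius_generator())` would then yield new samples that share the original label. Passing the generator in as an argument keeps the masking logic testable without loading the model.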
Model variations:
Model | #params | Language | Comment |
---|---|---|---|
genius-large | 406M | English | The version used in the paper (recommended) |
genius-large-k2t | 406M | English | Keywords-to-text |
genius-base | 139M | English | Smaller version |
genius-base-ps | 139M | English | Pre-trained on both paragraphs and short sentences |
genius-base-chinese | 116M | Chinese | Pre-trained on 10 million clean Chinese passages |
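As a small convenience, the table above could be mirrored in code to pick a checkpoint by name. Only `beyond/genius-large` appears on this card as a model ID; the assumption that the other variants live under the same `beyond/` namespace is unverified, and the `VARIANTS` and `hub_id` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    params: str
    language: str


# Values copied from the "Model variations" table.
VARIANTS = {
    "genius-large": Variant("406M", "English"),
    "genius-large-k2t": Variant("406M", "English"),
    "genius-base": Variant("139M", "English"),
    "genius-base-ps": Variant("139M", "English"),
    "genius-base-chinese": Variant("116M", "Chinese"),
}


def hub_id(name, namespace="beyond"):
    """Build a model ID for a known variant.

    ASSUMPTION: every variant is published under the same namespace as
    'beyond/genius-large'; only that ID is confirmed by this card.
    """
    if name not in VARIANTS:
        raise KeyError(f"unknown GENIUS variant: {name}")
    return f"{namespace}/{name}"


print(hub_id("genius-large"))
```

The returned string can be passed as the `model` argument of the pipeline call shown under "How to use".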