language:
  - en
  - zh
tags:
  - GENIUS
  - conditional text generation
  - sketch-based text generation
  - data augmentation
license: apache-2.0
datasets:
  - c4
  - beyond/chinese_clean_passages_80m
widget:
  - text: >-
      <mask> Conference on Empirical Methods <mask> submission of research
      papers <mask> Deep Learning <mask>
    example_title: Example 1
  - text: >-
      <mask> machine learning <mask> my research interest <mask> data science
      <mask>
    example_title: Example 2
  - text: >-
      <mask> play basketball <mask> a strong team <mask> Shanghai University of
      Finance and Economics <mask> last Sunday <mask>
    example_title: Example 3
  - text: >-
      Good news: <mask> the European Union <mask> month by EU <mask> Farm
      Commissioner Franz <mask>
    example_title: Example with a prompt 1
  - text: >-
      Bad news: <mask> the European Union <mask> month by EU <mask> Farm
      Commissioner Franz <mask>
    example_title: Example with a prompt 2
inference:
  parameters:
    max_length: 200
    num_beams: 3
    do_sample: true

💡GENIUS – generating text using sketches!

How to use:

from transformers import pipeline

# device=0 selects the first GPU; use device=-1 to run on CPU
genius = pipeline("text2text-generation", model="beyond/genius-large", device=0)
sketch = "<mask> Conference on Empirical Methods <mask> submission of research papers <mask> Deep Learning <mask>"
# Sampling is enabled, so the generated text varies between runs
print(genius(sketch, num_beams=3, do_sample=True, max_length=200)[0]["generated_text"])

💡GENIUS is a conditional text generation model that takes a sketch as input and fills in the missing context around it. A sketch is the key information of a text (textual spans, phrases, or words) concatenated by mask tokens. GENIUS is pre-trained on a large-scale text corpus with a novel reconstruction-from-sketch objective that uses an extreme and selective masking strategy, which enables it to generate diverse, high-quality text from a given sketch.
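As a minimal illustration of the sketch format described above, the helper below joins key spans with mask tokens and places a mask at both ends, optionally preceded by a prompt such as "Good news:". The name `build_sketch` and its `prompt` parameter are hypothetical and not part of GENIUS or `transformers`; the layout simply mirrors the example sketches in this README.

```python
from typing import Sequence

MASK = "<mask>"  # mask token as it appears in the example sketches


def build_sketch(spans: Sequence[str], prompt: str = "") -> str:
    """Join key spans with mask tokens (hypothetical helper, not a GENIUS API)."""
    # Drop empty spans and surrounding whitespace
    cleaned = [s.strip() for s in spans if s.strip()]
    # Start with a mask, then alternate span, mask, span, mask, ...
    parts = [MASK]
    for span in cleaned:
        parts += [span, MASK]
    sketch = " ".join(parts)
    prompt = prompt.strip()
    return f"{prompt} {sketch}" if prompt else sketch


print(build_sketch(["machine learning", "my research interest", "data science"]))
print(build_sketch(["the European Union", "month by EU"], prompt="Good news:"))
```

The resulting string can be passed to the pipeline in place of a hand-written sketch.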

GENIUS can also be used as a general textual data augmentation tool for various NLP tasks (including sentiment analysis, topic classification, NER, and QA).
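One possible way to use sketches for augmentation is shown below. It is only a rough sketch under stated assumptions, not the procedure from the paper: it assumes the key spans of a labelled example have already been chosen, and it reuses the original label for every generated text. The function `augment`, the label `"sports"`, and the stand-in generator `fake_generate` are hypothetical; the stand-in only lets the example run without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

MASK = "<mask>"


def augment(
    spans: Sequence[str],
    label: str,
    generate: Callable[[str], str],
    n: int = 3,
) -> List[Tuple[str, str]]:
    """Generate n texts from one sketch, each paired with the original label."""
    # Build "<mask> span <mask> span <mask>" from the chosen key spans
    sketch = " ".join([MASK] + [tok for s in spans for tok in (s, MASK)])
    return [(generate(sketch), label) for _ in range(n)]


# Stand-in generator (assumption) that just replaces each mask with "...".
# With the pipeline from "How to use", one could pass instead:
#   lambda s: genius(s, num_beams=3, do_sample=True, max_length=200)[0]["generated_text"]
def fake_generate(sketch: str) -> str:
    return sketch.replace(MASK, "...")


samples = augment(["play basketball", "last Sunday"], "sports", fake_generate, n=2)
for text, label in samples:
    print(label, "|", text)
```

Because sampling is enabled in the real pipeline call, repeated calls on the same sketch would be expected to yield different texts, which is what makes the generated samples useful as extra training data.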


Model variations:

| Model | #params | Language | Comment |
|---|---|---|---|
| genius-large | 406M | English | The version used in the paper (recommended) |
| genius-large-k2t | 406M | English | Keywords-to-text |
| genius-base | 139M | English | Smaller version |
| genius-base-ps | 139M | English | Pre-trained on both paragraphs and short sentences |
| genius-base-chinese | 116M | Chinese | Pre-trained on 10 million clean Chinese passages |