---
|
language: en |
|
tags: |
|
- augmentation |
|
license: apache-2.0 |
|
datasets: |
|
- C4 |
|
|
|
widget: |
|
- text: "<mask> machine learning <mask> my research interest <mask> data science <mask>"
|
--- |
|
|
|
# SEGA-large model |
|
|
|
SEGA: SkEtch-based Generative Augmentation |
|
|
|
SEGA is a general text augmentation model that can be used for data augmentation in various NLP tasks (including sentiment analysis, topic classification, NER, and QA). SEGA uses an encoder-decoder structure (based on the BART architecture) and is pre-trained on the C4-realnewslike corpus. Given a sketch, i.e., key words or phrases joined by `<mask>` tokens (as in the widget example above), SEGA generates fluent text that fills in the masked spans.
|
|
|
- Paper: [this paper](to_be_added)

- Github: [this repository](to_be_added)
|
|
|
|
|
## Model description |
|
|
|
|
|
## Model variations |
|
|
|
|
|
| Model                    | #params | Language |
|--------------------------|---------|----------|
| [`sega-large`]()         | xM      | English  |
| [`sega-base`]()          | xM      | English  |
| [`sega-small`]()         | xM      | English  |
| [`sega-large-chinese`]() | xM      | Chinese  |
| [`sega-base-chinese`]()  | xM      | Chinese  |
| [`sega-small-chinese`]() | xM      | Chinese  |
|
|
|
|
|
## Intended uses & limitations |
|
|
|
|
|
|
|
### How to use |
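
Below is a minimal usage sketch with 🤗 Transformers. Note that `"sega-large"` is a placeholder model id and the decoding settings are illustrative assumptions, not confirmed defaults; replace them with the released checkpoint name and your preferred generation parameters.

```python
from transformers import pipeline

# Load SEGA as a text2text-generation pipeline (SEGA is BART-based,
# so it works with the standard seq2seq generation interface).
# NOTE: "sega-large" is a placeholder model id, not a confirmed Hub path.
sega = pipeline("text2text-generation", model="sega-large")

# A sketch: key words / phrases joined by <mask> tokens marking the
# spans the model should fill in.
sketch = "<mask> machine learning <mask> my research interest <mask> data science <mask>"

# Generate a fluent sentence that preserves the sketch content.
outputs = sega(sketch, do_sample=True, max_length=100)
print(outputs[0]["generated_text"])
```

Because sampling is enabled, repeating the call yields multiple diverse augmentations of the same sketch.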
|
|
|
|
|
### Limitations and bias |
|
|
|
|
|
## Training data |
|
|
|
|
|
## Training procedure |
|
|
|
### Preprocessing |
|
|
|
|
|
### Pretraining |
|
|
|
## Evaluation results |
|
|
|
|
|
|
|
### BibTeX entry and citation info |
|
|
|
|