seonghyeonye committed d633172 (1 parent: 6d785d6)

Update README.md

Files changed (1): README.md (+11 -4)
README.md CHANGED
@@ -1,3 +1,10 @@
+---
+datasets:
+- bigscience/P3
+language: en
+license: apache-2.0
+---
+
 **Official repository**: [seonghyeonye/Flipped-Learning](https://github.com/seonghyeonye/Flipped-Learning)
 # Model Description
 FLIPPED uses a unique meta-learning method to achieve zero-shot task generalization on classification natural language prompts, outperforming GPT-3 and T0-11B on many tasks at a 4x smaller scale.
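
Editor's note (not part of the commit): basic usage of the released checkpoints looks roughly like the sketch below. The checkpoint id and prompt are assumptions; the card itself (see the next hunk) advises bfloat16 over fp16, which the sketch respects.

```python
# Editor's sketch; the checkpoint id is hypothetical (check the model card).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "seonghyeonye/flipped_11B"  # hypothetical id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# The model was trained with bfloat16 activations, so load in bfloat16, not fp16.
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

# Flipped setup: the encoder reads the input text plus a candidate label,
# and the decoder produces the task instruction.
prompt = "Review: this is the best cast iron skillet you will ever buy.\nLabel: positive"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

At evaluation time, Flipped Learning scores each candidate label by the likelihood the model assigns to the instruction and picks the highest-scoring label; the Jupyter Notebook linked in the README walks through this.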
@@ -27,7 +34,7 @@ We also provide a quick [Jupyter Notebook](https://github.com/seonghyeonye/Flipp
 **Note: the model was trained with bfloat16 activations. As such, we highly discourage running inference with fp16.**
 
 # Training procedure
-FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
+FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-xl), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
 At a high level, the input text along with the output label is fed to the encoder, and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target (the instruction). We also feed the input text along with a wrong label, adding an unlikelihood loss so that the model does not produce the proper instruction in that case.
 Training details:
 - Fine-tuning steps: 5'000
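
Editor's note: the unlikelihood objective described in the hunk above, as a rough PyTorch sketch. This is an illustration, not the authors' code; the function name, batching, and the `lambda_ul` mixing weight are hypothetical.

```python
import torch
import torch.nn.functional as F

def flipped_loss(model, enc_correct, enc_wrong, instruction_ids, lambda_ul=1.0):
    """LM loss on the instruction given (input, correct label), plus an
    unlikelihood penalty on the same instruction given (input, wrong label)."""
    # Likelihood term: standard seq2seq cross-entropy on the instruction.
    lm_loss = model(**enc_correct, labels=instruction_ids).loss

    # Unlikelihood term: make the instruction improbable under a wrong label.
    logits = model(**enc_wrong, labels=instruction_ids).logits
    log_probs = F.log_softmax(logits, dim=-1)
    pad = instruction_ids.eq(-100)                 # padding positions to ignore
    safe_ids = instruction_ids.masked_fill(pad, 0)
    tok_logp = log_probs.gather(-1, safe_ids.unsqueeze(-1)).squeeze(-1)
    p = tok_logp.exp().clamp(max=1 - 1e-6)         # p(token | input, wrong label)
    ul = -torch.log1p(-p)                          # -log(1 - p), per token
    ul_loss = ul.masked_fill(pad, 0.0).sum() / (~pad).sum()

    return lm_loss + lambda_ul * ul_loss
```

Here `enc_correct` and `enc_wrong` are tokenizer outputs for the two (input, label) pairings, and `instruction_ids` holds the target instruction tokens with padding set to -100, following the transformers convention.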
@@ -43,8 +50,8 @@ Training details:
 We trained different variants of FLIPPED with different mixtures of datasets.
 |Model|Training datasets|
 |--|--|
-|FLIPPED|- Multiple-Choice QA: CommonsenseQA, DREAM, QUAIL, QuaRTz, Social IQA, WiQA, Cosmos, QASC, Quarel, SciQ<br>- Sentiment: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp<br>- Topic Classification: AG News, DBPedia<br>- Paraphrase Identification: MRPC, PAWS, QQP|
-|FLIPPED_3B|Same as T0 but starting from a T5-LM (3B parameters) pre-trained model|
+|FLIPPED-11B|- Multiple-Choice QA: CommonsenseQA, DREAM, QUAIL, QuaRTz, Social IQA, WiQA, Cosmos, QASC, Quarel, SciQ<br>- Sentiment: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp<br>- Topic Classification: AG News, DBPedia<br>- Paraphrase Identification: MRPC, PAWS, QQP|
+|FLIPPED_3B|Same as FLIPPED-11B|
 We only choose prompt examples that have output labels, which can be found on the dataset page.
 
 # Evaluation data
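
Editor's note: the mixtures in the table are drawn from [bigscience/P3](https://huggingface.co/datasets/bigscience/P3). A quick way to inspect one prompt's examples and their output labels is sketched below; the config name is hypothetical (real names are listed on the dataset page) and the field names follow the P3 schema.

```python
from datasets import load_dataset

# Config name is illustrative; see the bigscience/P3 page for actual names.
ds = load_dataset("bigscience/P3", "imdb_Reviewer_Opinion_bad_good_choices", split="train")
ex = ds[0]
print(ex["inputs_pretokenized"])   # prompted input text
print(ex["targets_pretokenized"])  # output label
```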
@@ -83,7 +90,7 @@ We evaluate the robustness of models on following datasets with changing the out
 # BibTeX entry and citation info
 ```bibtex
 @article{ye2022guess,
-  title={Guess the Instruction! Making Language Models Stronger Zero-Shot Learners},
+  title={Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners},
   author={Ye, Seonghyeon and Kim, Doyoung and Jang, Joel and Shin, Joongbo and Seo, Minjoon},
   journal={arXiv preprint arXiv:2210.02969},
   year={2022}