seonghyeonye/flipped_3B
@@ -1,3 +1,10 @@
+---
+datasets:
+- bigscience/P3
+language: en
+license: apache-2.0
+---
 **Official repository**: [seonghyeonye/Flipped-Learning](https://github.com/seonghyeonye/Flipped-Learning)
 # Model Description
 FLIPPED uses a unique meta-learning method to show zero-shot task generalization on classification natural language prompts, outperforming GPT-3 and T0-11B on many tasks with a 4x smaller scale.
@@ -27,7 +34,7 @@ We also provide a quick [Jupyter Notebook](https://github.com/seonghyeonye/Flipp
 **Note: the model was trained with fp32 activations. As such, we highly discourage running inference with fp16.**
 # Training procedure
-FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
+FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-xl), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
 At a high level, the input text along with output label is fed to the encoder and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed input text along with a wrong input, adding an unlikelihood loss in order not to make model produce the proper instruction in that case. Here are our training details.
 Training details:
 - Fine-tuning steps: 5'000
@@ -43,8 +50,8 @@ Training details:
 We trained different variants T0 with different mixtures of datasets.
 |Model|Training datasets|
 |--|--|
-|FLIPPED|- Multiple-Choice QA: CommonsenseQA, DREAM, QUAIL, QuaRTz, Social IQA, WiQA, Cosmos, QASC, Quarel, SciQ<br>-  Sentiment: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp<br>- Topic Classification: AG News, DBPedia<br>- Paraphrase Identification: MRPC, PAWS, QQP|
-|FLIPPED_3B|Same as T0 but starting from a T5-LM (3B parameters) pre-trained model|
+|FLIPPED_11B|- Multiple-Choice QA: CommonsenseQA, DREAM, QUAIL, QuaRTz, Social IQA, WiQA, Cosmos, QASC, Quarel, SciQ<br>-  Sentiment: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp<br>- Topic Classification: AG News, DBPedia<br>- Paraphrase Identification: MRPC, PAWS, QQP|
+|FLIPPED_3B|Same as FLIPPED_11B|
 We only choose prompts examples that has output lables, which can be found on the dataset page.
 # Evaluation data
@@ -83,7 +90,7 @@ We evaluate the robustness of models on following datasets with changing the out
 # BibTeX entry and citation info
 ```bibtex
 @article{ye2022guess,
-  title={Guess the Instruction! Making Language Models Stronger Zero-Shot Learners},
+  title={Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners},
   author={Ye, Seonghyeon and Kim, Doyoung and Jang, Joel and Shin, Joongbo and Seo, Minjoon},
   journal={arXiv preprint arXiv:2210.02969},
   year={2022}
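The training procedure in this card works as follows. The encoder receives the input text together with a label, and the decoder is trained to generate the instruction. A second pairing, which the card calls a "wrong input" and which this sketch reads as the input with an incorrect label, receives an unlikelihood penalty. Below is a minimal plain-Python sketch of that objective. It assumes the decoder's per-token probabilities for the instruction tokens are already available, for example from the T5-based model run in fp32 as the card recommends. The function names, the `unlikelihood_weight` knob and the toy probabilities are hypothetical. The `predict_label` rule, which picks the label that makes the instruction most likely, is an assumption about how this objective is used for zero-shot classification. This is not the code from the official Flipped-Learning repository.

```python
import math
from typing import Dict, Sequence

EPS = 1e-12  # clamp to avoid log(0) for probabilities at exactly 0 or 1


def likelihood_loss(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of the instruction tokens given input + correct label.

    token_probs[t] is assumed to be p(y_t | y_<t, input, correct label).
    """
    return -sum(math.log(max(p, EPS)) for p in token_probs)


def unlikelihood_loss(token_probs: Sequence[float]) -> float:
    """Unlikelihood term for input + wrong label: -sum log(1 - p).

    The penalty grows as the wrong pairing makes the instruction more likely.
    """
    return -sum(math.log(max(1.0 - p, EPS)) for p in token_probs)


def flipped_loss(
    correct_probs: Sequence[float],
    wrong_probs: Sequence[float],
    unlikelihood_weight: float = 1.0,  # hypothetical weighting, not from the card
) -> float:
    """Likelihood on the correct pairing plus weighted unlikelihood on the wrong one."""
    return likelihood_loss(correct_probs) + unlikelihood_weight * unlikelihood_loss(wrong_probs)


def predict_label(instruction_logprob: Dict[str, float]) -> str:
    """Assumed inference rule: choose the label under which the instruction is most likely."""
    return max(instruction_logprob, key=instruction_logprob.get)


if __name__ == "__main__":
    # Toy per-token probabilities for a two-token instruction (illustrative only).
    print(flipped_loss([0.9, 0.8], [0.3, 0.2]))
    print(predict_label({"positive": -3.2, "negative": -5.7}))  # prints "positive"
```

Keeping the two loss terms as separate functions makes the design choice explicit. The correct pairing is pushed toward generating the instruction, and the wrong pairing is pushed away from it. In a real training loop these probabilities would come from the model's softmax over the decoder vocabulary rather than from hand-written lists.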