seonghyeonye committed on
Commit
6d785d6
1 Parent(s): 77a37a0

Update README.md

Files changed (1)
  1. README.md +9 -13
README.md CHANGED
@@ -1,7 +1,7 @@
  **Official repository**: [seonghyeonye/Flipped-Learning](https://github.com/seonghyeonye/Flipped-Learning)
  # Model Description
  FLIPPED uses a unique meta-learning method to show zero-shot task generalization on classification natural language prompts, outperforming GPT-3 and T0-11B on many tasks with a 4x smaller scale.
- It is a series of encoder-decoder model trained on a numerous classification dataset. We show inputs and its corresponding outputs of each instances in each dataset to FLIPPED, and train it to generate its possible instruction. We add unlikelyhood loss in order **not** to generate the instruction when given the same input, but a wrong output. To obtain FLIPPED, we fine-tune a T5 model in a given scale on a multitask mixture covering many different classification NLP tasks.
+ It is a series of encoder-decoder models trained on numerous classification datasets. We show FLIPPED the input and its corresponding output for each instance in each dataset, and train it to generate a plausible instruction. We add an unlikelihood loss so that the model does **not** generate the instruction when given the same input paired with a wrong output. To obtain FLIPPED, we fine-tune a T5 model of a given scale on a multitask mixture covering many different classification NLP tasks.
  # Intended uses
  You can use the models to perform inference on tasks by specifying your input-output NLP query in "input: {input}\noutput: {output}" form, and the model will predict the instruction. For example, you can try
  *"input: <extra_id_0> this is the best cast iron skillet you will ever buy<extra_id_1>\noutput: Positive"*
@@ -28,12 +28,12 @@ We also provide a quick [Jupyter Notebook](https://github.com/seonghyeonye/Flipp
 
  # Training procedure
  FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
- At a high level, the input text along with output label is fed to the encoder and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed input text along with a wrong input, adding an unlikelyhood loss in order not to make model produce the proper instruction in that case. Here are our training details.
+ At a high level, the input text along with the output label is fed to the encoder, and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed the input text along with a wrong output, adding an unlikelihood loss so that the model does not produce the proper instruction in that case (a sketch of this combined objective follows the training details below). Here are our training details.
  Training details:
  - Fine-tuning steps: 5'000
- - Input sequence length: 384(512 for 3B)
+ - Input sequence length: 384
  - Target sequence length: 64
- - Batch size: 1
+ - Batch size: 240
  - Optimizer: Adafactor
  - Learning rate: 5e-5
  - Dropout: 0.1
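The combined objective described above (likelihood on the correct input-output pair, unlikelihood on the same input with a wrong output) is sketched below in PyTorch. This is illustrative only: the helper name, the mixing weight `lambda_ul`, and the exact masking are assumptions, not the authors' released training code.

```python
import torch.nn.functional as F


def flipped_style_loss(model, correct_batch, wrong_batch, lambda_ul=1.0):
    """Likelihood loss on a correct (input, output) pair plus an unlikelihood
    penalty on the same input paired with a wrong output (illustrative sketch)."""
    # Standard cross-entropy: generate the instruction from the correct pair.
    ll_out = model(
        input_ids=correct_batch["input_ids"],
        attention_mask=correct_batch["attention_mask"],
        labels=correct_batch["labels"],
    )
    likelihood_loss = ll_out.loss

    # Unlikelihood: discourage generating the instruction from the wrong-output pair,
    # i.e. penalise -log(1 - p(token)) for each target token.
    ul_out = model(
        input_ids=wrong_batch["input_ids"],
        attention_mask=wrong_batch["attention_mask"],
        labels=wrong_batch["labels"],
    )
    log_probs = F.log_softmax(ul_out.logits, dim=-1)            # (batch, tgt_len, vocab)
    labels = wrong_batch["labels"]
    mask = (labels != -100).float()                             # ignore padded positions
    token_log_p = log_probs.gather(-1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    one_minus_p = (1.0 - token_log_p.exp()).clamp(min=1e-6)     # numerical stability
    unlikelihood_loss = -(one_minus_p.log() * mask).sum() / mask.sum()

    return likelihood_loss + lambda_ul * unlikelihood_loss
```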
@@ -82,14 +82,10 @@ We evaluate the robustness of models on following datasets with changing the out
  The template name we used can be found in the [promptsource template library](https://github.com/bigscience-workshop/promptsource/tree/main/promptsource/templates).
  # BibTeX entry and citation info
  ```bibtex
- @misc{https://doi.org/10.48550/arxiv.2210.02969,
- doi = {10.48550/ARXIV.2210.02969},
- url = {https://arxiv.org/abs/2210.02969},
- author = {Ye, Seonghyeon and Kim, Doyoung and Jang, Joel and Shin, Joongbo and Seo, Minjoon},
- keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
- title = {Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners},
- publisher = {arXiv},
- year = {2022},
- copyright = {Creative Commons Attribution 4.0 International}
+ @article{ye2022guess,
+ title={Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners},
+ author={Ye, Seonghyeon and Kim, Doyoung and Jang, Joel and Shin, Joongbo and Seo, Minjoon},
+ journal={arXiv preprint arXiv:2210.02969},
+ year={2022}
  }
  ```
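Since the hunk above points to the promptsource template library for the evaluation prompts, here is a small sketch of how a template can be looked up with that library. The dataset name and example fields are placeholders; the actual evaluation datasets, label changes, and template names are those listed in the full README.

```python
# Hedged sketch: browsing promptsource templates referenced above.
# "imdb" is only a placeholder dataset; see the full model card for the
# evaluation datasets and the exact template names used.
from promptsource.templates import DatasetTemplates

templates = DatasetTemplates("imdb")
print(templates.all_template_names)            # available template names for this dataset

template = templates[templates.all_template_names[0]]
example = {"text": "this is the best cast iron skillet you will ever buy", "label": 1}
print(template.apply(example))                 # [prompted input, verbalized target]
```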
 