seonghyeonye commited on
Commit
bca2369
1 Parent(s): 7be61e6

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -1,7 +1,7 @@
1
  **Official repository**: [seonghyeonye/Flipped-Learning](https://github.com/seonghyeonye/Flipped-Learning)
2
  # Model Description
3
  FLIPPED uses a unique meta-learning method to show zero-shot task generalization on classification natural language prompts, outperforming GPT-3 and T0-11B on many tasks with a 4x smaller scale.
4
- It is a series of encoder-decoder model trained on a numerous classification dataset. We show inputs and its corresponding outputs of each instances in each dataset to FLIPPED, and train it to generate its possible instruction. We add unlikelyhood loss in order **not** to generate the instruction when given the same input, but a wrong output. To obtain FLIPPED, we fine-tune a T5 model in a given scale on a multitask mixture covering many different classification NLP tasks.
5
  # Intended uses
6
  You can use the models to perform inference on tasks by specifying your input-output NLP query in a "input: {input}\noutput: {output}" form , and the model will predict the instruction. For example, You can try
7
  *"input: <extra_id_0> this is the best cast iron skillet you will ever buy<extra_id_1>\noutput: Positive"*
@@ -28,7 +28,7 @@ We also provide a quick [Jupyter Notebook](https://github.com/seonghyeonye/Flipp
28
 
29
  # Training procedure
30
  FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
31
- At a high level, the input text along with output label is fed to the encoder and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed input text along with a wrong input, adding an unlikelyhood loss in order not to make model produce the proper instruction in that case. Here are our training details.
32
  Training details:
33
  - Fine-tuning steps: 5'000
34
  - Input sequence length: 384(512 for 3B)
 
1
  **Official repository**: [seonghyeonye/Flipped-Learning](https://github.com/seonghyeonye/Flipped-Learning)
2
  # Model Description
3
  FLIPPED uses a unique meta-learning method to show zero-shot task generalization on classification natural language prompts, outperforming GPT-3 and T0-11B on many tasks with a 4x smaller scale.
4
+ It is a series of encoder-decoder model trained on a numerous classification dataset. We show inputs and its corresponding outputs of each instances in each dataset to FLIPPED, and train it to generate its possible instruction. We add unlikelihood loss in order **not** to generate the instruction when given the same input, but a wrong output. To obtain FLIPPED, we fine-tune a T5 model in a given scale on a multitask mixture covering many different classification NLP tasks.
5
  # Intended uses
6
  You can use the models to perform inference on tasks by specifying your input-output NLP query in a "input: {input}\noutput: {output}" form , and the model will predict the instruction. For example, You can try
7
  *"input: <extra_id_0> this is the best cast iron skillet you will ever buy<extra_id_1>\noutput: Positive"*
 
28
 
29
  # Training procedure
30
  FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
31
+ At a high level, the input text along with output label is fed to the encoder and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed input text along with a wrong input, adding an unlikelihood loss in order not to make model produce the proper instruction in that case. Here are our training details.
32
  Training details:
33
  - Fine-tuning steps: 5'000
34
  - Input sequence length: 384(512 for 3B)