---
inference: false
tags:
- text-generation
- opt

license: other
commercial: false
---
# OPT-IML

## Model Description

OPT-IML models are instruction-tuned versions of OPT, fine-tuned on 2000 NLP tasks drawn from 8 existing public benchmarks.
OPT-IML models significantly outperform the baseline OPT models and demonstrate strong generalization on four
evaluation benchmarks with diverse tasks and input formats: PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG.

### How to use
For large OPT models such as this one, it is not recommended to use the `text-generation` pipeline, because
the model should be loaded in half precision to accelerate generation and optimize memory consumption on GPU.
It is recommended to call the [`generate`](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.generate)
method directly, as follows:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> import torch

>>> # load the model in half precision to fit on GPU and speed up generation
>>> model = AutoModelForCausalLM.from_pretrained("facebook/opt-iml-30b", torch_dtype=torch.float16).cuda()

>>> # the fast tokenizer currently does not work correctly
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-iml-30b", use_fast=False)

>>> prompt = "What is the color of a carrot?\nA:"

>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

>>> generated_ids = model.generate(input_ids)

>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```
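
If a single GPU cannot hold the model in half precision (the 30B checkpoint needs roughly 60 GB of weights at float16), the sketch below shows one possible alternative. It assumes the `accelerate` package is installed so that `device_map="auto"` can shard the weights across the available devices, and the sampling parameters are illustrative rather than tuned values.

```python
# a minimal sketch, assuming `accelerate` is installed; adapt to your hardware
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-iml-30b",
    torch_dtype=torch.float16,
    device_map="auto",  # shard weights across available GPUs (and CPU if needed)
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-iml-30b", use_fast=False)

prompt = "What is the color of a carrot?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# sample instead of greedy decoding; these parameters are illustrative
generated_ids = model.generate(
    input_ids,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    max_new_tokens=32,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```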

### Limitations and bias

While OPT-IML models outperform baseline OPT on an extensive set of evaluations, they remain susceptible to the
risks associated with large language models, including factual inaccuracy, generation of toxic language, and
reinforcement of stereotypes. We release the OPT-IML models to facilitate future work on instruction tuning and
to improve the availability of large instruction-tuned causal LMs; the use of these models should be
accompanied by responsible best practices.

## Training data
OPT-IML models are trained on OPT-IML Bench, a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, including Super-NaturalInstructions, FLAN, and PromptSource.
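
To make the shape of such data concrete, below is a purely illustrative sketch of flattening one task instance into an instruction-following training string; the template and field names are hypothetical and do not reproduce the exact OPT-IML Bench serialization.

```python
# purely illustrative: this template is hypothetical and is not
# the actual OPT-IML Bench format
def to_instruction_example(instruction: str, task_input: str, target: str) -> str:
    """Flatten one task instance into a single training string."""
    return f"{instruction}\n\nInput: {task_input}\nOutput: {target}"

print(to_instruction_example(
    instruction="Answer the question.",
    task_input="What is the color of a carrot?",
    target="Orange",
))
```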

## Training procedure
The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for unicode characters), with a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.
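
As a quick illustration of these settings, a minimal sketch follows that loads the same tokenizer as in the usage example above and truncates an over-long input to the 2048-token context length.

```python
# minimal sketch: truncate an over-long input to the 2048-token context length
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-iml-30b", use_fast=False)

encoded = tokenizer(
    "The quick brown fox jumps over the lazy dog. " * 1000,
    truncation=True,
    max_length=2048,
    return_tensors="pt",
)
print(encoded.input_ids.shape)  # torch.Size([1, 2048])
```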

The 30B model was fine-tuned on 64 40GB A100 GPUs. During fine-tuning, models saw approximately 2 billion tokens, which is only 0.6% of the pre-training budget of OPT.

### BibTeX entry and citation info
```bibtex
@misc{iyer2022opt,
      title={OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization},
      author={Iyer, Srinivasan and Lin, Xi Victoria and Pasunuru, Ramakanth and Mihaylov, Todor and Simig, D{\'a}niel and Yu, Ping and Shuster, Kurt and Wang, Tianlu and Liu, Qing and Koura, Punit Singh and others},
      year={2022},
      eprint={2212.12017},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```