---
library_name: keras-hub
---

## Model Overview

An OPT decoder network.

This class implements a Transformer-based decoder model as described in
["OPT: Open Pre-trained Transformer Language Models"](https://arxiv.org/abs/2205.01068).
The default constructor gives a fully customizable, randomly initialized OPT
model with any number of layers, heads, and embedding dimensions. To load
preset architectures and weights, use the `from_preset()` constructor.

Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind. The underlying model is provided by a
third party and subject to a separate license, available
[here](https://github.com/facebookresearch/fairseq/).

__Arguments__

- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of transformer decoder layers.
- __num_heads__: int. The number of attention heads for each transformer
  decoder layer. The hidden size must be divisible by the number of
  attention heads.
- __hidden_dim__: int. The hidden size of the transformer decoder layers.
- __intermediate_dim__: int. The output dimension of the first Dense layer in
  a two-layer feedforward network for each transformer decoder layer.
- __dropout__: float. Dropout probability for the Transformer decoder.
- __max_sequence_length__: int. The maximum sequence length that this decoder
  can consume. If `None`, `max_sequence_length` uses the value from
  the sequence length. This determines the variable shape for positional
  embeddings.

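The divisibility constraint on `num_heads` can be checked before building a model. The following is a minimal sketch in plain Python; the helper name `check_opt_config` is hypothetical and not part of KerasHub, and the example values are illustrative rather than taken from any preset.

```python
def check_opt_config(hidden_dim, num_heads):
    """Return the per-head dimension, or raise if the sizes are incompatible.

    Hypothetical helper for illustration; not a KerasHub function.
    """
    if hidden_dim <= 0 or num_heads <= 0:
        raise ValueError("hidden_dim and num_heads must be positive")
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim ({hidden_dim}) must be divisible by "
            f"num_heads ({num_heads})"
        )
    # Each attention head works on an equal slice of the hidden size.
    return hidden_dim // num_heads


# Illustrative values only.
head_dim = check_opt_config(hidden_dim=768, num_heads=12)
print(head_dim)  # 64
```

Values of `hidden_dim` and `num_heads` that pass this check can then be given to the default constructor described above.
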
## Example Usage
```python
import keras
import keras_hub
import numpy as np
```

Use `generate()` to do text generation.
```python
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_125m_en")
opt_lm.generate("I want to say", max_length=30)

# Generate with batched prompts.
opt_lm.generate(["This is a", "Where are you"], max_length=30)
```

Compile the `generate()` function with a custom sampler.
```python
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_125m_en")
opt_lm.compile(sampler="greedy")
opt_lm.generate("I want to say", max_length=30)

opt_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
opt_lm.generate("I want to say", max_length=30)
```

Use `generate()` without preprocessing.
```python
# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
# Use `"padding_mask"` to indicate values that should not be overridden.
prompt = {
    "token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
}

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "opt_125m_en",
    preprocessor=None,
)
opt_lm.generate(prompt)
```

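When preprocessing is disabled, the prompt arrays have to be padded by hand. The sketch below builds the two inputs from a list of prompt token ids in plain Python; `build_prompt` is a hypothetical helper, and the padding id `0` simply mirrors the example above rather than a documented constant.

```python
def build_prompt(prompt_ids, total_length, pad_id=0):
    """Pad prompt ids to `total_length` and mark the prompt positions.

    Hypothetical helper; `pad_id=0` mirrors the example above and is an
    assumption, not a documented value.
    """
    if len(prompt_ids) > total_length:
        raise ValueError("prompt is longer than total_length")
    num_pad = total_length - len(prompt_ids)
    token_ids = list(prompt_ids) + [pad_id] * num_pad
    # 1 marks prompt tokens to keep; 0 marks slots that generation may fill.
    padding_mask = [1] * len(prompt_ids) + [0] * num_pad
    return token_ids, padding_mask


token_ids, padding_mask = build_prompt([5338, 318], total_length=5)
print(token_ids)     # [5338, 318, 0, 0, 0]
print(padding_mask)  # [1, 1, 0, 0, 0]
```

Wrapping each list as `np.array([token_ids] * 2)` gives the batched form used in the example above.
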
Call `fit()` on a single batch.
```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_125m_en")
opt_lm.fit(x=features, batch_size=2)
```

Call `fit()` without preprocessing.
```python
x = {
    "token_ids": np.array([[1, 2, 3, 4, 5]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
}
y = np.array([[2, 3, 4, 5, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1]] * 2)

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "opt_125m_en",
    preprocessor=None,
)
opt_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
```

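In the unpreprocessed `fit()` example, `y` is `x["token_ids"]` shifted left by one position, so each token is trained to predict its successor. The sketch below derives inputs, labels, and sample weights from one padded sequence in plain Python; `make_causal_lm_example` is a hypothetical helper, and weighting labels by the shifted padding mask is an assumption about a reasonable setup rather than a description of KerasHub internals.

```python
def make_causal_lm_example(token_ids, padding_mask):
    """Split one sequence of length n + 1 into n inputs, labels, and weights.

    Hypothetical helper for illustration; not a KerasHub function.
    """
    if len(token_ids) != len(padding_mask):
        raise ValueError("token_ids and padding_mask must have equal length")
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form an input/label pair")
    # Inputs drop the last token; labels drop the first one.
    x = {"token_ids": token_ids[:-1], "padding_mask": padding_mask[:-1]}
    y = token_ids[1:]
    # Assumption: a label counts toward the loss only if it is a real token.
    sample_weight = padding_mask[1:]
    return x, y, sample_weight


x, y, sw = make_causal_lm_example([1, 2, 3, 4, 5, 0], [1, 1, 1, 1, 1, 0])
print(x["token_ids"])  # [1, 2, 3, 4, 5]
print(y)               # [2, 3, 4, 5, 0]
print(sw)              # [1, 1, 1, 1, 0]
```

Here the final label is padding, so its weight is 0; the `fit()` example above keeps every weight at 1 instead.
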
## Example Usage with Hugging Face URI

```python
import keras
import keras_hub
import numpy as np
```

Use `generate()` to do text generation.
```python
opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_125m_en")
opt_lm.generate("I want to say", max_length=30)

# Generate with batched prompts.
opt_lm.generate(["This is a", "Where are you"], max_length=30)
```

Compile the `generate()` function with a custom sampler.
```python
opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_125m_en")
opt_lm.compile(sampler="greedy")
opt_lm.generate("I want to say", max_length=30)

opt_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
opt_lm.generate("I want to say", max_length=30)
```

Use `generate()` without preprocessing.
```python
# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
# Use `"padding_mask"` to indicate values that should not be overridden.
prompt = {
    "token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
}

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "hf://keras/opt_125m_en",
    preprocessor=None,
)
opt_lm.generate(prompt)
```

Call `fit()` on a single batch.
```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_125m_en")
opt_lm.fit(x=features, batch_size=2)
```

Call `fit()` without preprocessing.
```python
x = {
    "token_ids": np.array([[1, 2, 3, 4, 5]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
}
y = np.array([[2, 3, 4, 5, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1]] * 2)

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "hf://keras/opt_125m_en",
    preprocessor=None,
)
opt_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
```