---
library_name: keras-hub
---
## Model Overview

An OPT decoder network.

This class implements a Transformer-based decoder model as described in ["OPT: Open Pre-trained Transformer Language Models"](https://arxiv.org/abs/2205.01068). The default constructor gives a fully customizable, randomly initialized OPT model with any number of layers, heads, and embedding dimensions. To load preset architectures and weights, use the `from_preset()` constructor.

Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind. The underlying model is provided by a third party and subject to a separate license, available [here](https://github.com/facebookresearch/fairseq/).

__Arguments__

- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of transformer decoder layers.
- __num_heads__: int. The number of attention heads for each transformer. The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The hidden size of the transformer decoder layers.
- __intermediate_dim__: int. The output dimension of the first Dense layer in a two-layer feedforward network for each transformer decoder layer.
- __dropout__: float. Dropout probability for the Transformer decoder.
- __max_sequence_length__: int. The maximum sequence length that this decoder can consume. If `None`, `max_sequence_length` uses the value from sequence length. This determines the variable shape for positional embeddings.

## Example Usage

```python
import keras
import keras_hub
import numpy as np
```

Use `generate()` to do text generation.

```python
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_125m_en")
opt_lm.generate("I want to say", max_length=30)

# Generate with batched prompts.
opt_lm.generate(["This is a", "Where are you"], max_length=30)
```

Compile the `generate()` function with a custom sampler.

```python
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_125m_en")
opt_lm.compile(sampler="greedy")
opt_lm.generate("I want to say", max_length=30)

opt_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
opt_lm.generate("I want to say", max_length=30)
```

Use `generate()` without preprocessing.

```python
# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
# Use `"padding_mask"` to indicate values that should not be overridden.
prompt = {
    "token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
}

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "opt_125m_en",
    preprocessor=None,
)
opt_lm.generate(prompt)
```

Call `fit()` on a single batch.

```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_125m_en")
opt_lm.fit(x=features, batch_size=2)
```

Call `fit()` without preprocessing.

```python
x = {
    "token_ids": np.array([[1, 2, 3, 4, 5]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
}
y = np.array([[2, 3, 4, 5, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1]] * 2)

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "opt_125m_en",
    preprocessor=None,
)
opt_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
```

## Example Usage with Hugging Face URI

```python
import keras
import keras_hub
import numpy as np
```

Use `generate()` to do text generation.

```python
opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_125m_en")
opt_lm.generate("I want to say", max_length=30)

# Generate with batched prompts.
opt_lm.generate(["This is a", "Where are you"], max_length=30)
```

Compile the `generate()` function with a custom sampler.

```python
opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_125m_en")
opt_lm.compile(sampler="greedy")
opt_lm.generate("I want to say", max_length=30)

opt_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
opt_lm.generate("I want to say", max_length=30)
```

Use `generate()` without preprocessing.

```python
# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
# Use `"padding_mask"` to indicate values that should not be overridden.
prompt = {
    "token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
}

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "hf://keras/opt_125m_en",
    preprocessor=None,
)
opt_lm.generate(prompt)
```

Call `fit()` on a single batch.

```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_125m_en")
opt_lm.fit(x=features, batch_size=2)
```

Call `fit()` without preprocessing.

```python
x = {
    "token_ids": np.array([[1, 2, 3, 4, 5]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
}
y = np.array([[2, 3, 4, 5, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1]] * 2)

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "hf://keras/opt_125m_en",
    preprocessor=None,
)
opt_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
```
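
The `fit()` without preprocessing examples pass hand-written `x`, `y`, and `sample_weight` arrays. The sketch below shows one way such arrays might be derived from raw token id sequences. It assumes next-token targets are the inputs shifted left by one position, as the `y` array in those examples suggests. The helper `make_causal_lm_batch` and its `pad_id` parameter are hypothetical names, not part of KerasHub, and `pad_id=0` only mirrors the zeros used in the arrays above; it is not a claim about the tokenizer's padding id.

```python
def make_causal_lm_batch(sequences, sequence_length, pad_id=0):
    """Build (x, y, sample_weight) for next-token training.

    Hypothetical helper for illustration only; not a KerasHub API.
    """
    token_ids, padding_mask, labels, weights = [], [], [], []
    for seq in sequences:
        # Keep one extra token so every input position has a target.
        seq = list(seq)[: sequence_length + 1]
        num_pad = sequence_length + 1 - len(seq)
        padded = seq + [pad_id] * num_pad
        mask = [1] * len(seq) + [0] * num_pad

        # Inputs drop the last token; targets drop the first one.
        token_ids.append(padded[:-1])
        padding_mask.append(mask[:-1])
        labels.append(padded[1:])
        # Weight is 0 wherever the target is padding, so it adds no loss.
        weights.append(mask[1:])

    x = {"token_ids": token_ids, "padding_mask": padding_mask}
    return x, labels, weights


x, y, sw = make_causal_lm_batch([[1, 2, 3, 4, 5, 6], [7, 8, 9]], sequence_length=5)
print(x["token_ids"])  # [[1, 2, 3, 4, 5], [7, 8, 9, 0, 0]]
print(y)               # [[2, 3, 4, 5, 6], [8, 9, 0, 0, 0]]
print(sw)              # [[1, 1, 1, 1, 1], [1, 1, 0, 0, 0]]
```

The helper returns plain Python lists; wrapping each one with `np.array(...)` gives inputs shaped like those passed to `fit()` in the examples above.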