library_name: keras-hub
Model Overview
⚠️ T5 is currently only available via the keras-hub-nightly package. Use pip install keras-hub-nightly to try this model.
T5 encoder-decoder backbone model.
T5 is an LLM pretrained on a mix of unsupervised and supervised tasks,
where each task is converted to a sequence-to-sequence format.
T5 works well on a variety of tasks out of the box by prepending
a task prefix to the input sequence, e.g., "translate English to German: ..."
for translation or "summarize: ..." for summarization.
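The prefix convention can be illustrated with a few lines of plain Python. This is a minimal sketch, not part of the Keras Hub API: the helper name add_task_prefix and the sample sentences are hypothetical, and the code only builds the input strings that would later be tokenized and fed to the model.

```python
def add_task_prefix(prefix: str, text: str) -> str:
    """Prepend a T5-style task prefix to raw input text.

    Hypothetical helper for illustration only. It normalizes the prefix so
    that exactly one colon and one space separate it from the text.
    """
    clean_prefix = prefix.strip().rstrip(":")
    return f"{clean_prefix}: {text.strip()}"


# Build inputs for the two tasks mentioned above (sample text is made up).
translation_input = add_task_prefix(
    "translate English to German", "The house is wonderful."
)
summarization_input = add_task_prefix(
    "summarize", "T5 casts every task as text-to-text."
)

print(translation_input)
print(summarization_input)
```

The same model weights serve both tasks; only the text prefix changes.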
T5 was introduced in "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer".
The default constructor gives a fully customizable, randomly initialized T5
model with any number of layers, heads, and embedding dimensions. To load
preset architectures and weights, use the from_preset constructor.
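The two construction paths can be sketched as follows. This is an illustration under stated assumptions, not the library's own code: the helper names t5_backbone_config, build_backbone, and load_backbone are hypothetical, the sizes are arbitrary, and the class is assumed to be exposed as keras_hub.models.T5Backbone. The divisibility check and the key_value_dim default mirror the rules stated in the Arguments section. The keras_hub import is deferred so the configuration logic can run without the package installed.

```python
def t5_backbone_config(
    vocabulary_size,
    num_layers,
    num_heads,
    hidden_dim,
    intermediate_dim,
    key_value_dim=None,
    activation="relu",
    use_gated_activation=True,
):
    """Assemble keyword arguments for a randomly initialized T5 backbone.

    Hypothetical helper: validates the documented constraint that the hidden
    size is divisible by the number of heads, and fills in the documented
    default for key_value_dim.
    """
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim ({hidden_dim}) must be divisible by "
            f"num_heads ({num_heads})"
        )
    if key_value_dim is None:
        key_value_dim = hidden_dim // num_heads
    return {
        "vocabulary_size": vocabulary_size,
        "num_layers": num_layers,
        "num_heads": num_heads,
        "hidden_dim": hidden_dim,
        "intermediate_dim": intermediate_dim,
        "key_value_dim": key_value_dim,
        "activation": activation,
        "use_gated_activation": use_gated_activation,
    }


def build_backbone(config):
    """Create a randomly initialized backbone (requires keras-hub-nightly)."""
    import keras_hub  # deferred import; assumed to be installed by the caller

    return keras_hub.models.T5Backbone(**config)


def load_backbone(preset_name):
    """Load a preset architecture and weights; preset_name is caller-supplied."""
    import keras_hub  # deferred import; assumed to be installed by the caller

    return keras_hub.models.T5Backbone.from_preset(preset_name)


# Arbitrary small sizes chosen for illustration.
config = t5_backbone_config(
    vocabulary_size=32000,
    num_layers=6,
    num_heads=8,
    hidden_dim=512,
    intermediate_dim=2048,
)
print(config["key_value_dim"])
```

With the package installed, build_backbone(config) would return an untrained model, while load_backbone takes the name of a published preset.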
Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind.
Arguments
- vocabulary_size: int. The size of the token vocabulary.
- num_layers: int. The number of Transformer layers.
- num_heads: int. The number of attention heads for each Transformer layer. The hidden size must be divisible by the number of attention heads.
- hidden_dim: int. The hidden size of the Transformer layers.
- intermediate_dim: int. The output dimension of the first Dense layer in a two-layer feedforward network for each Transformer layer.
- key_value_dim: int. The dimension of each head of the key/value projections in the multi-head attention layers. Defaults to hidden_dim / num_heads.
- dropout: float. Dropout probability for the Transformer layers.
- activation: activation function (or activation string name). The activation to be used in the inner dense blocks of the Transformer layers. Defaults to "relu".
- use_gated_activation: boolean. Whether to use activation gating in the inner dense blocks of the Transformer layers. The original T5 architecture didn't use gating, but more recent versions do. Defaults to True.
- layer_norm_epsilon: float. Epsilon factor to be used in the layer normalization layers in the Transformer layers.
- tie_embedding_weights: boolean. If True, the weights of the token embedding and the weights projecting language model outputs from hidden_dim