|
--- |
|
library_name: keras-hub |
|
--- |
|
## Model Overview |
|
⚠️ T5 is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model. |
|
|
|
T5 encoder-decoder backbone model. |
|
|
|
T5 is an LLM pretrained on a mix of unsupervised and supervised tasks, |
|
where each task is converted to a sequence-to-sequence format. |
|
T5 works well on a variety of tasks out-of-the-box by prepending |
|
various prefixes to the input sequence, e.g., for translation: |
|
`"translate English to German: ..."`, for summarization: |
|
`"summarize: ..."`. |
|
|
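As a rough illustration of the prefix convention described above, the sketch below builds prefixed inputs in plain Python. The `TASK_PREFIXES` table and the `add_task_prefix` helper are hypothetical names introduced here for illustration; they are not part of keras-hub, and only the two prefix strings come from the text above.

```python
# Hypothetical helper, not a keras-hub API: it only illustrates how a
# task prefix is prepended to the raw input text.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}


def add_task_prefix(task: str, text: str) -> str:
    """Return `text` with the prefix registered for `task` prepended."""
    if task not in TASK_PREFIXES:
        raise KeyError(f"Unknown task: {task!r}")
    return TASK_PREFIXES[task] + text


print(add_task_prefix("translate_en_de", "The house is wonderful."))
print(add_task_prefix("summarize", "T5 casts every task as text-to-text."))
```

The prefixed string is what would then be tokenized and fed to the encoder.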
|
T5 was introduced in |
|
[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683). |
|
|
|
The default constructor gives a fully customizable, randomly initialized T5 |
|
model with any number of layers, heads, and embedding dimensions. To load |
|
preset architectures and weights, use the `from_preset` constructor. |
|
|
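The two construction paths described above might look like the following minimal sketch. It assumes the backbone is exposed as `keras_hub.models.T5Backbone`; the hyperparameter values are small illustrative numbers, not a published configuration, and the preset name `"t5_small_multi"` is an assumption that should be checked against the presets available in your installed version.

```python
# Illustrative hyperparameters (assumed values, not a published preset).
custom_config = {
    "vocabulary_size": 32000,
    "num_layers": 2,
    "num_heads": 4,
    "hidden_dim": 64,
    "intermediate_dim": 128,
}


def build_random_backbone(config):
    """Build a randomly initialized T5 backbone from explicit arguments."""
    # Imported lazily so this sketch can be read without the package
    # installed; requires `pip install keras-hub-nightly`.
    import keras_hub

    return keras_hub.models.T5Backbone(**config)


def load_pretrained_backbone(preset="t5_small_multi"):
    """Load a preset architecture and weights (preset name is assumed)."""
    import keras_hub

    return keras_hub.models.T5Backbone.from_preset(preset)
```

Calling `build_random_backbone(custom_config)` gives untrained weights suitable for training from scratch, while `load_pretrained_backbone()` downloads the preset weights on first use.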
|
Disclaimer: Pre-trained models are provided on an "as is" basis, without |
|
warranties or conditions of any kind. |
|
|
|
|
|
__Arguments__ |
|
|
|
|
|
- __vocabulary_size__: int. The size of the token vocabulary. |
|
- __num_layers__: int. The number of Transformer layers. |
|
- __num_heads__: int. The number of attention heads for each Transformer layer. |
|
The hidden size must be divisible by the number of attention heads. |
|
- __hidden_dim__: int. The hidden size of the Transformer layers. |
|
- __intermediate_dim__: int. The output dimension of the first Dense layer in |
|
a two-layer feedforward network for each Transformer layer. |
|
- __key_value_dim__: int. The dimension of each head of the key/value |
|
projections in the multi-head attention layers. Defaults to |
|
hidden_dim / num_heads. |
|
- __dropout__: float. Dropout probability for the Transformer layers. |
|
- __activation__: activation function (or activation string name). The |
|
activation to be used in the inner dense blocks of the |
|
Transformer layers. Defaults to `"relu"`. |
|
- __use_gated_activation__: boolean. Whether to use activation gating in |
|
the inner dense blocks of the Transformer layers. |
|
The original T5 architecture didn't use gating, but more |
|
recent versions do. Defaults to `True`. |
|
- __layer_norm_epsilon__: float. Epsilon factor to be used in the |
|
layer normalization layers in the Transformer layers. |
|
- __tie_embedding_weights__: boolean. If `True`, the weights of the token |
|
embedding and the weights projecting language model outputs from |
|
`hidden_dim` |
|
|
|
|
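Two of the arguments above are coupled: `hidden_dim` must be divisible by `num_heads`, and `key_value_dim` falls back to `hidden_dim / num_heads` when it is not given. The sketch below spells out that relationship in plain Python. The function `resolve_key_value_dim` is a hypothetical name used only for illustration and does not claim to mirror how the library implements the check internally.

```python
def resolve_key_value_dim(hidden_dim, num_heads, key_value_dim=None):
    """Return the per-head key/value dimension implied by the arguments.

    Hypothetical helper: it restates the documented constraints and is
    not taken from the keras-hub source.
    """
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim ({hidden_dim}) must be divisible by "
            f"num_heads ({num_heads})."
        )
    if key_value_dim is None:
        # Documented default: hidden_dim / num_heads.
        return hidden_dim // num_heads
    return key_value_dim


print(resolve_key_value_dim(512, 8))       # default derived from the sizes
print(resolve_key_value_dim(512, 8, 128))  # explicit override is kept
```

Running a check like this before constructing the model can surface an invalid head count early, with a clearer message than a shape error deep inside the attention layers.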