---
library_name: keras-hub
license: mit
language:
- en
tags:
- text-classification
---
## Model Overview
DeBERTaV3 encoder networks are a set of transformer encoder models published by Microsoft. DeBERTa improves on the BERT and RoBERTa models by using disentangled attention and an enhanced mask decoder.

Weights are released under the [MIT License](https://opensource.org/license/mit). Keras model code is released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).

## Links

* [DeBERTaV3 Quickstart Notebook](https://www.kaggle.com/code/gabrielrasskin/debertav3-quickstart)
* [DeBERTaV3 API Documentation](https://keras.io/api/keras_hub/models/deberta_v3/deberta_v3_classifier/)
* [DeBERTaV3 Model Paper](https://arxiv.org/abs/2111.09543)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Installation

Keras and KerasHub can be installed with:

```
pip install -U -q keras-hub
pip install -U -q "keras>=3"
```

JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.

## Presets

The following model checkpoints are provided by the Keras team. The code examples below use `deberta_v3_large_en`; any other preset name from the table can be substituted.

| Preset Name                     | Parameters | Description                                                                                                  |
| :------------------------------- | :------------: | :-------------------------------------------------------------------------------------------------------- |
| `deberta_v3_extra_small_en`    | 70.68M    | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText.  |
| `deberta_v3_small_en`          | 141.30M   | 6-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText.   |
| `deberta_v3_base_en`           | 183.83M   | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| `deberta_v3_large_en`          | 434.01M   | 24-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| `deberta_v3_base_multi`        | 278.22M   | 12-layer DeBERTaV3 model where case is maintained. Trained on the 2.5TB multilingual CC100 dataset.       |

## Prompts

As a classifier, DeBERTaV3 takes raw text as input, with each example labelled by the integer index of the class it belongs to. In practice, this looks like:

```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]
```
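
The integer labels index into a fixed set of classes, so `num_classes` has to be at least one more than the largest label. Below is a minimal sketch of that encoding. The class names are hypothetical and chosen only for illustration; they are not part of any preset.

```python
# Hypothetical class names, used only to illustrate the label encoding.
class_names = ["animals", "weather", "sports", "school"]

# Map each class name to its integer id (its position in the list).
class_to_id = {name: idx for idx, name in enumerate(class_names)}

features = ["The quick brown fox jumped.", "I forgot my homework."]
text_labels = ["animals", "school"]

# Integer labels in the form the classifier expects.
labels = [class_to_id[name] for name in text_labels]
num_classes = len(class_names)

print(labels)       # [0, 3]
print(num_classes)  # 4
```

Keeping the `class_names` list alongside the model makes it possible to turn predicted ids back into readable names later.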

## Example Usage
```python
import keras
import keras_hub
import numpy as np
```

Raw string data.
```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_hub.models.DebertaV3Classifier.from_preset(
    "deberta_v3_large_en",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)
```
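
The `compile` call above uses `from_logits=True`, which suggests that `predict` returns unnormalized scores rather than probabilities. The sketch below shows one way to turn such scores into probabilities and class ids. It assumes an output of shape `(batch, num_classes)`; the `logits` values are made up and stand in for real model output, and the `softmax` helper is written in plain Python only to keep the sketch dependency-free.

```python
import math


def softmax(row):
    """Convert one row of logits into probabilities that sum to 1."""
    # Subtract the row maximum first for numerical stability.
    row_max = max(row)
    exps = [math.exp(value - row_max) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits standing in for `classifier.predict(...)` output
# (2 examples, 4 classes).
logits = [[2.0, 0.1, -1.0, 0.3], [-0.5, 0.0, 0.2, 1.7]]

probs = [softmax(row) for row in logits]
# Predicted class id = index of the largest probability in each row.
predicted = [max(range(len(row)), key=row.__getitem__) for row in probs]

print(predicted)  # [0, 3]
```

On NumPy arrays, `np.argmax(logits, axis=-1)` gives the same class ids without the explicit softmax, since softmax does not change the ordering of the scores.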

Preprocessed integer data.
```python
features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_hub.models.DebertaV3Classifier.from_preset(
    "deberta_v3_large_en",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)
```
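
The `padding_mask` marks real tokens with 1 and padding with 0. The sketch below illustrates how variable-length token id sequences could be padded to a fixed length together with a matching mask. The helper name `pad_batch`, the example ids, and the pad id of 0 are assumptions made for illustration; the real pad id should be read from the tokenizer in use.

```python
def pad_batch(sequences, sequence_length, pad_id=0):
    """Pad or truncate token id lists to a fixed length and build the mask.

    `pad_id=0` is an assumption for this sketch, not a documented value.
    """
    token_ids, padding_mask = [], []
    for seq in sequences:
        seq = list(seq[:sequence_length])  # truncate inputs that are too long
        n_pad = sequence_length - len(seq)
        token_ids.append(seq + [pad_id] * n_pad)
        # 1 for real tokens, 0 for padded positions.
        padding_mask.append([1] * len(seq) + [0] * n_pad)
    return {"token_ids": token_ids, "padding_mask": padding_mask}


# Made-up token ids, not real tokenizer output.
batch = pad_batch([[1, 17, 42, 2], [1, 99, 2]], sequence_length=6)

print(batch["token_ids"])     # [[1, 17, 42, 2, 0, 0], [1, 99, 2, 0, 0, 0]]
print(batch["padding_mask"])  # [[1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]]
```

Wrapping each list with `np.array(..., dtype="int32")` yields arrays in the same form as the `features` dictionary above.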

## Example Usage with Hugging Face URI

```python
import keras
import keras_hub
import numpy as np
```

Raw string data.
```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_hub.models.DebertaV3Classifier.from_preset(
    "hf://keras/deberta_v3_large_en",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)
```

Preprocessed integer data.
```python
features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_hub.models.DebertaV3Classifier.from_preset(
    "hf://keras/deberta_v3_large_en",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)
```