Fix typo
README.md
CHANGED
@@ -1,6 +1,6 @@
 # I-BERT base model
 
-This model, `ibert-roberta-base`, is an integer-only quantized version of [RoBERTa](https://arxiv.org/abs/1907.11692), and was introduced in [this
+This model, `ibert-roberta-base`, is an integer-only quantized version of [RoBERTa](https://arxiv.org/abs/1907.11692), and was introduced in [this paper](https://arxiv.org/abs/2101.01321).
 I-BERT stores all parameters in an INT8 representation and carries out the entire inference using integer-only arithmetic.
 In particular, I-BERT replaces all floating point operations in the Transformer architecture (e.g., MatMul, GELU, Softmax, and LayerNorm) with closely approximating integer operations.
 This can result in up to a 4x inference speed-up over the floating point counterpart when tested on an Nvidia T4 GPU.
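For context, below is a minimal usage sketch of loading the checkpoint described in this README. The hub id `kssteven/ibert-roberta-base` and the use of the generic `Auto*` classes are assumptions for illustration; they are not part of the diff above.

```python
# Minimal usage sketch (assumption: the checkpoint is published on the Hugging Face Hub
# under "kssteven/ibert-roberta-base" and is loadable via the Auto* classes).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = AutoModel.from_pretrained("kssteven/ibert-roberta-base")

# Run a single forward pass to check the encoder output shape.
inputs = tokenizer("Integer-only inference with I-BERT.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```

Note that integer-only (quantized) inference is controlled by a configuration flag in the Transformers I-BERT implementation; whether it is enabled by default for this checkpoint is not stated in the diff above, so consult the model documentation before relying on the quantized path.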