Fix typo
README.md
CHANGED
@@ -1,6 +1,6 @@
 # I-BERT base model
 
-This model, `ibert-roberta-base`, is an integer-only quantized version of [RoBERTa](https://arxiv.org/abs/1907.11692), and was introduced in [this
+This model, `ibert-roberta-base`, is an integer-only quantized version of [RoBERTa](https://arxiv.org/abs/1907.11692), and was introduced in [this paper](https://arxiv.org/abs/2101.01321).
 I-BERT stores all parameters in an INT8 representation and carries out the entire inference using integer-only arithmetic.
 In particular, I-BERT replaces all floating point operations in the Transformer architecture (e.g., MatMul, GELU, Softmax, and LayerNorm) with closely approximating integer operations.
 This can result in up to a 4x inference speed-up over the floating point counterpart when tested on an Nvidia T4 GPU.
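For context, below is a minimal usage sketch of loading the checkpoint described in this README. The hub id `kssteven/ibert-roberta-base` and the use of the generic `Auto*` classes are assumptions for illustration; they are not part of the diff above.

```python
# Minimal usage sketch (assumption: the checkpoint is published on the Hugging Face Hub
# under "kssteven/ibert-roberta-base" and is loadable via the Auto* classes).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = AutoModel.from_pretrained("kssteven/ibert-roberta-base")

# Run a single forward pass to check the encoder output shape.
inputs = tokenizer("Integer-only inference with I-BERT.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```

Note that integer-only (quantized) inference is controlled by a configuration flag in the Transformers I-BERT implementation; whether it is enabled by default for this checkpoint is not stated in the diff above, so consult the model documentation before relying on the quantized path.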