Add XLM-R tokenizer files (#2)
- Copy XLM-R tokenizer to this repo (007e82656cdb52dda51a3f6b674aa57df98ba058)
Co-authored-by: Jannis Vamvas <[email protected]>
- README.md +2 -8
- tokenizer.json +0 -0
- tokenizer_config.json +4 -0
README.md
CHANGED
````diff
@@ -93,13 +93,7 @@ Because it has been pre-trained with language-specific modular components (_lang
 # Usage
 
 ## Tokenizer
-This model reuses the tokenizer of [XLM-R](https://huggingface.co/xlm-roberta-base)
-
-```python
-from transformers import AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
-```
+This model reuses the tokenizer of [XLM-R](https://huggingface.co/xlm-roberta-base).
 
 ## Input Language
 Because this model uses language adapters, you need to specify the language of your input so that the correct adapter can be activated:
@@ -107,7 +101,7 @@ Because this model uses language adapters, you need to specify the language of y
 ```python
 from transformers import XmodModel
 
-model = XmodModel.from_pretrained("
+model = XmodModel.from_pretrained("facebook/xmod-base")
 model.set_default_language("en_XX")
 ```
 
````
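The README text in this diff says the input language must be specified so that the matching adapter is activated. The toy sketch below illustrates that routing idea in plain Python. `ToyAdapterModel` and its function-based "adapters" are hypothetical stand-ins chosen for illustration only; they are not the `XmodModel` implementation, and real X-MOD adapters are learned neural modules rather than Python functions.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a language adapter: a function that transforms a
# list of hidden values. Real X-MOD adapters are learned neural modules.
Adapter = Callable[[List[float]], List[float]]


class ToyAdapterModel:
    """Toy model that routes every input through one language-specific adapter."""

    def __init__(self, adapters: Dict[str, Adapter]) -> None:
        self.adapters = adapters
        self.default_language: Optional[str] = None

    def set_default_language(self, language: str) -> None:
        # Refuse unknown language codes instead of silently picking an adapter.
        if language not in self.adapters:
            raise ValueError(
                f"No adapter for {language!r}; available: {sorted(self.adapters)}"
            )
        self.default_language = language

    def forward(self, hidden: List[float]) -> List[float]:
        # Without a language, the model cannot know which adapter to activate.
        if self.default_language is None:
            raise ValueError("Call set_default_language() before forward()")
        return self.adapters[self.default_language](hidden)


model = ToyAdapterModel({
    "en_XX": lambda h: [x + 1.0 for x in h],
    "de_DE": lambda h: [x * 2.0 for x in h],
})
model.set_default_language("en_XX")
print(model.forward([1.0, 2.0]))  # [2.0, 3.0]
```

Raising on an unknown code, rather than falling back to some default adapter, reflects the README's point that the language has to be chosen explicitly for the correct adapter to run.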
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
````diff
@@ -0,0 +1,4 @@
+{
+"tokenizer_class": "XLMRobertaTokenizer"
+}
+
````
|
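The added `tokenizer_config.json` records the tokenizer class by name, so a loader pointed at this repository can tell which tokenizer to build. The sketch below is a simplified, hypothetical version of that lookup. `resolve_tokenizer_class` is an illustrative helper, not part of `transformers`, and the rule of appending `Fast` when a fast tokenizer is requested is an assumption modeled loosely on `AutoTokenizer`'s default `use_fast=True` behavior.

```python
import json
import tempfile
from pathlib import Path


def resolve_tokenizer_class(repo_dir: Path, use_fast: bool = True) -> str:
    """Hypothetical helper: return the tokenizer class name a loader would pick."""
    config_path = repo_dir / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    name = config["tokenizer_class"]
    # Assumption: when a fast (Rust-backed) tokenizer is requested, try the
    # "...Fast" variant of the recorded class name.
    if use_fast and not name.endswith("Fast"):
        name += "Fast"
    return name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        # Same content as the tokenizer_config.json added in this commit.
        (repo / "tokenizer_config.json").write_text(
            '{\n"tokenizer_class": "XLMRobertaTokenizer"\n}\n', encoding="utf-8"
        )
        print(resolve_tokenizer_class(repo))                  # XLMRobertaTokenizerFast
        print(resolve_tokenizer_class(repo, use_fast=False))  # XLMRobertaTokenizer
```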