lysandre HF staff jvamvas committed on
Commit
1ff2383
1 Parent(s): e36a507

Add XLM-R tokenizer files (#2)

- Copy XLM-R tokenizer to this repo (007e82656cdb52dda51a3f6b674aa57df98ba058)


Co-authored-by: Jannis Vamvas <[email protected]>

Files changed (3)
  1. README.md +2 -8
  2. tokenizer.json +0 -0
  3. tokenizer_config.json +4 -0
README.md CHANGED
@@ -93,13 +93,7 @@ Because it has been pre-trained with language-specific modular components (_lang
 # Usage
 
 ## Tokenizer
-This model reuses the tokenizer of [XLM-R](https://huggingface.co/xlm-roberta-base), so you can load the tokenizer as follows:
-
-```python
-from transformers import AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
-```
+This model reuses the tokenizer of [XLM-R](https://huggingface.co/xlm-roberta-base).
 
 ## Input Language
 Because this model uses language adapters, you need to specify the language of your input so that the correct adapter can be activated:
@@ -107,7 +101,7 @@ Because this model uses language adapters, you need to specify the language of y
 ```python
 from transformers import XmodModel
 
-model = XmodModel.from_pretrained("jvamvas/xmod-base")
+model = XmodModel.from_pretrained("facebook/xmod-base")
 model.set_default_language("en_XX")
 ```
 
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
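
Because this commit copies the XLM-R tokenizer files into the repo, the tokenizer can presumably be loaded from the same repo ID as the model instead of from `xlm-roberta-base`. The following is a minimal sketch, assuming the repo is `facebook/xmod-base` (the ID used in the updated README) and that `transformers` is installed. `load_tokenizer_and_model` is a hypothetical helper name, not part of the commit.

```python
# Repo ID taken from the updated README; assumed to be the repo this commit targets.
MODEL_ID = "facebook/xmod-base"


def load_tokenizer_and_model(model_id: str = MODEL_ID, language: str = "en_XX"):
    """Load tokenizer and model from one repo and activate a language adapter.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoTokenizer, XmodModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = XmodModel.from_pretrained(model_id)
    # Select the adapter for the input language, as described in the README.
    model.set_default_language(language)
    return tokenizer, model
```

Calling `load_tokenizer_and_model()` downloads the files on first use, so it needs network access.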
 
tokenizer_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "tokenizer_class": "XLMRobertaTokenizer"
+}
+
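
To illustrate what the added config declares, the file content can be parsed with the standard library. This is only a sketch: the inline string mirrors the content shown in the diff (indentation assumed), and `declared_tokenizer_class` is a hypothetical helper, not something the commit or the library provides.

```python
import json

# Content of tokenizer_config.json as shown in the diff (indentation is assumed).
CONFIG_TEXT = '{\n  "tokenizer_class": "XLMRobertaTokenizer"\n}\n'


def declared_tokenizer_class(config_text: str) -> str:
    """Return the tokenizer class named in a tokenizer_config.json payload."""
    config = json.loads(config_text)
    return config["tokenizer_class"]


print(declared_tokenizer_class(CONFIG_TEXT))  # prints: XLMRobertaTokenizer
```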