Commit 8712fdb by ai-forever (parent: a31c394): Update README.md

README.md (changed):
# BERT large model multitask (cased) for Sentence Embeddings in Russian language.

The model is described [in this article](https://habr.com/ru/company/sberdevices/blog/560748/).
Russian SuperGLUE [metrics](https://russiansuperglue.com/login/submit_info/944).

For better quality, use mean token embeddings.

## Usage (HuggingFace Models Repository)

You can use the model directly from the model repository to compute sentence embeddings:

```python
from transformers import AutoTokenizer, AutoModel
import torch


# Mean-pool the token embeddings, using the attention mask to ignore padding
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # token embeddings from the last layer
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return sum_embeddings / sum_mask


sentences = ['Привет! Как твои дела?',
             'А правда, что 42 твое любимое число?']

# Load AutoModel from huggingface model repository
tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/sbert_large_mt_nlu_ru")
model = AutoModel.from_pretrained("sberbank-ai/sbert_large_mt_nlu_ru")

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=24, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling: in this case, mean pooling over the tokens
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
```