ai-forever committed
Commit 8712fdb (1 parent: a31c394)

Update README.md

Files changed (1): README.md (+5 −2)
```diff
@@ -1,4 +1,7 @@
 # BERT large model multitask (cased) for Sentence Embeddings in Russian language.
+The model is described [in this article](https://habr.com/ru/company/sberdevices/blog/560748/)
+Russian SuperGLUE [metrics](https://russiansuperglue.com/login/submit_info/944)
+
 For better quality, use mean token embeddings.
 ## Usage (HuggingFace Models Repository)
 You can use the model directly from the model repository to compute sentence embeddings:
@@ -16,8 +19,8 @@ def mean_pooling(model_output, attention_mask):
 sentences = ['Привет! Как твои дела?',
              'А правда, что 42 твое любимое число?']
 #Load AutoModel from huggingface model repository
-tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/sbert_large_nlu_ru")
-model = AutoModel.from_pretrained("sberbank-ai/sbert_large_nlu_ru")
+tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/sbert_large_mt_nlu_ru")
+model = AutoModel.from_pretrained("sberbank-ai/sbert_large_mt_nlu_ru")
 #Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=24, return_tensors='pt')
 #Compute token embeddings
```
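The README recommends mean token embeddings and its snippet relies on a `mean_pooling(model_output, attention_mask)` helper (visible in the second hunk's header). As a rough illustration of what that pooling step computes — a masked average over token embeddings so padding tokens are ignored — here is a minimal NumPy sketch. It is not the model card's PyTorch code; the toy tensors below are made up for demonstration.

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings over real (non-padding) tokens only.

    token_embeddings: (batch, seq_len, hidden)
    attention_mask:   (batch, seq_len) of 0/1 ints, 1 = real token
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # zero out padding, then sum
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

# Toy example: 1 sentence, 3 tokens (the last one is padding), hidden size 2.
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling(emb, mask))  # padding token excluded -> [[2. 3.]]
```

The same idea applied to the real model's output (`model_output[0]` as `token_embeddings`, with the tokenizer's `attention_mask`) yields the sentence embedding the README describes.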