sberbank-ai committed on
Commit 82bc2d2 • 1 Parent(s): 21b968e

Update README.md

  1. README.md +8 -7
README.md CHANGED
@@ -16,10 +16,11 @@ RUDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP
 
 
  Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
- * Task: `text2image generation`; `self reranking`; `text ranking`; `image ranking`; `image2text generation`; `zero-shot image classification`, `text2text generation`;
- * Language: `Russian`
- * Type: `encoder-decoder`
- * Num Parameters: `350M`
+
+ * Tasks: ` text2image generation, self reranking, text ranking, image ranking, image2text generation, zero-shot image classification, text2text generation`
+ * Language: ` Russian`
+ * Type: ` decoder`
+ * Num Parameters: ` 350M`
  * Training Data Volume: `156 million text-image pairs`
 
  # Model Description
@@ -32,11 +33,11 @@ Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices]
 
  ### Parameters
 
- <img src=https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/scheme-rudolph_27B.jpg height="20" border="2"/>
+ <img src=https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/attention_masks.png height="20" border="2"/>
 
- The maximum sequence length that this model may be used with depends on the modality and stands for 384 - 576 - 128 for the left text tokens, image tokens, and right text tokens, respectively.
+ The maximum sequence length that this model may be used with depends on the modality and stands for 64 - 256 - 64 for the left text tokens, image tokens, and right text tokens, respectively.
 
- RUDOLPH 2.7B is a Transformer-based decoder model with the following parameters:
+ RUDOLPH 350M is a Transformer-based decoder model with the following parameters:
 
  * num\_layers (24) — Number of hidden layers in the Transformer decoder.
  * hidden\_size (1024) — Dimensionality of the hidden layers.