sberbank-ai committed on
Commit 7a12a13
1 Parent(s): c668b32

Update README.md

Files changed (1)
  1. README.md +8 -10
README.md CHANGED
@@ -15,13 +15,7 @@ RUDOLPH: One Hyper-Tasking Transformer Сan be Сreative as DALL-E and GPT-3 and
 <img src="https://raw.githubusercontent.com/sberbank-ai/ru-dolph/master/pics/RUDOLPH.png" width=60% border="2"/>
 
 
-Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
-
-* Tasks: ` text2image generation, self reranking, text ranking, image ranking, image2text generation, zero-shot image classification, text2text generation`
-* Language: ` Russian`
-* Type: ` decoder`
-* Num Parameters: ` 350M`
-* Training Data Volume: `156 million text-image pairs`
+Model was trained by the [Sber AI](https://github.com/ai-forever) team.
 
 # Model Description
 
@@ -29,9 +23,13 @@ Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices]
 
 *Hyper-tasking means generalized multi-tasking, i.e., a model that can solve almost all tasks within the supported modalities (two modalities in the case of RUDOLPH: images and Russian texts).*
 
-# Details of architecture
-
-### Parameters
-
+* Tasks: ` text2image generation, self reranking, text ranking, image ranking, image2text generation, zero-shot image classification, text2text generation, and so on`
+* Language: ` Russian`
+* Type: ` decoder`
+* Num Parameters: ` 350M`
+* Training Data Volume: `156 million text-image pairs`
+
+# Details of architecture
+
 <img src=https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/scheme-rudolph_350m.jpg height="20" border="2"/>
 
@@ -43,7 +41,7 @@ RUDOLPH 350M is a Transformer-based decoder model with the following parameters:
 * hidden\_size (1024) — Dimensionality of the hidden layers.
 * num\_attention\_heads (16) — Number of attention heads for each attention layer.
 
-# Sparse Attention Mask
+# Sparse Attention Masks
 
 The primary proposed method is to modify the sparse transformer's attention mask to better control modalities. This allows modality transitions to be modeled in both directions, unlike the similar DALL-E Transformer, which used only one direction, "text to image". The proposed "image to right text" direction is achieved by extending the sparse attention mask to the right, enabling auto-regressive text generation conditioned on both the image and the left text.
 
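
The mask description above is easier to follow with a concrete picture of the token layout. Below is a minimal sketch, not the code from the ru-dolph repository: the function name, the block sizes, and the use of a plain dense causal mask in place of the real sparse row/column pattern inside the image block are all assumptions made for illustration.

```python
import torch

def rudolph_style_mask(left_text_len: int, image_len: int, right_text_len: int) -> torch.Tensor:
    """Causal mask over a [left text | image | right text] token sequence.

    A lower-triangular mask already captures both transitions described above:
    image tokens can attend to the left text ("text to image"), and right-text
    tokens can attend to both the image and the left text ("image to right text").
    RUDOLPH additionally sparsifies attention inside the image block, which this
    sketch omits for brevity.
    """
    total = left_text_len + image_len + right_text_len
    # True = the query position may attend to the key position.
    return torch.tril(torch.ones(total, total)).bool()

# Example: 64 left-text tokens, a 16x16 grid of image tokens, 64 right-text tokens.
mask = rudolph_style_mask(64, 256, 64)
print(mask.shape)  # torch.Size([384, 384])
```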