sberbank-ai commited on
Commit
21b968e
1 Parent(s): 568d1ea

Update README.md

Files changed (1): README.md (+29 -6)
README.md CHANGED
@@ -1,8 +1,18 @@
- # RuDOLPH-350M (Medium)

- RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

- <img src="https://raw.githubusercontent.com/sberbank-ai/ru-dolph/master/pics/rudolph-generated.png" height="60" border="2"/>

  Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
@@ -12,12 +22,25 @@ Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices]
* Num Parameters: `350M`
* Training Data Volume: `156 million text-image pairs`

-
# Model Description

- **Ru**ssian **D**iffusion **O**n **L**anguage **P**icture **H**yper-modality (RuDOLPH) 350M is a fast and light text-image-text transformer (350M GPT-3) designed for a quick and easy fine-tuning setup for the solution of various tasks: from generating images by text description and image classification to visual question answering and more. This model demonstrates the power of Hyper-modality Transformers.

- *(!!!) Hyper-modality means generalized multi-modal, e.g., model that consists of two multi-modal parts: text-2-image and image-2-text becomes text and image hyper-modality model*

# Sparse Attention Mask

+ ---
+ tags:
+ - RUDOLPH
+ - text-image
+ - image-text
+ - decoder
+ datasets:
+ - sberquad
+ ---

+ # RUDOLPH-350M (Medium)

+ RUDOLPH: One Hyper-Modal Transformer can be as creative as DALL-E and as smart as CLIP
+
+ <img src="https://raw.githubusercontent.com/sberbank-ai/ru-dolph/master/pics/RUDOLPH.png" height="60" border="2"/>

  Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
 
* Num Parameters: `350M`
* Training Data Volume: `156 million text-image pairs`

# Model Description

+ **RU**ssian **D**ecoder **O**n **L**anguage **P**icture **H**yper-tasking (**RUDOLPH**) **350M** is a fast and light text-image-text transformer (350M GPT-3) designed for quick and easy fine-tuning on a range of tasks: from generating images from text descriptions and image classification to visual question answering and more. This model demonstrates the power of Hyper-tasking Transformers.
+
+ *Hyper-tasking means generalized multi-tasking, e.g., a model that can solve almost all tasks within its supported modalities (two modalities in the case of RUDOLPH: images and Russian texts).*
+
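+ A minimal usage sketch, assuming the API published in the [ru-dolph](https://github.com/sberbank-ai/ru-dolph) repository (`get_rudolph_model`, `get_tokenizer`, `get_vae`, and the pipeline helpers are taken from its examples and may change between releases):
+
+ ```python
+ import torch
+ from rudalle import get_tokenizer, get_vae        # tokenizer and VAE shared with ruDALL-E
+ from rudolph.model import get_rudolph_model       # RUDOLPH checkpoint loader
+ from rudolph.pipelines import generate_codebooks, generate_captions, zs_clf  # task helpers
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ # Load the 350M checkpoint together with the text tokenizer and the image VQ-VAE.
+ model = get_rudolph_model('350M', fp16=(device == 'cuda'), device=device)
+ tokenizer = get_tokenizer()
+ vae = get_vae(dwt=False).to(device)
+
+ # The same triple (model, tokenizer, vae) is then passed to the pipeline helpers
+ # for text-to-image generation, captioning, zero-shot classification, etc.
+ ```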
+ # Details of Architecture
+
+ ### Parameters
+
+ <img src="https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/scheme-rudolph_27B.jpg" height="20" border="2"/>
+
+ The maximum sequence length depends on the modality: 384 tokens for the left text, 576 for the image, and 128 for the right text.
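+
+ For intuition, a sketch of how one packed sequence is laid out (the token budgets are from this card; the 24 x 24 image-token grid is an assumption):
+
+ ```python
+ # Per-modality token budgets from the model card; the context is their concatenation.
+ LEFT_TEXT, IMAGE, RIGHT_TEXT = 384, 576, 128   # 576 = 24 * 24 VQ image tokens (assumed grid)
+ TOTAL = LEFT_TEXT + IMAGE + RIGHT_TEXT         # 1088 positions overall
+
+ # Segment boundaries inside a packed sequence: [left text | image | right text].
+ left  = slice(0, LEFT_TEXT)                    # positions   0..383
+ image = slice(LEFT_TEXT, LEFT_TEXT + IMAGE)    # positions 384..959
+ right = slice(LEFT_TEXT + IMAGE, TOTAL)        # positions 960..1087
+ ```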
+
+ RUDOLPH 350M is a Transformer-based decoder model with the following parameters:
+
+ * num\_layers (24) — Number of hidden layers in the Transformer decoder.
+ * hidden\_size (1024) — Dimensionality of the hidden layers.
+ * num\_attention\_heads (16) — Number of attention heads for each attention layer.
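+ As a rough sanity check, these hyperparameters are consistent with the stated ~350M size (a back-of-the-envelope estimate; the vocabulary sizes below are assumptions, not from this card):
+
+ ```python
+ # Each decoder layer holds ~4*H^2 attention weights plus ~8*H^2 MLP weights.
+ L, H = 24, 1024
+ blocks = 12 * L * H ** 2                       # ~302M transformer block parameters
+
+ # Text + image token embeddings (vocabulary sizes assumed for illustration).
+ text_vocab, image_vocab = 16_384, 8_192
+ embeddings = (text_vocab + image_vocab) * H    # ~25M
+
+ print(f"~{(blocks + embeddings) / 1e6:.0f}M")  # ~327M, in line with the stated 350M
+ ```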
 
# Sparse Attention Mask