sberbank-ai committed 21b968e (parent: 568d1ea)

Update README.md

README.md CHANGED
---
tags:
- RUDOLPH
- text-image
- image-text
- decoder
datasets:
- sberquad
---

# RUDOLPH-350M (Medium)

RUDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

<img src="https://raw.githubusercontent.com/sberbank-ai/ru-dolph/master/pics/RUDOLPH.png" height="60" border="2"/>

Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.

* Num Parameters: `350M`
* Training Data Volume: `156 million text-image pairs`

# Model Description

**RU**ssian **D**ecoder **O**n **L**anguage **P**icture **H**yper-tasking (**RUDOLPH**) **350M** is a fast and light text-image-text transformer (350M GPT-3) designed for quick and easy fine-tuning on a range of tasks: from generating images from text descriptions and image classification to visual question answering and more. This model demonstrates the power of Hyper-tasking Transformers.

*Hyper-tasking means generalized multi-tasking, i.e., a model that can solve almost all tasks within its supported modalities (in the case of RUDOLPH, two: images and Russian texts).*

# Details of architecture

### Parameters

<img src="https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/scheme-rudolph_27B.jpg" height="20" border="2"/>

The maximum sequence length depends on the modality: 384 tokens for the left text, 576 for the image, and 128 for the right text.
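
These three spans are concatenated into a single decoder context. Below is a minimal layout sketch using the lengths stated above; the token ids are placeholders, since the actual text tokenizer and image codebook are not specified on this card:

```python
import torch

# Modality spans stated on this card: left text, image, right text.
L_LEFT, L_IMAGE, L_RIGHT = 384, 576, 128

# Placeholder ids only; real inputs come from a text tokenizer and a VQ image codebook.
left_text  = torch.zeros(L_LEFT,  dtype=torch.long)
image      = torch.zeros(L_IMAGE, dtype=torch.long)
right_text = torch.zeros(L_RIGHT, dtype=torch.long)

# The full hyper-modal context the decoder attends over.
sequence = torch.cat([left_text, image, right_text])
assert sequence.numel() == 1088  # 384 + 576 + 128
```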

RUDOLPH 350M is a Transformer-based decoder model with the following parameters:

* num\_layers (24) — Number of hidden layers in the Transformer decoder.
* hidden\_size (1024) — Dimensionality of the hidden layers.
* num\_attention\_heads (16) — Number of attention heads for each attention layer.

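As a rough sanity check on the `350M` figure, each decoder block with hidden size h contributes about 12·h² weights (4·h² across the attention projections plus 8·h² in the feed-forward sublayer). The vocabulary sizes in the sketch below are illustrative assumptions, not values from this card:

```python
# Back-of-the-envelope parameter count for the decoder described above.
num_layers, hidden_size = 24, 1024

# ~12 * h^2 weights per block: Q/K/V/out projections (4h^2) + 4h-wide MLP (8h^2).
block_params = num_layers * 12 * hidden_size ** 2            # ~302M

# Assumed text vocab + image codebook sizes, for illustration only.
text_vocab, image_vocab = 16_384, 8_192
embedding_params = (text_vocab + image_vocab) * hidden_size  # ~25M

print(f"~{(block_params + embedding_params) / 1e6:.0f}M")   # ~327M, near 350M
```

Layer norms, biases, and positional embeddings account for most of the remaining gap to the quoted 350M.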

# Sparse Attention Mask