sberbank-ai committed
Commit • 4d53b21
Parent(s): 6b9d45c
Update README.md

README.md CHANGED
@@ -41,14 +41,14 @@ The model was prepared as a baseline for FusionBrain Challenge 2.0 (as a part of
 
 ### Parameters
 
-<img src=https://github.com/ai-forever/ru-dolph/blob/master/pics/scheme-rudolph_27B.jpg
+<img src=https://github.com/ai-forever/ru-dolph/blob/master/pics/scheme-rudolph_27B.jpg height="60" border="2"/>
 
 The maximum sequence length that this model may be used with depends on the modality and stands for 384 - 576 - 128 for the left text tokens, image tokens, and right text tokens, respectively.
 
 RUDOLPH 2.7B is a Transformer-based decoder model with the following parameters:
 
-* num
-* hidden
+* num\_layers (32) – Number of hidden layers in the Transformer decoder.
+* hidden\_size (2560) – Dimensionality of the hidden layers.
 * num\_attention\_heads (32) – Number of attention heads for each attention layer.
 
 ### Sparse Attention Mask
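For reference, the numbers introduced in this hunk fit together as a single decoder configuration. The sketch below is a minimal illustration of how they relate; `RudolphConfig` and its field names are assumptions made for this example, not the actual API of the ru-dolph package.

```python
from dataclasses import dataclass

@dataclass
class RudolphConfig:
    """Hypothetical config mirroring the README's numbers (not the real ru-dolph API)."""
    num_layers: int = 32           # hidden layers in the Transformer decoder
    hidden_size: int = 2560        # dimensionality of the hidden layers
    num_attention_heads: int = 32  # attention heads per attention layer
    l_text_seq_length: int = 384   # left text tokens
    image_seq_length: int = 576    # image tokens
    r_text_seq_length: int = 128   # right text tokens

    @property
    def head_dim(self) -> int:
        # 2560 / 32 = 80 dimensions per attention head
        return self.hidden_size // self.num_attention_heads

    @property
    def total_seq_length(self) -> int:
        # 384 + 576 + 128 = 1088 tokens in a full left-text/image/right-text sequence
        return self.l_text_seq_length + self.image_seq_length + self.r_text_seq_length
```

With these values, each attention head operates on an 80-dimensional slice of the hidden state, and a complete multimodal sequence spans at most 1088 tokens.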