ai-forever/RUDOLPH-350M

sberbank-ai committed on Oct 5, 2022

Commit 82bc2d2 (1 parent: 21b968e)

Update README.md

Files changed (1)

README.md +8 -7

README.md CHANGED

@@ -16,10 +16,11 @@ RUDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP
 Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
-* Task: `text2image generation`; `self reranking`; `text ranking`; `image ranking`; `image2text generation`; `zero-shot image classification`, `text2text generation`;
-* Language: `Russian`
-* Type: `encoder-decoder`
-* Num Parameters: `350M`
 * Training Data Volume: `156 million text-image pairs`
 # Model Description
@@ -32,11 +33,11 @@ Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices]
 ### Parameters
-<img src=https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/scheme-rudolph_27B.jpg height="20" border="2"/>
-The maximum sequence length that this model may be used with depends on the modality and stands for 384 - 576 - 128 for the left text tokens, image tokens, and right text tokens, respectively.
-RUDOLPH 2.7B is a Transformer-based decoder model with the following parameters:
 * num\_layers (24) — Number of hidden layers in the Transformer decoder.
 * hidden\_size (1024) — Dimensionality of the hidden layers.

 Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
+* Tasks: ` text2image generation, self reranking, text ranking, image ranking, image2text generation, zero-shot image classification, text2text generation`
+* Language: ` Russian`
+* Type: ` decoder`
+* Num Parameters: ` 350M`
 * Training Data Volume: `156 million text-image pairs`
 # Model Description
 ### Parameters
+<img src=https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/attention_masks.png height="20" border="2"/>
+The maximum sequence length that this model may be used with depends on the modality and stands for 64 - 256 - 64 for the left text tokens, image tokens, and right text tokens, respectively.
+RUDOLPH 350M is a Transformer-based decoder model with the following parameters:
 * num\_layers (24) — Number of hidden layers in the Transformer decoder.
 * hidden\_size (1024) — Dimensionality of the hidden layers.
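
The per-modality sequence limits changed in this commit (64 - 256 - 64 for the left text, image, and right text tokens) can be sanity-checked with a small sketch. The `SequenceLayout` class and its field names below are hypothetical and are not taken from the RUDOLPH code base; the sketch only assumes that the three segments are concatenated in the order given in the README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceLayout:
    """Hypothetical helper: per-modality token budgets as listed in the README."""

    left_text_tokens: int
    image_tokens: int
    right_text_tokens: int

    @property
    def total_tokens(self) -> int:
        # Total length if the three segments are simply concatenated (assumption).
        return self.left_text_tokens + self.image_tokens + self.right_text_tokens

    def segment_slices(self) -> dict:
        # Index ranges of each segment, assuming left text, image, right text order.
        image_start = self.left_text_tokens
        right_start = image_start + self.image_tokens
        return {
            "left_text": slice(0, image_start),
            "image": slice(image_start, right_start),
            "right_text": slice(right_start, self.total_tokens),
        }


# Values from the updated README (350M) and from the lines this commit removed.
rudolph_350m = SequenceLayout(left_text_tokens=64, image_tokens=256, right_text_tokens=64)
previous_readme = SequenceLayout(left_text_tokens=384, image_tokens=576, right_text_tokens=128)

print(rudolph_350m.total_tokens)
print(previous_readme.total_tokens)
print(rudolph_350m.segment_slices())
```

Under that concatenation assumption, the corrected README describes a combined budget of 384 tokens, compared with 1088 for the values the commit removed.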