sberbank-ai commited on
Commit
21b968e
1 Parent(s): 568d1ea

Update README.md

Files changed (1): README.md (+29 -6)
README.md CHANGED
@@ -1,8 +1,18 @@
- # RuDOLPH-350M (Medium)

- RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

- <img src="https://raw.githubusercontent.com/sberbank-ai/ru-dolph/master/pics/rudolph-generated.png" height="60" border="2"/>

  Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
@@ -12,12 +22,25 @@ Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices]
* Num Parameters: `350M`
* Training Data Volume: `156 million text-image pairs`

-
# Model Description

- **Ru**ssian **D**iffusion **O**n **L**anguage **P**icture **H**yper-modality (RuDOLPH) 350M is a fast and light text-image-text transformer (350M GPT-3) designed for a quick and easy fine-tuning setup for the solution of various tasks: from generating images by text description and image classification to visual question answering and more. This model demonstrates the power of Hyper-modality Transformers.

- *(!!!) Hyper-modality means generalized multi-modal, e.g., model that consists of two multi-modal parts: text-2-image and image-2-text becomes text and image hyper-modality model*

# Sparse Attention Mask

+ ---
+ tags:
+ - RUDOLPH
+ - text-image
+ - image-text
+ - decoder
+ datasets:
+ - sberquad
+ ---

+ # RUDOLPH-350M (Medium)

+ RUDOLPH: One Hyper-Modal Transformer can be as creative as DALL-E and as smart as CLIP
+
+ <img src="https://raw.githubusercontent.com/sberbank-ai/ru-dolph/master/pics/RUDOLPH.png" height="60" border="2"/>

  Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
 
* Num Parameters: `350M`
* Training Data Volume: `156 million text-image pairs`

# Model Description

+ **RU**ssian **D**ecoder **O**n **L**anguage **P**icture **H**yper-tasking (**RUDOLPH**) **350M** is a fast and light text-image-text transformer (350M GPT-3) designed for quick and easy fine-tuning on a range of tasks: from generating images from text descriptions and image classification to visual question answering and more. This model demonstrates the power of Hyper-tasking Transformers.
+
+ *Hyper-tasking means generalized multi-tasking, e.g., a model that can solve almost all tasks within its supported modalities (two modalities in the case of RUDOLPH: images and Russian texts).*
+
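+ A minimal usage sketch, assuming the API published in the [ru-dolph](https://github.com/sberbank-ai/ru-dolph) repository (`get_rudolph_model`, `get_tokenizer`, `get_vae`, and the pipeline helpers are taken from its examples and may change between releases):
+
+ ```python
+ import torch
+ from rudalle import get_tokenizer, get_vae        # tokenizer and VAE shared with ruDALL-E
+ from rudolph.model import get_rudolph_model       # RUDOLPH checkpoint loader
+ from rudolph.pipelines import generate_codebooks, generate_captions, zs_clf  # task helpers
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ # Load the 350M checkpoint together with the text tokenizer and the image VQ-VAE.
+ model = get_rudolph_model('350M', fp16=(device == 'cuda'), device=device)
+ tokenizer = get_tokenizer()
+ vae = get_vae(dwt=False).to(device)
+
+ # The same triple (model, tokenizer, vae) is then passed to the pipeline helpers
+ # for text-to-image generation, captioning, zero-shot classification, etc.
+ ```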
+ # Details of Architecture
+
+ ### Parameters
+
+ <img src="https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/scheme-rudolph_27B.jpg" height="20" border="2"/>
+
+ The maximum sequence length depends on the modality: 384 tokens for the left text, 576 for the image, and 128 for the right text.
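+
+ For intuition, a sketch of how one packed sequence is laid out (the token budgets are from this card; the 24 x 24 image-token grid is an assumption):
+
+ ```python
+ # Per-modality token budgets from the model card; the context is their concatenation.
+ LEFT_TEXT, IMAGE, RIGHT_TEXT = 384, 576, 128   # 576 = 24 * 24 VQ image tokens (assumed grid)
+ TOTAL = LEFT_TEXT + IMAGE + RIGHT_TEXT         # 1088 positions overall
+
+ # Segment boundaries inside a packed sequence: [left text | image | right text].
+ left  = slice(0, LEFT_TEXT)                    # positions   0..383
+ image = slice(LEFT_TEXT, LEFT_TEXT + IMAGE)    # positions 384..959
+ right = slice(LEFT_TEXT + IMAGE, TOTAL)        # positions 960..1087
+ ```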
+
+ RUDOLPH 350M is a Transformer-based decoder model with the following parameters:
+
+ * num\_layers (24) — Number of hidden layers in the Transformer decoder.
+ * hidden\_size (1024) — Dimensionality of the hidden layers.
+ * num\_attention\_heads (16) — Number of attention heads for each attention layer.
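+ As a rough sanity check, these hyperparameters are consistent with the stated ~350M size (a back-of-the-envelope estimate; the vocabulary sizes below are assumptions, not from this card):
+
+ ```python
+ # Each decoder layer holds ~4*H^2 attention weights plus ~8*H^2 MLP weights.
+ L, H = 24, 1024
+ blocks = 12 * L * H ** 2                       # ~302M transformer block parameters
+
+ # Text + image token embeddings (vocabulary sizes assumed for illustration).
+ text_vocab, image_vocab = 16_384, 8_192
+ embeddings = (text_vocab + image_vocab) * H    # ~25M
+
+ print(f"~{(blocks + embeddings) / 1e6:.0f}M")  # ~327M, in line with the stated 350M
+ ```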
 
# Sparse Attention Mask