VictorSanh committed
Commit: cb68b97
Parent(s): 1eaac25
oops
README.md CHANGED
@@ -116,7 +116,7 @@ The model is trained on the following data mixture of openly accessible English
 
 For multimodal web documents, we feed the model sequences corresponding to the succession of text paragraphs and images. For image-text pairs, we form the training sequences by packing images with their captions. The images are encoded with the vision encoder and vision hidden states are pooled with Transformer Perceiver blocks and then fused into the text sequence through the cross-attention blocks.
 
-Following
+Following [Dehghani et al., 2023](https://huggingface.co/papers/2302.05442), we apply a layer normalization on the projected queries and keys of both the Perceiver and cross-attention blocks, which improved training stability in our early experiments. We use the [RMSNorm](https://huggingface.co/papers/1910.07467) implementation for trainable Layer Norms.
 
 The training objective is the standard next token prediction.
 
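The two changed-context paragraphs describe a Perceiver-style pooling of vision hidden states, cross-attention fusion into the text stream, and RMSNorm applied to the projected queries and keys of both blocks. The following is a minimal PyTorch sketch of that idea, not the repository's actual code: the class and argument names (`PerceiverPooler`, `CrossAttentionFusion`, `num_latents`, single-head attention) are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square norm with a trainable scale (no bias, no mean centering)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


class QKNormCrossAttention(nn.Module):
    """Single-head cross-attention with RMSNorm on the projected queries and keys
    (the QK layer normalization of Dehghani et al., 2023)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.o_proj = nn.Linear(dim, dim, bias=False)
        # Trainable layer norms applied after the query/key projections.
        self.q_norm = RMSNorm(dim)
        self.k_norm = RMSNorm(dim)
        self.scale = dim**-0.5

    def forward(self, queries: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(queries))
        k = self.k_norm(self.k_proj(context))
        v = self.v_proj(context)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.o_proj(attn @ v)


class PerceiverPooler(nn.Module):
    """Pools a variable number of vision hidden states into a fixed set of latents."""

    def __init__(self, dim: int, num_latents: int = 64):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.attn = QKNormCrossAttention(dim)

    def forward(self, vision_states: torch.Tensor) -> torch.Tensor:
        # vision_states: (batch, num_image_tokens, dim) -> (batch, num_latents, dim)
        latents = self.latents.expand(vision_states.shape[0], -1, -1)
        return self.attn(latents, vision_states)


class CrossAttentionFusion(nn.Module):
    """Text hidden states attend to the pooled vision latents and are updated residually."""

    def __init__(self, dim: int):
        super().__init__()
        self.attn = QKNormCrossAttention(dim)

    def forward(self, text_states: torch.Tensor, pooled_vision: torch.Tensor) -> torch.Tensor:
        return text_states + self.attn(text_states, pooled_vision)
```

In this sketch, an arbitrary number of image tokens is compressed into `num_latents` vectors before being fused into the text sequence, and the RMS-normalized queries and keys keep the attention logits at a controlled scale, which is the training-stability effect the added README sentence refers to.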