HuggingFaceM4/idefics-80b

stas committed on Aug 22, 2023

Commit eddaa07 (1 parent: 2ccbe9d)

Update README.md (#17)

Files changed (1): README.md

@@ -305,11 +305,15 @@ Similarly to the base IDEFICS models, we performed checkpoint selection to stop
 ## Hardware
-The IDEFICS models were trained on an AWS SageMaker cluster using at the maximum 64 nodes of 8x80GB A100 GPUs (512 GPUs total). The cluster uses the current EFA network. IDEFICS-80b was trained for approximately 672 node hours. IDEFICS-80b-instruct was trained for approximately 3 days on 48 nodes.
+The IDEFICS models were trained on an AWS SageMaker cluster with 8x80GB A100 GPUs nodes and EFA network.
+- IDEFICS-80B took ~28 days of training on 64 nodes (512 GPUs).
+- IDEFICS-80b-instruct finetuned the base model for ~3 days on 48 nodes (384 GPUs).
 ## Software
-The training software is built on top of HuggingFace Transformers + Accelerate, and DeepSpeed ZeRO-3 for training, and [WebDataset](https://github.com/webdataset/webdataset) for data loading.
+The training software is built on top of HuggingFace Transformers + Accelerate, and [DeepSpeed ZeRO-3](https://github.com/microsoft/DeepSpeed) for training, and [WebDataset](https://github.com/webdataset/webdataset) for data loading.
 # Bias, Risks, and Limitations
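
The training durations in the updated Hardware section can be converted into node hours and GPU hours with simple arithmetic. The sketch below is illustrative only: the function name `compute_budget` and the constants are assumptions made for this example, and because the README gives approximate durations ("~28 days", "~3 days"), the results are rough estimates. Note that 28 days is 672 wall-clock hours, which suggests the removed "672 node hours" figure may have referred to elapsed hours rather than node hours.

```python
HOURS_PER_DAY = 24
GPUS_PER_NODE = 8  # each node has 8x80GB A100 GPUs, per the README


def compute_budget(days, nodes, gpus_per_node=GPUS_PER_NODE):
    """Estimate wall-clock, node, and GPU hours for a training run.

    Hypothetical helper for illustration; `days` is approximate in the README.
    """
    wall_clock_hours = days * HOURS_PER_DAY
    node_hours = wall_clock_hours * nodes
    gpu_hours = node_hours * gpus_per_node
    return {
        "gpus": nodes * gpus_per_node,
        "wall_clock_hours": wall_clock_hours,
        "node_hours": node_hours,
        "gpu_hours": gpu_hours,
    }


# IDEFICS-80B: ~28 days on 64 nodes
base = compute_budget(days=28, nodes=64)
# IDEFICS-80b-instruct: ~3 days on 48 nodes
instruct = compute_budget(days=3, nodes=48)

for name, budget in [("idefics-80b", base), ("idefics-80b-instruct", instruct)]:
    print(name, budget)
```

Under these assumptions, the base model works out to roughly 43,008 node hours (344,064 GPU hours) and the instruct finetune to roughly 3,456 node hours (27,648 GPU hours).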