wip
README.md CHANGED
@@ -15,7 +15,6 @@ Some cool model...
 
 - [Model Card for m4-80b](#model-card-for--model_id-)
 - [Table of Contents](#table-of-contents)
-- [Table of Contents](#table-of-contents-1)
 - [Model Details](#model-details)
 - [Model Description](#model-description)
 - [Uses](#uses)
@@ -57,15 +56,14 @@ Some cool model...
 <!-- Provide a longer summary of what this model is/does. -->
 Some cool model...
 
-- **Developed by:**
-- **
-- **Model type:** Language model
+- **Developed by:** HuggingFace
+- **Model type:** Multi-modal model (text+image)
 - **Language(s) (NLP):** en
 - **License:** apache-2.0
-- **Parent Model:**
+- **Parent Model:** [laion/CLIP-ViT-H-14-laion2B-s32B-b79K](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) and [huggingface/llama-65b](https://huggingface.co/huggingface/llama-65b)
 - **Resources for more information:** More information needed
 - [GitHub Repo](https://github.com/huggingface/m4/)
--
+- Associated Paper: [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198)
 
 # Uses
 
@@ -172,10 +170,9 @@ More information needed
 
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-- **Hardware Type:**
-- **Hours used:**
-- **Cloud Provider:**
-- **Compute Region:** More information needed
+- **Hardware Type:** 64 nodes of 8x 80GB A100 gpus, EFA network
+- **Hours used:** ~672 node hours
+- **Cloud Provider:** AWS Sagemaker
 - **Carbon Emitted:** unknown
 
 # Technical Specifications [optional]
@@ -190,11 +187,15 @@ More information needed
 
 ### Hardware
 
-
+The training was performed on AWS SageMaker cluster with 64 nodes of 8x80GB A100 GPUs (512 GPUs total). The cluster uses the current EFA network which provides about 340GBps throughput.
+
+As the network is quite slow for the needs of DeepSpeed ZeRO-3 we were only able to clock ~90 TFLOPs.
+
 
 ### Software
 
-
+The training software is built on top of HuggingFace Transformers + Accelerate, and DeepSpeed ZeRO-3. Plus [WebDataset](https://github.com/webdataset/webdataset) for data loading.
+
 
 # Citation
 
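For context, the hardware figures added in this commit (64 nodes, 8 GPUs per node, ~672 node hours) imply the following totals. This is a quick sanity-check sketch; only the node count, GPUs per node, and node-hour figure come from the card, and the derived numbers are plain arithmetic:

```python
# Training-scale arithmetic from the figures in the updated model card.
nodes = 64            # "64 nodes" (card)
gpus_per_node = 8     # "8x 80GB A100 gpus" per node (card)
node_hours = 672      # "~672 node hours" (card)

total_gpus = nodes * gpus_per_node       # matches the "512 GPUs total" in the card
gpu_hours = node_hours * gpus_per_node   # total accelerator-hours for carbon estimation
wall_clock_hours = node_hours / nodes    # rough wall-clock time if all nodes ran concurrently
```

The `gpu_hours` figure is the quantity the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) takes as input, alongside the hardware type and cloud region.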
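The software section mentions DeepSpeed ZeRO-3 behind Transformers + Accelerate. A minimal ZeRO stage-3 configuration looks roughly like the sketch below. This is illustrative only, not the project's actual config; the field names follow the public DeepSpeed config schema, and every value here is an assumption:

```python
# Minimal DeepSpeed ZeRO stage-3 config, written as a Python dict for clarity
# (DeepSpeed normally reads this from a JSON file such as ds_config.json).
# NOT the config used to train m4-80b -- an illustrative sketch only.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "bf16": {"enabled": True},          # assumed precision; A100s support bf16
    "zero_optimization": {
        "stage": 3,                     # partition params, gradients, and optimizer state
        "overlap_comm": True,           # overlap all-gather/reduce-scatter with compute
        "contiguous_gradients": True,   # reduce memory fragmentation
    },
}
```

Stage 3 shards model parameters across all ranks, which is what makes its all-gather traffic so sensitive to interconnect bandwidth; that is the constraint the Hardware section's ~90 TFLOPs remark is describing.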