jonabur committed
Commit 4118a0f • 1 Parent(s): 564a58e
update note about GAS
README.md CHANGED
@@ -59,7 +59,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
 
 ## Training
 
-Poro was trained on the LUMI supercomputer, using 512 AMD MI250X GPUs. Each MI250X GPU has two Graphics Complex Dies (GCDs) for a world size of 1024 during training, using activation checkpointing, a micro batch size of 1, and a 3D parallelism strategy of TP=2, PP=4, DP=128.
+Poro was trained on the LUMI supercomputer, using 512 AMD MI250X GPUs. Each MI250X GPU has two Graphics Complex Dies (GCDs) for a world size of 1024 during training, using activation checkpointing, a micro batch size of 1, gradient accumulation of 16, and a 3D parallelism strategy of TP=2, PP=4, DP=128.
 
 Training began in September 2023 using a [custom fork](https://github.com/TurkuNLP/Megatron-DeepSpeed) of the Megatron-Deepspeed framework.
 
@@ -117,4 +117,4 @@ Poro is an advanced language model, primarily optimized for English, Finnish and
 
 ## License
 
-Poro is released under the Apache 2.0 license.
+Poro is released under the Apache 2.0 license.