jonabur committed
Commit 4118a0f • 1 Parent(s): 564a58e
update note about GAS
README.md CHANGED
@@ -59,7 +59,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
 
 ## Training
 
-Poro was trained on the LUMI supercomputer, using 512 AMD MI250X GPUs. Each MI250X GPU has two Graphics Complex Dies (GCDs) for a world size of 1024 during training, using activation checkpointing, a micro batch size of 1, and a 3D parallelism strategy of TP=2, PP=4, DP=128.
+Poro was trained on the LUMI supercomputer, using 512 AMD MI250X GPUs. Each MI250X GPU has two Graphics Complex Dies (GCDs) for a world size of 1024 during training, using activation checkpointing, a micro batch size of 1, gradient accumulation of 16, and a 3D parallelism strategy of TP=2, PP=4, DP=128.
 
 Training began in September 2023 using a [custom fork](https://github.com/TurkuNLP/Megatron-DeepSpeed) of the Megatron-Deepspeed framework.
 
@@ -117,4 +117,4 @@ Poro is an advanced language model, primarily optimized for English, Finnish and
 
 ## License
 
-Poro is released under the Apache 2.0 license.
+Poro is released under the Apache 2.0 license.