apple/DCLM-7B-8k

vaishaal committed on Jul 16

Commit 0dfac65 • 1 Parent(s): 1d8bc64

Update README.md

Files changed (1)

README.md

@@ -45,7 +45,7 @@ The model was trained using the following setup:
 - **Weight Decay:** 0.05
 - **Batch Size:** 2048 sequences
 - **Sequence Length:** 2048 tokens
-- **Total Training Tokens:** 2.5T
+- **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs
 For more detailed training information, please refer to Section 3.4 and Appendix F of the DCLM paper.