Update README.md
README.md
@@ -45,7 +45,7 @@ The model was trained using the following setup:
 - **Weight Decay:** 0.05
 - **Batch Size:** 2048 sequences
 - **Sequence Length:** 2048 tokens
-- **Total Training Tokens:** 2.
+- **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs
 
 For more detailed training information, please refer to Section 3.4 and Appendix F of the DCLM paper.
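With the token count now filled in, the listed values can be cross-checked against each other. The following is a back-of-the-envelope sketch, not taken from the README or the DCLM paper. It assumes that every optimizer step consumes a full batch of 2048 packed sequences of 2048 tokens (no padding, no batch-size schedule) and treats the rounded "2.6T" as exactly 2.6 × 10^12 tokens. The helper names `tokens_per_step` and `approx_steps` are hypothetical.

```python
# Back-of-the-envelope check of the training setup listed in the README.
# Assumptions (NOT stated in the README): every step uses a full batch,
# sequences are fully packed (no padding), and "2.6T" means exactly 2.6e12 tokens.

BATCH_SIZE_SEQUENCES = 2048   # "Batch Size: 2048 sequences"
SEQUENCE_LENGTH = 2048        # "Sequence Length: 2048 tokens"
TOTAL_TOKENS = 2.6e12         # "Total Training Tokens: 2.6T" (rounded figure)


def tokens_per_step(batch_size: int, seq_len: int) -> int:
    """Tokens consumed by one optimizer step (hypothetical helper)."""
    return batch_size * seq_len


def approx_steps(total_tokens: float, batch_size: int, seq_len: int) -> int:
    """Approximate optimizer steps implied by the token budget (hypothetical helper)."""
    return round(total_tokens / tokens_per_step(batch_size, seq_len))


if __name__ == "__main__":
    per_step = tokens_per_step(BATCH_SIZE_SEQUENCES, SEQUENCE_LENGTH)
    steps = approx_steps(TOTAL_TOKENS, BATCH_SIZE_SEQUENCES, SEQUENCE_LENGTH)
    print(f"tokens per step: {per_step:,}")
    print(f"approx. steps:   {steps:,}")
```

Under these assumptions each step processes 4,194,304 tokens (about 4.2M), so a 2.6T-token budget corresponds to roughly 620k steps. Because "2.6T" is rounded, the step count is only an order-of-magnitude check, not the actual schedule.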