Commit b514b5e by vaishaal
1 Parent(s): 5967e4e

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -15,7 +15,7 @@ DCLM-Baseline-7B is a 7 billion parameter language model trained on the DCLM-Bas
 
 | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
 |------|-----------------|--------|-------------|-----------------|----------------|
-| 7B | 2.6T | 32 | 4096 | 32 | 2048 |
+| 7B | 2.6T | 32 | 4096 | 32 | 8192 |
 
 
 ### Model Description
@@ -44,7 +44,7 @@ The model was trained using the following setup:
 - **Learning Rate:** 2e-3 (peak)
 - **Weight Decay:** 0.05
 - **Batch Size:** 2048 sequences
-- **Sequence Length:** 2048 tokens
+- **Sequence Length:** 8192 tokens
 - **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs
 
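
Reviewer note: this change quadruples the documented tokens per batch, since the batch size stays at 2048 sequences while the sequence length goes from 2048 to 8192. The sketch below is only arithmetic on the figures in this diff. The function names are hypothetical, and it assumes that the listed batch size and sequence length describe the same training phase and that the whole 2.6T token budget is consumed at that batch shape, neither of which the README states.

```python
# Figures copied from the README diff above. Treating them as one consistent
# training configuration is an assumption of this sketch, not a documented fact.
BATCH_SIZE_SEQUENCES = 2048
OLD_SEQUENCE_LENGTH = 2048   # value removed by this commit
NEW_SEQUENCE_LENGTH = 8192   # value added by this commit
TOTAL_TRAINING_TOKENS = 2.6e12


def tokens_per_batch(batch_size: int, sequence_length: int) -> int:
    """Tokens processed in one batch of fixed-length sequences."""
    return batch_size * sequence_length


def approx_steps(total_tokens: float, batch_tokens: int) -> int:
    """Rough step count if every step consumed one full batch."""
    return round(total_tokens / batch_tokens)


old_batch_tokens = tokens_per_batch(BATCH_SIZE_SEQUENCES, OLD_SEQUENCE_LENGTH)
new_batch_tokens = tokens_per_batch(BATCH_SIZE_SEQUENCES, NEW_SEQUENCE_LENGTH)
steps_at_new_length = approx_steps(TOTAL_TRAINING_TOKENS, new_batch_tokens)

print(f"old tokens per batch: {old_batch_tokens:,}")
print(f"new tokens per batch: {new_batch_tokens:,}")
print(f"ratio new/old:        {new_batch_tokens // old_batch_tokens}x")
print(f"approx. steps at new length: {steps_at_new_length:,}")
```

Under those assumptions the documented batch is about 16.8M tokens, which puts the 2.6T budget at roughly 155k steps. If training actually used more than one sequence length, a single "Sequence Length" bullet cannot capture that and the README may need a per-phase breakdown.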