nikitastheo committed
Commit aff0e4d
Parent(s): aa68d5f
Update README.md
README.md
CHANGED
@@ -9,6 +9,6 @@ This model uses the LTG-BERT architecture.
 The model was trained on a combination of the BabyLM Dataset, the TinyStories Dataset, and generated data,
 in accordance with the rules of the Strict track, and the 100M word budget.
 
-The
+The model was trained with 128 token sequence length
 
 Hyperparameters used and evaluation scores will follow in a subsequent update.