sagorsarker
committed on
Commit
•
44d2a6f
1
Parent(s):
e83a1c6
Update README.md
README.md
CHANGED
@@ -25,6 +25,9 @@ Notable training configs:
 ## Datasets
+Datasets comprise Bangla, English, and code data. We mixed the Bangla data with English RedPajama data (C4, GitHub, StackExchange, Book, arXiv, Wikipedia).
+
+A token-wise distribution will be added below soon.