Commit c28c1ea by sagorsarker
1 Parent(s): 44d2a6f
Update README.md
README.md CHANGED
@@ -29,7 +29,16 @@ Datasets comprise Bangla, English, and Codes data. We mixed Bangla data with Eng

 Token-wise distribution will be added soon below.

-
+| Data chunk | Language | Token count |
+|----------------|----------|-------------|
+| Redpajama Arxiv | English | 00 |
+| Redpajama Book | English | 00 |
+| Redpajama Wikipedia | English | 00 |
+| Redpajama Github Code | English | 00 |
+| Redpajama StackExchange | English | 00 |
+| Redpajama Common crawl | English | 00 |
+| Redpajama C4 | English | 00 |
+| Bangla (culturax, books, news, Wikipedia, Banglapedia) | Bangla | 00 |

 ## How to Use
 The basic use cases to generate text using this model are simple. Follow the below code to generate text using this model.