sagorsarker committed on
Commit
c28c1ea
1 Parent(s): 44d2a6f

Update README.md

  1. README.md +10 -1
README.md CHANGED
@@ -29,7 +29,16 @@ Datasets comprise Bangla, English, and Codes data. We mixed Bangla data with Eng
 
 Token-wise distribution will be added soon below.
 
-
+| Data chunk | Language | Token count |
+|----------------|----------|-------------|
+| Redpajama Arxiv | English | 00 |
+| Redpajama Book | English | 00 |
+| Redpajama Wikipedia | English | 00 |
+| Redpajama Github Code | English | 00 |
+| Redpajama StackExchange | English | 00 |
+| Redpajama Common crawl | English | 00 |
+| Redpajama C4 | English | 00 |
+| Bangla (culturax, books, news, Wikipedia, Banglapedia) | Bangla | 00 |
 
 ## How to Use
 The basic use cases to generate text using this model are simple. Follow the below code to generate text using this model.