Update README.md #19, opened by roboojack
README.md CHANGED
@@ -128,7 +128,7 @@ Falcon-7B was trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/da
 | Conversations | 6% | 85B | Reddit, StackOverflow, HackerNews |
 | Code | 3% | 45B | |
 | RefinedWeb-French | 3% | 45B | massive web crawl |
-| Technical | 2% | 30B | arXiv, PubMed,
+| Technical | 2% | 30B | arXiv, PubMed, USPTO, etc. |
 
 
 The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b) tokenizer.