Commit 72491b3 by alasdairforsythe · 1 parent: 55b0fd9

Update README.md

README.md CHANGED
@@ -1,3 +1,42 @@
 ---
 license: mit
 ---
+## TokenMonster
+
+The documentation and code are available on GitHub at [alasdairforsythe/tokenmonster](https://github.com/alasdairforsythe/tokenmonster).
+
+Trained models can be downloaded from here:
+
+#### With capcode
+| Name | Vocab Size | Charset | Availability
+|-------------------------|------------|-------|--------------
+| english-100256-capcode | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-100256-capcode.vocab)
+| english-65536-capcode | 65536 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-65536-capcode.vocab)
+| english-50256-capcode | 50256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-50256-capcode.vocab)
+| english-40000-capcode | 40000 | UTF-8 | in-progress
+| english-32000-capcode | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-32000-capcode.vocab)
+| english-24000-capcode | 24000 | UTF-8 | in-progress
+| code-100256-capcode | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-100256-capcode.vocab)
+| code-65536-capcode | 65536 | UTF-8 | in-progress
+| code-50256-capcode | 50256 | UTF-8 | in-progress
+| code-40000-capcode | 40000 | UTF-8 | in-progress
+| code-32000-capcode | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-32000-capcode.vocab)
+| code-24000-capcode | 24000 | UTF-8 | in-progress
+
+#### Without capcode
+| Name | Vocab Size | Charset | Availability
+|-----------------|------------|--------|-------------
+| english-100256 | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-100256.vocab)
+| english-65536 | 65536 | UTF-8 | in-progress
+| english-50256 | 50256 | UTF-8 | in-progress
+| english-40000 | 40000 | UTF-8 | in-progress
+| english-32000 | 32000 | UTF-8 | in-progress
+| english-24000 | 24000 | UTF-8 | in-progress
+| code-100256 | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-100256.vocab)
+| code-65536 | 65536 | UTF-8 | in-progress
+| code-50256 | 50256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-50256.vocab)
+| code-40000 | 40000 | UTF-8 | in-progress
+| code-32000 | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-32000.vocab)
+| code-24000 | 24000 | UTF-8 | in-progress
+
+In-progress vocabularies will be released at a rate of one per day.
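
Every download link in the README tables follows the same pattern: the repository's `resolve/main` URL followed by the vocabulary name and a `.vocab` suffix. As a minimal sketch of fetching one of these files from a script, assuming that pattern holds for every released vocabulary, something like the following could work. The helper names `vocab_url` and `download_vocab` are hypothetical and are not part of TokenMonster; only the base URL is taken from the links above.

```python
import os
from urllib.request import urlretrieve

# Base URL copied from the download links in the README tables.
BASE_URL = "https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main"


def vocab_url(name: str) -> str:
    """Build the download URL for a vocabulary name (hypothetical helper).

    Assumes the '<base>/<name>.vocab' pattern seen in the README links.
    """
    return f"{BASE_URL}/{name}.vocab"


def download_vocab(name: str, dest_dir: str = ".") -> str:
    """Download one .vocab file into dest_dir and return the local path.

    Hypothetical helper: vocabularies still listed as in-progress have no
    published file yet, so the request is expected to fail for those names.
    """
    path = os.path.join(dest_dir, f"{name}.vocab")
    urlretrieve(vocab_url(name), path)
    return path
```

Calling `download_vocab("english-32000-capcode")` would save that vocabulary as `english-32000-capcode.vocab` in the current directory. Loading the file afterwards is covered by the documentation in the GitHub repository.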