---
license: mit
---
## TokenMonster

The documentation and code are available on GitHub at [alasdairforsythe/tokenmonster](https://github.com/alasdairforsythe/tokenmonster).

Trained models can be downloaded here:

#### With capcode
| Name                    | Vocab Size | Charset | Availability
|-------------------------|------------|-------|--------------
| english-100256-capcode  | 100256     | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-100256-capcode.vocab)
| english-65536-capcode   | 65536      | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-65536-capcode.vocab)
| english-50256-capcode   | 50256      | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-50256-capcode.vocab)
| english-40000-capcode   | 40000      | UTF-8 | in-progress
| english-32000-capcode   | 32000      | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-32000-capcode.vocab)
| english-24000-capcode   | 24000      | UTF-8 | in-progress
| code-100256-capcode     | 100256     | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-100256-capcode.vocab)
| code-65536-capcode      | 65536      | UTF-8 | in-progress
| code-50256-capcode      | 50256      | UTF-8 | in-progress
| code-40000-capcode      | 40000      | UTF-8 | in-progress
| code-32000-capcode      | 32000      | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-32000-capcode.vocab)
| code-24000-capcode      | 24000      | UTF-8 | in-progress

#### Without capcode
| Name            | Vocab Size | Charset | Availability
|-----------------|------------|--------|-------------
| english-100256  | 100256     | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-100256.vocab)
| english-65536   | 65536      | UTF-8 | in-progress
| english-50256   | 50256      | UTF-8 | in-progress
| english-40000   | 40000      | UTF-8 | in-progress
| english-32000   | 32000      | UTF-8 | in-progress
| english-24000   | 24000      | UTF-8 | in-progress
| code-100256     | 100256     | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-100256.vocab)
| code-65536      | 65536      | UTF-8 | in-progress
| code-50256      | 50256      | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-50256.vocab)
| code-40000      | 40000      | UTF-8 | in-progress
| code-32000      | 32000      | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-32000.vocab)
| code-24000      | 24000      | UTF-8 | in-progress

In-progress vocabularies will be released at a rate of one per day.
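
The download links in the tables above all follow the same pattern, so a vocabulary file can also be fetched from a script. The following is a minimal sketch using only the Python standard library. The helper names (`vocab_name`, `vocab_url`, `download_vocab`) are hypothetical and are not part of TokenMonster; the URL pattern is taken from the links above and is assumed to hold for newly released vocabularies as well.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base of the download links listed in the tables above.
BASE_URL = "https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main"


def vocab_name(dataset: str, vocab_size: int, capcode: bool = True) -> str:
    """Build a vocabulary name such as 'english-32000-capcode' (hypothetical helper)."""
    suffix = "-capcode" if capcode else ""
    return f"{dataset}-{vocab_size}{suffix}"


def vocab_url(name: str) -> str:
    """Return the download URL for a vocabulary name (pattern assumed from the tables)."""
    return f"{BASE_URL}/{name}.vocab"


def download_vocab(name: str, dest_dir: str = ".") -> Path:
    """Download '<name>.vocab' into dest_dir and return the local path.

    Raises urllib.error.HTTPError if the vocabulary is not available yet
    (for example, one still marked in-progress).
    """
    dest = Path(dest_dir) / f"{name}.vocab"
    urlretrieve(vocab_url(name), str(dest))
    return dest
```

For example, `download_vocab(vocab_name("english", 32000))` would save `english-32000-capcode.vocab` to the current directory, assuming that file is listed as available above.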