---
license: mit
---
## TokenMonster
The documentation and code are available on GitHub at [alasdairforsythe/tokenmonster](https://github.com/alasdairforsythe/tokenmonster).
Trained models can be downloaded from here:
#### With capcode
| Name | Vocab Size | Charset | Availability
|-------------------------|------------|-------|--------------
| english-100256-capcode | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-100256-capcode.vocab)
| english-65536-capcode | 65536 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-65536-capcode.vocab)
| english-50256-capcode | 50256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-50256-capcode.vocab)
| english-40000-capcode | 40000 | UTF-8 | in-progress
| english-32000-capcode | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-32000-capcode.vocab)
| english-24000-capcode | 24000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-24000-capcode.vocab)
| code-100256-capcode | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-100256-capcode.vocab)
| code-65536-capcode | 65536 | UTF-8 | in-progress
| code-50256-capcode | 50256 | UTF-8 | in-progress
| code-40000-capcode | 40000 | UTF-8 | in-progress
| code-32000-capcode | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-32000-capcode.vocab)
| code-24000-capcode | 24000 | UTF-8 | in-progress
#### Without capcode
| Name | Vocab Size | Charset | Availability
|-----------------|------------|--------|-------------
| english-100256 | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-100256.vocab)
| english-65536 | 65536 | UTF-8 | in-progress
| english-50256 | 50256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-50256.vocab)
| english-40000 | 40000 | UTF-8 | in-progress
| english-32000 | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-32000.vocab)
| english-24000 | 24000 | UTF-8 | in-progress
| code-100256 | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-100256.vocab)
| code-65536 | 65536 | UTF-8 | in-progress
| code-50256 | 50256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-50256.vocab)
| code-40000 | 40000 | UTF-8 | in-progress
| code-32000 | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-32000.vocab)
| code-24000 | 24000 | UTF-8 | in-progress
Vocabularies marked in-progress will be released at a rate of one per day.
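
The download links in the tables above all follow the same pattern, so a vocabulary file can also be fetched from a script. The following is a minimal sketch using only the Python standard library; the helper names `vocab_url` and `download_vocab` are hypothetical and not part of TokenMonster, and the sketch assumes the file you request is already listed as available (an in-progress entry will fail to download).

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base of the download links listed in the tables above.
BASE_URL = "https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main"


def vocab_url(dataset: str, vocab_size: int, capcode: bool = True) -> str:
    """Build the download URL for a vocabulary (hypothetical helper).

    dataset is "english" or "code"; vocab_size is one of the sizes in the tables.
    """
    name = f"{dataset}-{vocab_size}"
    if capcode:
        name += "-capcode"
    return f"{BASE_URL}/{name}.vocab"


def download_vocab(dataset: str, vocab_size: int, capcode: bool = True,
                   dest_dir: str = ".") -> Path:
    """Download a vocabulary file into dest_dir and return its local path."""
    url = vocab_url(dataset, vocab_size, capcode)
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    # Raises urllib.error.HTTPError if the file has not been released yet.
    urlretrieve(url, dest)
    return dest


# Example usage (performs a network request):
# path = download_vocab("english", 32000, capcode=True)
# print(path)
```

How the downloaded `.vocab` file is loaded is covered by the documentation in the GitHub repository.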