license: mit

TokenMonster

The documentation and code are available on GitHub at alasdairforsythe/tokenmonster.

Trained models can be downloaded here:

With capcode

| Name | Vocab Size | Charset | Availability |
| --- | --- | --- | --- |
| english-100256-capcode | 100256 | UTF-8 | download |
| english-65536-capcode | 65536 | UTF-8 | download |
| english-50256-capcode | 50256 | UTF-8 | download |
| english-40000-capcode | 40000 | UTF-8 | in-progress |
| english-32000-capcode | 32000 | UTF-8 | download |
| english-24000-capcode | 24000 | UTF-8 | in-progress |
| code-100256-capcode | 100256 | UTF-8 | download |
| code-65536-capcode | 65536 | UTF-8 | in-progress |
| code-50256-capcode | 50256 | UTF-8 | in-progress |
| code-40000-capcode | 40000 | UTF-8 | in-progress |
| code-32000-capcode | 32000 | UTF-8 | download |
| code-24000-capcode | 24000 | UTF-8 | in-progress |

Without capcode

| Name | Vocab Size | Charset | Availability |
| --- | --- | --- | --- |
| english-100256 | 100256 | UTF-8 | download |
| english-65536 | 65536 | UTF-8 | in-progress |
| english-50256 | 50256 | UTF-8 | in-progress |
| english-40000 | 40000 | UTF-8 | in-progress |
| english-32000 | 32000 | UTF-8 | in-progress |
| english-24000 | 24000 | UTF-8 | in-progress |
| code-100256 | 100256 | UTF-8 | download |
| code-65536 | 65536 | UTF-8 | in-progress |
| code-50256 | 50256 | UTF-8 | download |
| code-40000 | 40000 | UTF-8 | in-progress |
| code-32000 | 32000 | UTF-8 | download |
| code-24000 | 24000 | UTF-8 | in-progress |

Vocabularies marked in-progress will be released at a rate of one per day.
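
The vocabulary names listed above appear to follow the pattern `<dataset>-<vocab size>`, with an optional `-capcode` suffix. As a minimal sketch of how a script might split such a name into its parts, the helper below parses that pattern. `VocabName` and `parse_vocab_name` are hypothetical names used for illustration only; they are not part of the TokenMonster library, and the pattern itself is inferred from the tables rather than documented.

```python
from typing import NamedTuple


class VocabName(NamedTuple):
    """Parts of a vocabulary name (hypothetical helper type, not a TokenMonster API)."""
    dataset: str      # e.g. "english" or "code"
    vocab_size: int   # e.g. 32000
    capcode: bool     # True if the name ends in "-capcode"


def parse_vocab_name(name: str) -> VocabName:
    """Split a name such as 'english-32000-capcode' into its parts.

    Assumes the '<dataset>-<vocab size>[-capcode]' pattern seen in the tables.
    """
    parts = name.split("-")
    capcode = parts[-1] == "capcode"
    if capcode:
        parts = parts[:-1]
    # After removing the optional suffix, exactly two parts should remain.
    if len(parts) != 2 or not parts[1].isdigit():
        raise ValueError(f"unrecognized vocabulary name: {name!r}")
    return VocabName(dataset=parts[0], vocab_size=int(parts[1]), capcode=capcode)
```

For example, `parse_vocab_name("code-50256")` yields a dataset of `"code"`, a vocabulary size of `50256`, and `capcode` set to `False`.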