license: mit

TokenMonster

The documentation and code are available on GitHub at alasdairforsythe/tokenmonster.

The prebuilt vocabularies are all available for download here.

July 3: TokenMonster v1.0 has been released. The "420" prebuilt vocabularies are being released as they are completed, at a rate of around 10 per day. Let me know if there's one you want and I can prioritize it.

Choose a dataset from: `code` `english` `englishcode` `fiction`

Choose a vocab size from: `1024` `2048` `4096` `8000` `16000` `24000` `32000` `40000` `50256` `65536` `100256`

Choose an optimization mode from: `unfiltered` `clean` `balanced` `consistent` `strict`

For a capcode-disabled vocabulary, add: `nocapcode`

Finally, add the version number: `v1`

Join the parts with hyphens, in the order given above. Examples: `fiction-24000-consistent-v1` and `code-4096-clean-nocapcode-v1`
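The naming scheme above can be sketched as a small helper that assembles and validates a vocabulary name. This is only an illustration: `vocab_name` and its parameters are hypothetical names, not part of the TokenMonster library, and the allowed values are copied from the lists above.

```python
# Allowed values, copied from the lists in this README.
DATASETS = ("code", "english", "englishcode", "fiction")
VOCAB_SIZES = (1024, 2048, 4096, 8000, 16000, 24000,
               32000, 40000, 50256, 65536, 100256)
MODES = ("unfiltered", "clean", "balanced", "consistent", "strict")


def vocab_name(dataset, size, mode, capcode=True, version=1):
    """Build a prebuilt vocabulary name (hypothetical helper, not a library API)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if size not in VOCAB_SIZES:
        raise ValueError(f"unsupported vocab size: {size!r}")
    if mode not in MODES:
        raise ValueError(f"unknown optimization mode: {mode!r}")

    parts = [dataset, str(size), mode]
    if not capcode:
        # Capcode-disabled vocabularies carry an extra "nocapcode" part.
        parts.append("nocapcode")
    parts.append(f"v{version}")
    return "-".join(parts)


print(vocab_name("fiction", 24000, "consistent"))
print(vocab_name("code", 4096, "clean", capcode=False))
```

The two calls reproduce the example names given above. Whether a particular combination has already been released is a separate question, since the vocabularies are being published gradually.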