alasdairforsythe committed on
Commit
72491b3
1 Parent(s): 55b0fd9

Update README.md

Files changed (1)
  1. README.md +39 -0
README.md CHANGED
@@ -1,3 +1,42 @@
  ---
  license: mit
  ---
+ ## TokenMonster
+
+ The documentation and code are available on GitHub at [alasdairforsythe/tokenmonster](https://github.com/alasdairforsythe/tokenmonster).
+
+ Trained vocabularies can be downloaded here:
+
+ #### With capcode
+ | Name | Vocab Size | Charset | Availability
+ |-------------------------|------------|---------|--------------
+ | english-100256-capcode | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-100256-capcode.vocab)
+ | english-65536-capcode | 65536 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-65536-capcode.vocab)
+ | english-50256-capcode | 50256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-50256-capcode.vocab)
+ | english-40000-capcode | 40000 | UTF-8 | in-progress
+ | english-32000-capcode | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-32000-capcode.vocab)
+ | english-24000-capcode | 24000 | UTF-8 | in-progress
+ | code-100256-capcode | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-100256-capcode.vocab)
+ | code-65536-capcode | 65536 | UTF-8 | in-progress
+ | code-50256-capcode | 50256 | UTF-8 | in-progress
+ | code-40000-capcode | 40000 | UTF-8 | in-progress
+ | code-32000-capcode | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-32000-capcode.vocab)
+ | code-24000-capcode | 24000 | UTF-8 | in-progress
+
+ #### Without capcode
+ | Name | Vocab Size | Charset | Availability
+ |-----------------|------------|---------|--------------
+ | english-100256 | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-100256.vocab)
+ | english-65536 | 65536 | UTF-8 | in-progress
+ | english-50256 | 50256 | UTF-8 | in-progress
+ | english-40000 | 40000 | UTF-8 | in-progress
+ | english-32000 | 32000 | UTF-8 | in-progress
+ | english-24000 | 24000 | UTF-8 | in-progress
+ | code-100256 | 100256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-100256.vocab)
+ | code-65536 | 65536 | UTF-8 | in-progress
+ | code-50256 | 50256 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-50256.vocab)
+ | code-40000 | 40000 | UTF-8 | in-progress
+ | code-32000 | 32000 | UTF-8 | [download](https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/code-32000.vocab)
+ | code-24000 | 24000 | UTF-8 | in-progress
+
+ In-progress vocabularies will be released at a rate of one per day.
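
The download links in the tables above all appear to share one pattern: the vocabulary name followed by `.vocab`, under the repository's `resolve/main` path. The following is a minimal Python sketch of building such a link and fetching the file, assuming that pattern holds for every released vocabulary. The helper names (`vocab_name`, `vocab_url`, `download_vocab`) are hypothetical and are not part of TokenMonster.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL copied from the download links in the tables above.
BASE_URL = "https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main"


def vocab_name(dataset: str, vocab_size: int, capcode: bool = True) -> str:
    """Build a vocabulary name such as 'english-32000-capcode'."""
    suffix = "-capcode" if capcode else ""
    return f"{dataset}-{vocab_size}{suffix}"


def vocab_url(dataset: str, vocab_size: int, capcode: bool = True) -> str:
    """Build the download URL, assuming the naming pattern seen in the tables."""
    return f"{BASE_URL}/{vocab_name(dataset, vocab_size, capcode)}.vocab"


def download_vocab(dataset: str, vocab_size: int, capcode: bool = True,
                   dest_dir: str = ".") -> Path:
    """Download one vocabulary file into dest_dir and return its local path.

    This can only succeed for rows marked 'download'; rows marked
    'in-progress' have no published file yet.
    """
    dest = Path(dest_dir) / f"{vocab_name(dataset, vocab_size, capcode)}.vocab"
    urlretrieve(vocab_url(dataset, vocab_size, capcode), str(dest))
    return dest


print(vocab_url("english", 32000))
# → https://huggingface.co/alasdairforsythe/tokenmonster/resolve/main/english-32000-capcode.vocab
```

Calling `download_vocab("code", 32000, capcode=False)` would attempt to save `code-32000.vocab` in the current directory. How to load the file afterwards is covered in the GitHub repository linked above.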