alasdairforsythe committed on
Commit 57672ab
1 Parent(s): 430ce7c

Update README.md

Files changed (1)
  1. README.md +26 -6
README.md CHANGED
@@ -10,18 +10,38 @@ The prebuilt vocabularies are all available for download [here](https://huggingf
  **July 3:** TokenMonster v1.0 has been released. The "420" prebuilt vocabularies are being released as they are completed, at a rate of around 10 per day. Let me know if there's one you want and I can prioritize it.

  Choose a dataset from:
- `code` `english` `englishcode` `fiction`
+
+ - `code`
+ - `english`
+ - `englishcode`
+ - `fiction`

  Choose a vocab size from:
- `1024` `2048` `4096` `8000` `16000` `24000` `32000` `40000` `50256` `65536` `100256`
+ - `1024`
+ - `2048`
+ - `4096`
+ - `8000`
+ - `16000`
+ - `24000`
+ - `32000`
+ - `40000`
+ - `50256`
+ - `65536`
+ - `100256`

  Choose an optimization mode from:
- `unfiltered` `clean` `balanced` `consistent` `strict`
+ - `unfiltered`
+ - `clean`
+ - `balanced`
+ - `consistent`
+ - `strict`

  For a capcode disabled vocabulary add:
- `nocapcode`
+ - `nocapcode`

  And finally add the version number:
- `v1`
+ - `v1`

- Examples: `fiction-24000-consistent-v1` `code-4096-clean-nocapcode-v1`
+ Examples:
+ - `fiction-24000-consistent-v1`
+ - `code-4096-clean-nocapcode-v1`
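
The naming scheme described in the README (dataset, vocab size, optimization mode, an optional `nocapcode`, then the version, joined by hyphens as in the two examples) can be sketched as a small helper. This is only an illustration: `vocab_name` and the constants below are hypothetical names, not part of TokenMonster, and the allowed values are copied from the lists in this README.

```python
# Allowed values, copied from the lists in the README.
DATASETS = ("code", "english", "englishcode", "fiction")
VOCAB_SIZES = (1024, 2048, 4096, 8000, 16000, 24000,
               32000, 40000, 50256, 65536, 100256)
MODES = ("unfiltered", "clean", "balanced", "consistent", "strict")


def vocab_name(dataset, vocab_size, mode, capcode=True, version="v1"):
    """Build a prebuilt-vocabulary name (hypothetical helper).

    Parts are joined with hyphens in the order the README gives:
    dataset, vocab size, mode, optional "nocapcode", version.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if vocab_size not in VOCAB_SIZES:
        raise ValueError(f"unknown vocab size: {vocab_size!r}")
    if mode not in MODES:
        raise ValueError(f"unknown optimization mode: {mode!r}")

    parts = [dataset, str(vocab_size), mode]
    if not capcode:
        # Only capcode-disabled vocabularies carry an extra marker.
        parts.append("nocapcode")
    parts.append(version)
    return "-".join(parts)


if __name__ == "__main__":
    print(vocab_name("fiction", 24000, "consistent"))
    print(vocab_name("code", 4096, "clean", capcode=False))
```

Run as a script, this prints the two example names from the README. Validating against the listed values catches a mistyped dataset or size before the name is used to look up a vocabulary.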