DreamGenX committed on
Commit
0ced4dc
1 Parent(s): 6f54dc6

Upload /README.md with huggingface_hub

Files changed (1)
  1. README.md +3 -6
README.md CHANGED
@@ -141,20 +141,17 @@ You can run the models on [dreamgen.com](https://dreamgen.com) for free — you
141
  - **LM Studio**
142
  - [Config](configs/lmstudio/preset.json)
143
  - Just like ChatML, just changed "assistant" to "text" role.
 
144
  - **HuggingFace**
145
  - [Chat template](tokenizer_config.json#L51)
146
  - Just like ChatML, just changed "assistant" to "text" role.
147
 
148
  ## Known Issues
149
 
150
- - **34B tokenization**:
151
- - There seems to be a mismatch between the tokenizer of the base and fine-tuned model. It's unclear whether this also affected training, or whether it's just incorrectly saved tokenizer (you can see `tokenizer.json` was not saved ([bug report](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1322))).
152
- - This affects BOS and EOS (which aren't really used by Yi) and the tokenization of the first input token.
153
- - Overall impact should be minor.
154
  - **34B repetition**:
155
  - The 34B sometimes gets stuck repeating the same word, or synonyms. This seems to be a common problem across various Yi 34B fine-tunes.
156
- - **GGUF** / **Ooba**:
157
- - The tokenization might be messed up. Some users reported that `<|im_start|>` and `<|im_end|>` are tokenized as multiple tokens.
158
 
159
  ## License
160
 
 
141
  - **LM Studio**
142
  - [Config](configs/lmstudio/preset.json)
143
  - Just like ChatML, just changed "assistant" to "text" role.
144
+ - **There's a bug** in LM Studio if you delete a message or click "Continue", [see here for details](https://discord.com/channels/1110598183144399058/1212665261128417280/1212665261128417280).
145
  - **HuggingFace**
146
  - [Chat template](tokenizer_config.json#L51)
147
  - Just like ChatML, just changed "assistant" to "text" role.
148
 
149
  ## Known Issues
150
 
151
  - **34B repetition**:
152
  - The 34B sometimes gets stuck repeating the same word, or synonyms. This seems to be a common problem across various Yi 34B fine-tunes.
153
+ - **GGUF**:
154
+ - The tokenization might be messed up. Some users reported that `<|im_start|>` and `<|im_end|>` are tokenized as multiple tokens. Also llama.cpp may not tokenize correctly (the Yi tokenizer is subtly different from the Llama 2 tokenizer).
155
 
156
  ## License
157
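
The GGUF note in this diff says that `<|im_start|>` and `<|im_end|>` may be tokenized as multiple tokens. A minimal sketch of how one might check this is shown below. The helper `find_split_markers` and the stand-in `toy_encode` are hypothetical names introduced here for illustration and are not part of this repository. The helper accepts any callable that maps a string to a sequence of token IDs, so it is not tied to one runtime.

```python
from typing import Callable, Dict, List, Sequence

# The two ChatML control markers mentioned in the Known Issues entry.
CHATML_MARKERS = ("<|im_start|>", "<|im_end|>")


def find_split_markers(
    encode: Callable[[str], Sequence[int]],
    markers: Sequence[str] = CHATML_MARKERS,
) -> Dict[str, List[int]]:
    """Return the markers that do NOT encode to exactly one token ID.

    `encode` is any function mapping text to token IDs; it is expected
    to return the IDs without an added BOS/EOS token.
    """
    split: Dict[str, List[int]] = {}
    for marker in markers:
        ids = list(encode(marker))
        if len(ids) != 1:
            split[marker] = ids
    return split


def toy_encode(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer, used only to make this sketch runnable.

    It treats `<|im_start|>` as a single token (arbitrary ID 1) and falls
    back to one ID per character for everything else, which mimics the
    "split into multiple tokens" failure for `<|im_end|>`.
    """
    if text == "<|im_start|>":
        return [1]
    return [ord(ch) for ch in text]


if __name__ == "__main__":
    broken = find_split_markers(toy_encode)
    for marker, ids in broken.items():
        print(f"{marker!r} was split into {len(ids)} tokens")
```

With a Hugging Face `transformers` tokenizer, the callable could be `lambda s: tokenizer.encode(s, add_special_tokens=False)`. That only checks the Hugging Face tokenizer files, though. To check a GGUF file you would need to pass in the tokenize function of whatever runtime loads it, and should consult that runtime's documentation for how it handles special tokens. An empty result means both markers were encoded as single tokens.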